deep dive

A question becomes a program

millfolio lets a powerful remote model help with your private data without ever seeing it. The trick — the privacy box harness — is to treat the frontier model as an untrusted code generator, not a data processor. Here's how one question turns into code that runs on your Mac.

Two asymmetric models

There are two models, deliberately asymmetric:

[Diagram: the two models and what runs where]
Frontier model (cloud, untrusted): sees only aliases (file_0, col_2).
Your Mac (your data never leaves it):
  browser / iOS: the millfolio app
  App server, :10000 (launchd): UI, REST, chat WS
  Generated Mojo program: runs in a sandbox, loopback network only
  Inference server, :8000 (launchd): chat + embeddings, on-device GPU
  Your vault: PDF, Word, CSV, notes, plus an on-device LanceDB index
Connection labels: HTTPS/WS; writes the program (aliases only); orchestrate, approval gate; loopback HTTP (ask_local, search); reads files.

One constraint shapes everything: the frontier-written program runs as a separate, sandboxed process that can reach the network only over loopback — so inference is a local service it talks to over HTTP, not an in-process call. That's why there are two servers (and why they stay two: the app restarts without reloading the ~7 GB model). Both run under launchd.
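
To make "loopback only" concrete, here is a minimal sketch in Python (not the Mojo the generated program is written in) of what counts as a loopback destination. It is only an illustration: the real enforcement described on this page happens at the OS level, not in application code, and the function name is_loopback_url is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is 'localhost' or a loopback IP."""
    host = urlsplit(url).hostname
    if host is None:
        # No host component at all (e.g. a bare string): not loopback.
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (a DNS name, for example): treat as non-loopback.
        return False
```

Under this check, http://127.0.0.1:8000/ (the inference server's port) passes, while any cloud hostname is rejected.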

The contract the frontier model is given is a single document — the privacy-box system prompt — loaded at runtime. It spells out the confidentiality rules, the current Mojo dialect, and the tool surface (search, csv_rows, pdf_text, docx_text, ask_local, print_answer, …).
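
Alongside that contract, the frontier model only ever sees aliases such as file_0 or col_2. This page does not say how millfolio assigns them, so the following Python sketch is an assumption: a hypothetical AliasTable that hands out stable, numbered aliases per kind and keeps the reverse mapping on the local side.

```python
class AliasTable:
    """Hypothetical helper: maps real names to opaque aliases like file_0."""

    def __init__(self) -> None:
        self._alias_by_name: dict[tuple[str, str], str] = {}
        self._name_by_alias: dict[str, str] = {}
        self._counters: dict[str, int] = {}

    def alias(self, kind: str, real_name: str) -> str:
        """Return the alias for real_name, creating one on first use."""
        key = (kind, real_name)
        if key not in self._alias_by_name:
            n = self._counters.get(kind, 0)
            self._counters[kind] = n + 1
            alias = f"{kind}_{n}"
            self._alias_by_name[key] = alias
            self._name_by_alias[alias] = real_name
        return self._alias_by_name[key]

    def resolve(self, alias: str) -> str:
        """Map an alias back to the real name (local side only)."""
        return self._name_by_alias[alias]
```

The point of the design is that only the output of alias() would ever appear in text sent to the frontier model, while resolve() is used on the Mac when the program actually touches a file or column.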

Example: a question becomes a program

You ask "How much did I spend on travel last year?". The frontier model — seeing only aliases — writes this, and the privacy box compiles it and runs it in a sandbox that can reach only your local model:

# written by the frontier model — it never sees a single real value
from vault import *
def main() raises:
    var hits = search("travel transportation flights hotels expenses", 40)
    var total = 0.0
    for c in hits:
        # ask_local reads the REAL chunk on-device; returns "amount|yes" or "0|no"
        var verdict = ask_local(
            "If this is a 2025 travel expense, reply '<amount>|yes', else '0|no'.", c.text)
        var parts = verdict.split("|")
        if len(parts) == 2 and String(parts[1]) == "yes":
            total += atof(String(parts[0]))
    print_answer("You spent about $" + String(total) + " on travel in 2025.")

The frontier model orchestrates over aliases; search and ask_local do the real work locally; the sum is computed on your machine and print_answer surfaces it there. The search results and the answer are never returned to the frontier model — which is exactly why the program model is load-bearing, not an implementation detail.
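
The program above depends on the local model replying in exactly the "amount|yes" shape. As an illustration only, here is the same parse-and-sum step in Python (not Mojo), written to tolerate stray formatting such as a currency sign or thousands separators. The helper names parse_verdict and total_spend are hypothetical, and treating malformed replies as 0 is an assumption rather than documented millfolio behavior.

```python
import math


def parse_verdict(verdict: str) -> float:
    """Return the amount for an '<amount>|yes' reply, otherwise 0.0."""
    parts = verdict.strip().split("|")
    if len(parts) != 2 or parts[1].strip().lower() != "yes":
        return 0.0
    # Drop a currency sign and thousands separators before converting.
    cleaned = parts[0].strip().replace("$", "").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return 0.0
    # Reject "nan" / "inf", which float() would otherwise accept.
    return value if math.isfinite(value) else 0.0


def total_spend(verdicts: list[str]) -> float:
    """Sum the amounts of all replies that were marked 'yes'."""
    return sum(parse_verdict(v) for v in verdicts)
```

A malformed reply contributes nothing to the total instead of aborting the run, which may or may not be the right trade-off for a financial answer.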

Why it holds

Containment lives outside the model, at the OS level. The generated program runs under a Seatbelt profile that denies all network except loopback to your local engine — it can't phone home. An egress guard gates every message to the frontier (fails closed), and the compile-feedback loop only ever sends back aliased source, never runtime output that might contain real content. Your documents never leave the Mac, and never reach the frontier model. See the walkthrough to try it, or privacy box for the full design.
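
"Fails closed" is worth spelling out. The sketch below, in Python, shows the general pattern under stated assumptions: a message leaves only if a check explicitly approves it, and a check that crashes or returns anything other than True blocks the message. The names guard_outbound, make_check, and EgressBlocked are hypothetical, and the substring check is a stand-in; this page does not describe what millfolio's egress guard actually inspects.

```python
class EgressBlocked(Exception):
    """Raised when an outbound message is not allowed to leave the machine."""


def guard_outbound(message: str, check) -> str:
    """Return message only if check(message) explicitly returns True."""
    try:
        allowed = check(message) is True
    except Exception:
        # Fail closed: a crashing check blocks the message.
        allowed = False
    if not allowed:
        raise EgressBlocked("outbound message blocked")
    return message


def make_check(known_real_values):
    """Build a stand-in check that rejects text containing any real value."""
    lowered = [v.lower() for v in known_real_values if v]

    def no_real_values(message: str) -> bool:
        text = message.lower()
        return not any(v in text for v in lowered)

    return no_real_values
```

The key design choice is the default: anything the guard cannot positively approve stays on the Mac.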