millfolio deep dive

Two asymmetric models

There are two models, deliberately asymmetric:

The frontier model (untrusted) is the planner/coder. It sees only a sanitized manifest of your vault — file aliases (file_0), kinds, aliased column schemas (col_2) — never contents, names, or values. From that it writes one Mojo program that calls a fixed set of vault tools.
The local model (trusted, on your device) is the reader. When the program needs to understand content — "is this a travel expense?", "read the renewal date" — it calls ask_local(...), which runs on the millfolio inference engine and is the only thing that ever sees real text.
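
As a rough illustration of what "sanitized manifest" could mean in practice, the Python sketch below aliases file and column names before anything is shown to the frontier model. The VaultFile shape, the build_manifest name, and the global column counter are assumptions made for this example; the source only establishes that aliases look like file_0 and col_2 and that contents, names, and values are withheld.

```python
from dataclasses import dataclass, field


@dataclass
class VaultFile:
    """A real file in the vault (hypothetical shape, for illustration only)."""
    name: str                                         # real filename, stays on device
    kind: str                                         # e.g. "csv" or "pdf"
    columns: list[str] = field(default_factory=list)  # real column names


def build_manifest(files):
    """Return (manifest, alias_map).

    manifest contains only aliases and kinds, so it is safe to send out.
    alias_map maps each alias back to the real name and never leaves the device.
    """
    manifest, alias_map = [], {}
    col_counter = 0  # assumption: column aliases are numbered across all files
    for i, f in enumerate(files):
        file_alias = f"file_{i}"
        alias_map[file_alias] = f.name
        col_aliases = []
        for col in f.columns:
            col_alias = f"col_{col_counter}"
            col_counter += 1
            alias_map[col_alias] = col
            col_aliases.append(col_alias)
        manifest.append(
            {"alias": file_alias, "kind": f.kind, "columns": col_aliases}
        )
    return manifest, alias_map
```

Keeping the alias map as a separate return value makes the boundary explicit: one object may cross to the frontier model, the other may not.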

One constraint shapes everything: the frontier-written program runs as a separate, sandboxed process that can reach the network only over loopback, so inference is a local service it talks to over HTTP, not an in-process call. That's why there are two servers, the app and the inference engine, and why they stay two: the app can restart without reloading the ~7 GB model. Both run under launchd.
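
To make "a local service it talks to over HTTP" concrete, here is a hedged Python sketch of how a client might build one ask_local request. The URL, port, route, and JSON field names are all invented for illustration; the source does not specify them. The loopback check is only a convenience on the client side, since the actual enforcement described here happens in the OS sandbox.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Hypothetical endpoint: the source names neither the port nor the route.
LOCAL_ENGINE_URL = "http://127.0.0.1:8080/ask_local"


def is_loopback(url):
    """True only when the URL's host is a literal loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Names such as "localhost" are rejected: checking them would need
        # a DNS lookup, so this sketch accepts literal IPs only.
        return False


def build_ask_local_request(question, text, url=LOCAL_ENGINE_URL):
    """Build (but do not send) the HTTP request for one ask_local call."""
    if not is_loopback(url):
        raise ValueError("refusing non-loopback endpoint: " + url)
    # Assumed payload shape; the real wire format is not documented here.
    body = json.dumps({"question": question, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The sketch stops short of that so it runs without a server listening.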

The contract the frontier model is given is a single document — the privacy-box system prompt — loaded at runtime. It spells out the confidentiality rules, the current Mojo dialect, and the tool surface (search, csv_rows, pdf_text, docx_text, ask_local, print_answer, …).

Example: a question becomes a program

You ask "How much did I spend on travel last year?". The frontier model — seeing only aliases — writes this, and the privacy box compiles it and runs it in a sandbox that can reach only your local model:

# written by the frontier model — it never sees a single real value
from vault import *
def main() raises:
    var hits = search("travel transportation flights hotels expenses", 40)
    var total = 0.0
    for c in hits:
        # ask_local reads the REAL chunk on-device; returns "amount|yes" or "0|no"
        var verdict = ask_local(
            "If this is a 2025 travel expense, reply '<amount>|yes', else '0|no'.", c.text)
        var parts = verdict.split("|")
        if len(parts) == 2 and String(parts[1]) == "yes":
            total += atof(String(parts[0]))
    print_answer("You spent about $" + String(total) + " on travel in 2025.")

The frontier model orchestrates over aliases; search and ask_local do the real work locally; the sum is computed on your machine and print_answer surfaces it there. Neither the search results nor the answer is ever returned to the frontier model, which is exactly why expressing the request as a program is load-bearing rather than an implementation detail.
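
The aggregation step can be tried without a vault. The Python sketch below mirrors the program's control flow with a stubbed ask_local passed in as a parameter. It is a variant rather than a translation: unlike the listing above, it skips a malformed amount instead of raising, and the tolerance for "$" and thousands separators is an added assumption about what a local model might return.

```python
def parse_verdict(verdict):
    """Return the amount from an '<amount>|yes' reply, or 0.0 otherwise."""
    parts = verdict.strip().split("|")
    if len(parts) != 2 or parts[1].strip().lower() != "yes":
        return 0.0
    try:
        # Tolerate "$1,200.00"-style amounts (an assumption, see lead-in).
        return float(parts[0].replace("$", "").replace(",", "").strip())
    except ValueError:
        return 0.0  # malformed amount: skip it rather than abort the sum


def total_travel(chunks, ask_local):
    """Sum the amounts of every chunk the local model marks as travel."""
    question = (
        "If this is a 2025 travel expense, reply '<amount>|yes', else '0|no'."
    )
    return sum(parse_verdict(ask_local(question, chunk)) for chunk in chunks)
```

Because ask_local is injected, the same function works with the real on-device reader or with a dictionary lookup in a test.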

Why it holds

Containment lives outside the model, at the OS level. The generated program runs under a Seatbelt profile that denies all network except loopback to your local engine — it can't phone home. An egress guard gates every message to the frontier (fails closed), and the compile-feedback loop only ever sends back aliased source, never runtime output that might contain real content. Your documents never leave the Mac, and never reach the frontier model. See the walkthrough to try it, or privacy box for the full design.
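
"Fails closed" can be illustrated with a small Python sketch: the guard returns True only when a message is positively cleared, and any error during the check counts as a block. The egress_allowed name and the substring denylist are assumptions for this example, not the guard millfolio ships. A plain substring match is also crude and would over-block ordinary words that happen to match a column name.

```python
def egress_allowed(message, secrets):
    """Return True only if the message is positively cleared for sending.

    secrets is an iterable of real vault strings (names, column headers,
    values) that must never appear in outbound text.
    """
    try:
        if not isinstance(message, str):
            return False
        lowered = message.lower()
        for secret in secrets:
            # Empty entries are skipped; a non-string entry makes .lower()
            # raise, which the handler below turns into a block.
            if secret and secret.lower() in lowered:
                return False
        return True
    except Exception:
        # Fail closed: if the check itself breaks, nothing is sent.
        return False
```

The important property is the default: every path that is not an explicit "clear" ends in False, so a bug in the guard blocks traffic instead of leaking it.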
