If you’ve watched Claude Code call a tool — read a file, hit an MCP server, run a command — you already understand the shape of the new Siri. Apple introduced “Siri AI” at WWDC 2026 alongside macOS 27 “Golden Gate,” and underneath the marketing it is a fairly conventional tool-calling agent: a language model that plans, picks tools from a catalog, fills in their arguments, asks you to confirm the risky ones, and runs them. The interesting part is that Apple built that loop entirely in-house, chose App Intents as its one tool contract, and made some deliberately different engineering choices than the MCP world did.

Everything below comes from first-hand forensic inspection of one Mac running the Golden Gate beta, build 26A5353q, then cross-checked against Apple’s WWDC sessions and Newsroom posts. I separate two kinds of statement throughout. A fact means I reproduced an artifact — a binary, a vmmap mapping, a SQLite store, an entitlement, a metadata file. An inference means I reasoned from names, linkage, or correlation but never captured the live edge. The seams between processes are where most of the inference lives, because tracing XPC on the wire needs root I didn’t have. I flag each one.

The mental model, mapped

Here is the whole thing as a translation table from the Claude-Code/MCP world you may already know:

Claude Code / MCP | The new Siri on macOS
The model (Claude) that plans and calls tools | An on-device model (AFM 3) with a dedicated lw_planner_v1 planner adapter
An MCP server you add | An installed app that declares App Intents
A tool definition (name + JSON schema) | An App Intent (e.g. CreateReminderAppIntent) with typed parameters, optionally tagged with a standard App Schema
tools/list, the catalog the model sees | The ToolKit catalog (Tools-prod SQLite), aggregated across every installed app
Resources / lookups the tool references | App Entities + Entity Queries (ReminderEntity, SectionEntityQuery)
The tool call the model emits | The planner emits a call by pythonName, executed by siriappintentsd / siriactionsd
The approval prompt before a write | authenticationPolicy + built-in confirm / askUser primitives
The MCP client/host (Claude Code itself) | assistantd (front-door broker) + intelligenceflowd (the agent loop)
The transport (stdio / HTTP / JSON-RPC) | XPC against a static on-device catalog: no wire protocol, no remote servers
Putting every tool in the prompt | RAG: retrieve the relevant tools from a vector database, per request

Two things to flag immediately. First, there is no MCP anywhere in Siri’s path — Apple built an equivalent with its own proprietary plumbing. (The MCP you may have seen in Xcode 27 is a developer-tools bridge for coding agents; unrelated.) Second, the model itself is genuinely new and entirely internal — but the contract it calls through is the same App Intents API third-party developers already use.

A worked example: “add milk to the Errands section”

The cleanest way to see the tool contract is to read one. On Golden Gate, Reminders ships a dedicated RemindersAppIntents.framework whose Metadata.appintents declares 39 App Intents, 13 entities, and 10 entity queries. Six of those intents are tagged with Apple’s official reminders App Schema domain, the ones the planner is trained to recognise deeply; they include CreateReminderIntent, UpdateReminderIntent, DeleteRemindersIntent, CreateListIntent, and CreateSectionIntent. (This is fact, read from the on-disk metadata.)

TTRCreateReminderAppIntent is, in MCP terms, a tool definition. Stripped to its shape:

tool  reminders/CreateReminderIntent          # App Schema domain "reminders", v1.0.0
  from   RemindersAppIntents.framework         # the "server" = the app
  desc   "Creates a new reminder."
  auth   none                                  # authenticationPolicy 0 — no unlock to create
  params
    title            String
    dueDate          Date
    list             ListEntity                # resolved by ListEntityQuery
    section          SectionEntity             # resolved by SectionEntityQuery
    parentReminder   ReminderEntity            # i.e. make it a subtask
    locationTrigger  LocationTriggerEntity
    priorityLevel    PriorityLevel  (enum)
    isFlagged, isAllDay, notes, tags[], urls[],
    subtasks[], recurrence, assignedPerson, contactPerson, ...

That is a JSON-schema-shaped tool, no different in spirit from one an MCP server would advertise — typed scalars, enums, arrays, and entity-typed parameters that point at things the model has to resolve first. The entity queries (SectionEntityQuery, ListEntityQuery) are the resolvers: “the Errands section” becomes a concrete SectionEntity the call can take.
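
To make the comparison concrete, here is a hedged sketch of how the signature above might look if it were rendered as an MCP-style tool definition. Only the parameter names and types come from the metadata; the dict layout and the "x-entity", "x-query", and "x-enum" keys are invented for illustration and are not Apple's on-disk format.

```python
import json

# Hypothetical MCP-style rendering of the Reminders create intent.
# Names and types are from the metadata quoted above; the "x-*" keys
# are illustrative assumptions, not anything Apple ships.
CREATE_REMINDER_TOOL = {
    "name": "reminders/CreateReminderIntent",
    "description": "Creates a new reminder.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "dueDate": {"type": "string", "format": "date-time"},
            "list": {"x-entity": "ListEntity", "x-query": "ListEntityQuery"},
            "section": {"x-entity": "SectionEntity", "x-query": "SectionEntityQuery"},
            "parentReminder": {"x-entity": "ReminderEntity"},
            "locationTrigger": {"x-entity": "LocationTriggerEntity"},
            "priorityLevel": {"x-enum": "PriorityLevel"},
        },
    },
}


def entity_params(tool):
    """Return the parameters that must be resolved to an entity before the call."""
    props = tool["inputSchema"]["properties"]
    return sorted(name for name, spec in props.items() if "x-entity" in spec)


print(json.dumps(entity_params(CREATE_REMINDER_TOOL)))
```

The point of the helper is the split it exposes: scalar parameters can be filled straight from the utterance, while entity-typed ones need a lookup first.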

So when you say “add milk to the Errands section of my Shopping list when I get home,” the loop is recognisably an agent doing function-calling:

Figure 1 — A single request — ‘add milk to the Errands section when I get home’ — as an agent tool call. Dashed steps are seams inferred from entitlements rather than captured on the wire.

  1. A Siri UI surface hands the utterance to assistantd, which brokers the session.
  2. assistant_cdmd parses the natural language into a structured intent (its CDM pipeline links the Espresso neural runtime).
  3. intelligenceflowd plans: it retrieves the reminders/CreateReminderIntent tool, resolves “Shopping” to a ListEntity and “Errands” to a SectionEntity via their entity queries, and binds locationTrigger to home.
  4. It fills title: "milk", list, section, locationTrigger, and — because authenticationPolicy here is none — runs it without a confirm step.
  5. siriappintentsd executes the App Intent against Reminders.
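
Steps 3 to 5 can be sketched as a toy resolve-then-call routine. Everything here is a stand-in: the Entity class, the STORE contents, the exact-match resolver, and the plain "home" placeholder for the location trigger are assumptions for illustration, not how intelligenceflowd or the Reminders entity queries are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "ListEntity" or "SectionEntity"
    name: str


# Stand-in for the app's entity store; the contents are invented.
STORE = [
    Entity("ListEntity", "Shopping"),
    Entity("SectionEntity", "Errands"),
    Entity("SectionEntity", "Pharmacy"),
]


def resolve(kind, spoken):
    """Toy entity query: case-insensitive exact name match within one kind."""
    matches = [e for e in STORE if e.kind == kind and e.name.lower() == spoken.lower()]
    if len(matches) != 1:
        raise LookupError(f"cannot resolve {spoken!r} to a single {kind}")
    return matches[0]


def plan_create_reminder(title, list_name, section_name, auth_policy="none"):
    """Bind resolved entities into a call and decide whether to confirm first."""
    return {
        "tool": "reminders/CreateReminderIntent",
        "args": {
            "title": title,
            "list": resolve("ListEntity", list_name),
            "section": resolve("SectionEntity", section_name),
            "locationTrigger": "home",  # placeholder; the real type is LocationTriggerEntity
        },
        "needs_confirmation": auth_policy != "none",
    }


call = plan_create_reminder("milk", "shopping", "errands")
print(call["tool"], call["needs_confirmation"])
```

An ambiguous or missing name raises instead of guessing, which is roughly where a real agent would fall back to asking the user.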

A detail that falls out of reading the signature: yes, a reminder can be created directly under a subheading. The section: SectionEntity parameter is first-class, and SectionEntity is itself defined as belonging to a ListEntity and holding child ReminderEntity values, so a Section is modelled as a subheading within a list. A Group, by contrast, is a folder of lists (GroupEntity); there’s no group parameter on the create intent, so a reminder attaches to a list (or a section), never to a group directly. (All fact from the metadata, though I read the declared signature and didn’t execute a live request to confirm the planner fills section.)

This is also, incidentally, why there’s a “CLI-ish” way to drive Reminders now where there wasn’t before: these App Intents are discoverable (isDiscoverable: true) and runnable through the shortcuts command-line tool. The same declarations that let Siri call them let shortcuts run call them.

Why a catalog and retrieval, not a tool list

Here’s the first real divergence from how you’d wire up MCP. Claude Code puts your tools in the model’s context. Siri can’t: the ToolKit catalog aggregates App Intents, Shortcuts actions, legacy SiriKit intents, and Apple’s own first-party “flowTools” across every installed app — on the order of thousands of entries. You can’t put thousands of tool schemas in a prompt.

So the planner does retrieval-augmented generation over a vector database instead.

Figure 2 — Tool retrieval. The catalog is written, read, and executed by three separate processes; the planner embeds the query string server-side, searches the ToolRetrieval vector DB, and is hard-constrained to real catalog tool names. The store’s vector dimension is TCC-blocked; the materialised embedding model outputs [1,384] fp16, while the index template supports 256/512/768.

The stores live in group.com.apple.intelligenceflow: an EnumRetrieval index (638,976 bytes) and the larger, newer ToolRetrieval/v1_3 (950,272 bytes), both backed by a private VectorSearch.framework that exposes float16/float32 vectors, cosine/dot/L2 metrics, and an IVF index template instantiated for 256, 512, and 768 dimensions. The query path is ToolboxClient.query(plannerType:query:k:bundleIDs:) over the com.apple.intelligenceflow.toolbox XPC endpoint.
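
As a rough illustration of what a query with k and bundleIDs arguments does, here is a brute-force cosine top-k over a toy index. This is a sketch under loud assumptions: the three-dimensional vectors are made up, the com.example.notes bundle and ExampleNoteIntent are hypothetical, and a linear scan stands in for the IVF index that VectorSearch.framework actually exposes.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# tool name -> (owning bundle, embedding). Vectors are invented toy values.
INDEX = {
    "CreateReminderIntent": ("com.apple.reminders", [0.9, 0.1, 0.0]),
    "CreateListIntent": ("com.apple.reminders", [0.6, 0.6, 0.0]),
    "ExampleNoteIntent": ("com.example.notes", [0.0, 0.2, 0.9]),  # hypothetical
}


def top_k(query_vec, index, k, bundle_ids=None):
    """Return the k tool names most similar to query_vec, optionally per bundle."""
    candidates = [
        (name, vec)
        for name, (bundle, vec) in index.items()
        if bundle_ids is None or bundle in bundle_ids
    ]
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


print(top_k([1.0, 0.0, 0.0], INDEX, k=2))
```

Only the k retrieved definitions would then go into the planner's context, which is the whole reason for the index.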

Crucially, the query is embedded on the service side of the XPC boundary rather than by the caller: the utterance crosses as a string and is turned into a vector inside intelligenceflowd, using the one embedding model materialised on disk, SbertQuantizedEmbeddingModel.mlpackage, whose output is [1, 384] float16. (That the stored vectors are therefore 384-dimensional is inference: the database’s dimension column is TCC-blocked, and the IVF template supports three widths.) The retrieved tools are then handed to the model with hard guardrails baked into the strings: ' tool not in toolbox, aborting' and 'do not retry with alternative names. Check the Toolbox Catalog for correct tool names.' In other words, the model is constrained to the catalog’s real tool names. It is the same problem every function-calling system has with hallucinated tool names, solved by retrieval plus a name check.
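
The name check itself is simple to picture. A minimal sketch, assuming the planner's output is just a tool name compared against the set of catalog names; the exception class and function are hypothetical, and only the error wording echoes the string found in the binary.

```python
class ToolNotInToolbox(Exception):
    """Raised when the planner names a tool the catalog does not contain."""


def validate_tool_call(tool_name, catalog_names):
    """Reject hallucinated tool names outright instead of guessing a near match."""
    if tool_name not in catalog_names:
        raise ToolNotInToolbox(f"{tool_name} tool not in toolbox, aborting")
    return tool_name


CATALOG = {"CreateReminderIntent", "UpdateReminderIntent", "CreateListIntent"}

print(validate_tool_call("CreateReminderIntent", CATALOG))
```

Failing hard, with no fuzzy fallback, matches the spirit of the 'do not retry with alternative names' guardrail: a near-miss name is treated as an error, not a hint.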

The host, split across daemons

In Claude Code, one process is the host. Here the “host” is spread across several LaunchAgents, talking over XPC.

Figure 3 — The live daemon spine on macOS 27 Golden Gate (build 26A5353q). assistantd brokers the session; intelligenceflowd is the agent loop; siriappintentsd executes. Solid edges are entitlement-backed (a vend/client pair exists); dashed edges are inferred wire calls that were not traced.

assistantd (pid 1050) is the front door. dyld_info shows it linking 104 dylibs — SiriMessageBus, DialogEngine, SiriKitInvocation — but not IntelligenceFlowRuntime. Its vmmap is the giveaway: the IntelligenceFlow client framework is mapped executable (r-x), while IntelligenceFlowRuntime appears only as a 6 KB unused copy-on-write page. It holds the client API and brokers; it does not host the agent.

intelligenceflowd (pid 1064) is the agent. Here IntelligenceFlowRuntime is genuinely code-mapped (~12.3 MB r-x), alongside FoundationModels. The loop is a class called IntelligenceFlowRunner, whose Options read exactly like an agent harness — maxTurns, maxToolCallsPerTurn, processTimeout, planOnly, autoConfirm — and which emits 'Planner generated tool call ' and guards 'Hit max repeated tool calls for '. It builds its executor with makeExecutor(sessionId:toolbox:plannerToolbox:actionValidator:…). (SiriOrchestration.framework, despite the promising name, is an empty stub — __text size 0 — so the orchestration really does live here.)
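
For readers who have not written an agent harness, those option names map onto a loop like the one below. This is a generic sketch, not a reconstruction of IntelligenceFlowRunner: the planner, execute, and confirm callables are hypothetical, max_repeated_calls is an assumed knob behind the repeated-call guard, and processTimeout is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Options:
    max_turns: int = 4
    max_tool_calls_per_turn: int = 3
    max_repeated_calls: int = 2  # assumed; the binary only shows the guard string
    plan_only: bool = False
    auto_confirm: bool = False


def run(planner, execute, confirm, opts):
    """planner(turn, history) returns a list of (tool, args); an empty list ends the run."""
    history, seen = [], {}
    for turn in range(opts.max_turns):
        calls = planner(turn, history)[: opts.max_tool_calls_per_turn]
        if not calls:
            break
        for tool, args in calls:
            key = (tool, repr(sorted(args.items())))
            seen[key] = seen.get(key, 0) + 1
            if seen[key] > opts.max_repeated_calls:
                raise RuntimeError(f"Hit max repeated tool calls for {tool}")
            if opts.plan_only:
                history.append((tool, args, None))  # record the plan, run nothing
            elif opts.auto_confirm or confirm(tool, args):
                history.append((tool, args, execute(tool, args)))
            else:
                history.append((tool, args, "declined"))
    return history


def one_shot_planner(turn, history):
    # Hypothetical planner: one call on the first turn, then done.
    return [("reminders/CreateReminderIntent", {"title": "milk"})] if turn == 0 else []


print(run(one_shot_planner, lambda t, a: "ok", lambda t, a: True, Options()))
```

The turn cap and the repeated-call guard do the same job here as in any harness: they bound a planner that keeps emitting the same failing call.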

Two more processes round out the spine: intelligencecontextd (pid 1084) vends on-screen and personal context, and siriappintentsd (pid 1315) is the App Intents executor. The edges between them are drawn from matching vend/client entitlement pairs, not traced traffic — solid in the diagram where an entitlement pair exists, dashed where the call itself is inferred.

One thing worth calling out because the MCP analogy can mislead: there’s real safety machinery in the loop that a local MCP setup usually lacks. intelligenceflowd carries feature flags for an ActionPoisoningClassifier and ActionPoisoningThrowOnDetection — prompt-injection / poisoned-action defence — plus BudgetAwareTokenManagement and an actionValidator wired into the executor. When the tools can send messages and place calls, the agent has to assume its own inputs are hostile.

Personal context: the other two stores

Two further stores supply the “personal context” pillar — the rough equivalent of resources an agent can read.

Figure 4 — The two personal-context stores. Biome events materialise into the intelligenceplatformd triple store (graph.db); globalKnowledge.db is fed by a server-side Apple KG proxy (Parsec/Pegasus). Separately, spotlightknowledged.updater builds a Kuzu HNSW semantic index over content with 512-dim FLOAT16 cosine embeddings.

The first is a personal knowledge graph in intelligenceplatformd (pid 1602). Its store under ~/Library/IntelligencePlatform/ is TCC-protected, but reconstructed from open file descriptors it’s a triple store: graph.db (18.7 MB) plus a larger globalKnowledge.db (50.5 MB) and a fan of artifact views, ~126.7 MB in all, fed from Biome event streams (Contacts, Photos, Siri, location). The API is a genuine triple store (GraphStore.tripleInsertingTransaction, EntityTriple/EventTriple). One nuance: globalKnowledge.db is server-backed — a DataActions.callPegasusProxy reaches an Apple knowledge-graph proxy returning Apple_Parsec_Kg_* results — so it’s not purely on-device personal data (inference, since the contents are blocked).
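
If “triple store” is unfamiliar, the idea fits in a few lines of SQLite. This is a generic sketch of the data model only: the table layout, the function names, and the sample facts are assumptions, since the real graph.db schema is TCC-blocked and was not read.

```python
import sqlite3


def open_store():
    """Create an in-memory store of (subject, predicate, object) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT, "
        "UNIQUE (subject, predicate, object))"
    )
    return db


def insert_triples(db, triples):
    """Insert a batch inside one transaction: either all rows commit or none do."""
    with db:
        db.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def objects(db, subject, predicate):
    """Return every object linked to subject by predicate, sorted."""
    rows = db.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


db = open_store()
# Invented sample facts, standing in for what event streams might materialise.
insert_triples(db, [
    ("person:alice", "hasEmail", "alice@example.com"),
    ("person:alice", "attended", "event:standup"),
    ("person:alice", "attended", "event:offsite"),
])
print(objects(db, "person:alice", "attended"))
```

Everything is a fact about a subject, so new event sources add rows rather than new tables, which suits a store fed from many streams.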

The second is the semantic Spotlight index built by spotlightknowledged.updater. I went in expecting an “ivf-512xfp16” file from an earlier lead — that lead is disproven; no such files exist. What’s actually there is an HNSW index built on Kuzu (an embedded graph DB wrapped as libhybriddatabase), whose per-class chunk tables declare embedding FLOAT16[512] indexed metric := 'cosine', across five content classes (calendar, general, mail, messages, siriTranscript). So 512-dimension fp16 cosine vectors are real — they belong to this content index, not the planner’s tool retrieval. This is the substrate WWDC’s session 246, “LLM search using Core Spotlight,” exposes to developers.

The model substrate

Underneath sits the inference stack. The public entry is FoundationModels.framework (SystemLanguageModel), and the static link chain runs app → FoundationModels → TokenGeneration → ModelManagerServices → modelmanagerd (pid 789) → an on-device inference provider extension → {ODIE, Espresso, BNNS, MPSGraph, ANE}. modelmanagerd brokers those provider extensions rather than letting clients touch them directly. CoreAI.framework (WWDC’s “Meet Core AI”) turns out to be the public face of a private ODIE.framework (BuildAliasOf=OnDeviceInferenceEngine). I want to be precise: that chain is a static link graph, not a traced execution, and which backend or compute unit (ANE/GPU/CPU) a given request lands on is undetermined from disk.

The models themselves are 35 generative-model assets under the FM manifest tree — including 3b_lw_planner_v1_draft_generic, the planner head named in the table up top. The weights are cryptex-locked (each manifest dir holds only a SecureMobileAssetCryptex1Ticket.img4), so parameter counts and quantization are unverified from disk. Foundation Models evaluate as eligible on this machine (OS_ELIGIBILITY_DOMAIN_FOUNDATION_MODELS = ELIGIBLE).

And the load-bearing point on Gemini, since the press ran hard with it: a case-insensitive search of the entire FM manifest tree for gemini|google|gpt|openai returns zero matches. Every shipped model asset is Apple com_apple_fm_* — the AFM 3 family. Apple says the models were “custom-built in collaboration with Google” and that Private Cloud Compute was extended onto Google Cloud hardware — that is training collaboration and infrastructure, not inference. There is no Gemini code or model in the on-device path. The “$1B Gemini-powered Siri” / “1.2-trillion-parameter” framing is press, contradicted by the inference layer and stated by no Apple primary source.
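
A search of that kind is easy to reproduce on any directory tree. The sketch below is not the command used for the finding above; it is one plausible way to run the same pattern, and it assumes that both entry names and file contents should be checked.

```python
import os
import re

PATTERN = re.compile(r"gemini|google|gpt|openai", re.IGNORECASE)


def search_tree(root):
    """Return sorted paths under root whose name or contents match PATTERN."""
    hits = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if PATTERN.search(name):
                hits.add(os.path.join(dirpath, name))
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()  # whole file in memory; fine for manifests
            except OSError:
                continue  # unreadable file: skip rather than fail
            # latin-1 maps every byte to a character, so binary files decode safely.
            if PATTERN.search(data.decode("latin-1")):
                hits.add(path)
    return sorted(hits)
```

Calling search_tree on a manifest directory and getting an empty list back is the shape of the zero-match result. Unreadable files are skipped silently, so a clean result only covers what the process was allowed to read.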

So — is it all just App Intents?

Half yes, and the half that matters is “no.” For developers, App Intents (App Schemas) is the single public on-ramp: SiriKit is deprecated, there’s no MCP and no separate “Siri tool” API, and even Apple’s own first-party tools are normalised into the same App-Intent-shaped catalog. App Intents is the one tool contract — the RemindersAppIntents.framework with its 39 intents and 6 schema-typed actions is exactly what that looks like up close.

But the thing consuming those intents is brand-new and entirely Apple’s: an on-device LLM planner (IntelligenceFlowRunner on AFM 3 + lw_planner_v1), RAG tool retrieval over a vector database, a unified ToolKit registry, a first-party FlowTools layer, an action-poisoning safety pass, and the modelmanagerd/ODIE inference broker. None of that existed before; it’s just system-internal plumbing rather than a developer SDK. So it isn’t “Apple leaned harder on Intents.” It’s “Apple built a tool-calling agent, and chose App Intents as the one contract for everything it’s allowed to call.”

If you use Claude Code, the surprise isn’t how alien this is — it’s how familiar. Same loop, same tool schemas, same approval gates. The differences are the ones Apple’s constraints force: a static on-device catalog instead of remote servers, retrieval instead of a context dump, XPC instead of a wire protocol, and a safety classifier because the tools can touch your real messages and phone.

What I can’t prove

The honest limits of a no-root, TCC-bounded inspection:

  • No XPC wire was captured. Every cross-process edge is inferred from matching vend/client entitlements and active-connection counts, not traced traffic.
  • The Reminders section parameter is read from the declared metadata; I didn’t execute a live request to confirm the planner fills it reliably.
  • The planner’s vector dimension is TCC-blocked; 384-d is the likely value (it’s the only materialised model) but the stored dimension is unreadable.
  • Generative-model weights are cryptex-locked, so parameter counts and quantization are inference from asset names.
  • Live toolbox contents and counts are TCC-protected on this machine; the catalog shape is solid, current per-app counts are not.
  • macOS 26 baseline: I have no prior-release machine to diff against, so I describe what is present on Golden Gate rather than asserting what is strictly “new.”

Sources

  • All system specifics (pids, binary versions, vmmap/dyld_info, entitlements, SQLite stores, the Reminders App Intents metadata, cryptex tickets, eligibility) are first-hand from one machine running macOS 27 Golden Gate, build 26A5353q, inspected June 2026.
  • Apple Newsroom, WWDC 2026 (June 8, 2026): the “27” OS generation and “Siri AI.”
  • Apple WWDC 2026 developer sessions 240 (“Build intelligent Siri experiences with App Schemas”), 241, 242, 246 (“LLM search using Core Spotlight”), and 324 (“Meet Core AI”).
  • Apple Machine Learning Research, AFM 3 family overview (June 8, 2026): “custom-built in collaboration with Google” and the Private Cloud Compute infrastructure statements.
  • Secondary press, flagged inline: AppleInsider (Gemini not in the inference path). The “$1B / 1.2-trillion-parameter Gemini-powered Siri” framing is cited only to mark it refuted.