If you’ve watched Claude Code call a tool (read a file, hit an MCP server, run a command), you already know the shape of the new Siri. Apple introduced “Siri AI” at WWDC 2026 alongside macOS 27 “Golden Gate,” and under the marketing it’s a fairly ordinary tool-calling agent: a language model that plans, picks tools from a catalog, fills in their arguments, asks you to confirm the risky ones, and runs them. Apple built that loop entirely in-house, chose App Intents as its one tool contract, and made some different engineering choices than the MCP world did.
Everything below comes from first-hand forensic inspection of one Mac running the Golden Gate beta, build 26A5353q, then cross-checked against Apple’s WWDC sessions and Newsroom posts. I separate two kinds of statement throughout. A fact means I reproduced an artifact: a binary, a vmmap mapping, a SQLite store, an entitlement, a metadata file. An inference means I reasoned from names, linkage, or correlation but never captured the live edge. The seams between processes are where most of the inference lives. The unified log hands you connection-level XPC events without root, but not what travels over them. I flag each one.
The mental model, mapped
Here it is, mapped against Claude Code / MCP:
| Claude Code / MCP | The new Siri on macOS |
|---|---|
| The model (Claude) that plans and calls tools | An on-device model (AFM 3) with a dedicated lw_planner_v1 planner adapter |
| An MCP server you add | An installed app that declares App Intents |
| A tool definition (name + JSON schema) | An App Intent (e.g. CreateReminderAppIntent) with typed parameters, optionally tagged with a standard App Schema |
| tools/list, the catalog the model sees | The ToolKit catalog (Tools-prod SQLite), aggregated across every installed app |
| Resources / lookups the tool references | App Entities + Entity Queries (ReminderEntity, SectionEntityQuery) |
| The tool call the model emits | The planner emits a call by pythonName, executed by siriappintentsd / siriactionsd |
| The approval prompt before a write | authenticationPolicy + built-in confirm / askUser primitives |
| The MCP client/host (Claude Code itself) | assistantd (front-door broker) + intelligenceflowd (the agent loop) |
| The transport (stdio / HTTP / JSON-RPC) | XPC against a static on-device catalog (no wire protocol, no remote servers) |
| Putting every tool in the prompt | RAG: retrieve the relevant tools from a vector database, per request |
Two things to flag. First, there is no MCP anywhere in Siri’s path: Apple built an equivalent with its own proprietary plumbing. (The MCP you may have seen in Xcode 27 is a developer-tools bridge for coding agents and is unrelated.) Second, the model itself is genuinely new and entirely internal, but the contract it calls through is the same App Intents API third-party developers already use.
A worked example: “add milk to the Errands section”
The cleanest way to see the tool contract is to read one. On Golden Gate, Reminders ships a dedicated RemindersAppIntents.framework whose Metadata.appintents declares 39 App Intents, 13 entities, and 10 entity queries. Five of those intents are tagged with Apple’s official reminders App Schema domain, the ones the planner is trained to recognise deeply: CreateReminderIntent, UpdateReminderIntent, DeleteRemindersIntent, CreateListIntent, CreateSectionIntent. (This is fact, read from the on-disk metadata.)
TTRCreateReminderAppIntent is, in MCP terms, a tool definition. Stripped to its shape:
tool    reminders/CreateReminderIntent   # App Schema domain "reminders", v1.0.0
from    RemindersAppIntents.framework    # the "server" = the app
desc    "Creates a new reminder."
auth    none                             # authenticationPolicy 0, no unlock to create
params
  title            String
  dueDate          Date
  list             ListEntity            # resolved by ListEntityQuery
  section          SectionEntity         # resolved by SectionEntityQuery
  parentReminder   ReminderEntity        # i.e. make it a subtask
  locationTrigger  LocationTriggerEntity
  priorityLevel    PriorityLevel (enum)
  isFlagged, isAllDay, notes, tags[], urls[],
  subtasks[], recurrence, assignedPerson, contactPerson, ...
That is a JSON-schema-shaped tool, no different in spirit from one an MCP server would advertise: typed scalars, enums, arrays, and entity-typed parameters that point at things the model has to resolve first. The entity queries (SectionEntityQuery, ListEntityQuery) are the resolvers: “the Errands section” becomes a concrete SectionEntity the call can take.
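To make the "JSON-schema-shaped" claim concrete, here is a minimal Python sketch that restates part of the signature above as data and exports it in roughly the shape an MCP server would advertise. The `Param` and `Tool` classes, the `x-resolved-by` key, and the use of Apple's type names as `type` values are my own illustrative assumptions, not Apple's format and not strict JSON Schema. Only the identifiers, types, and resolver names come from the metadata.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import json


@dataclass(frozen=True)
class Param:
    name: str
    type_name: str                  # scalar, enum, or entity type as named in the metadata
    resolver: Optional[str] = None  # entity query that resolves it, when the text names one


@dataclass(frozen=True)
class Tool:
    identifier: str
    description: str
    authentication_policy: int      # 0 = none, per the signature above
    params: Tuple[Param, ...]

    def entity_params(self):
        """Names of parameters the model must resolve to an entity before calling."""
        return [p.name for p in self.params if p.resolver is not None]

    def to_schema(self) -> dict:
        """Export in an MCP-like shape (hypothetical; not strict JSON Schema)."""
        properties = {}
        for p in self.params:
            entry = {"type": p.type_name}
            if p.resolver is not None:
                entry["x-resolved-by"] = p.resolver  # hypothetical extension key
            properties[p.name] = entry
        return {
            "name": self.identifier,
            "description": self.description,
            "inputSchema": {"type": "object", "properties": properties},
        }


# A subset of the parameters listed above; the rest are omitted for brevity.
CREATE_REMINDER = Tool(
    identifier="reminders/CreateReminderIntent",
    description="Creates a new reminder.",
    authentication_policy=0,
    params=(
        Param("title", "String"),
        Param("dueDate", "Date"),
        Param("list", "ListEntity", "ListEntityQuery"),
        Param("section", "SectionEntity", "SectionEntityQuery"),
        Param("priorityLevel", "PriorityLevel"),
    ),
)

print(json.dumps(CREATE_REMINDER.to_schema(), indent=2))
print(CREATE_REMINDER.entity_params())
```

The entity-typed parameters are the only ones that need a lookup before the call can be emitted. That is the part an MCP tool would usually push onto a separate resource or search tool.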
So when you say “add milk to the Errands section of my Shopping list when I get home,” the loop is recognisably an agent doing function-calling:
Figure 1. A single request (‘add milk to the Errands section when I get home’) as an agent tool call. Amber steps were captured live (unified-log XPC connection events, the Biome transcript); dashed ones remain inferred from entitlements.
1. A Siri UI surface hands the utterance to assistantd, which brokers the session.
2. assistant_cdmd parses the natural language into a structured intent (its CDM pipeline links the Espresso neural runtime).
3. intelligenceflowd plans: it retrieves the reminders/CreateReminderIntent tool, resolves Shopping → ListEntity and Errands → SectionEntity via their queries, and binds locationTrigger to home.
4. It fills title: "milk", list, section, locationTrigger, and, because authenticationPolicy here is none, runs it without a confirm step. (Steps 3–4 follow the declared contract; whether the live planner can actually bind section is shakier, see below.)
5. siriappintentsd executes the App Intent against Reminders.
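The resolve-then-fill part of that loop can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the in-memory `LISTS` and `SECTIONS` data, the `query_list` / `query_section` helpers, and the rule that any non-zero authenticationPolicy needs a confirm step are all hypothetical stand-ins for the declared contract, not captured behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListEntity:
    id: str
    name: str


@dataclass(frozen=True)
class SectionEntity:
    id: str
    name: str
    list_id: str  # a section belongs to exactly one list


# Hypothetical user data: two lists that both have an "Errands" section.
LISTS = [ListEntity("L1", "Shopping"), ListEntity("L2", "Work")]
SECTIONS = [SectionEntity("S1", "Errands", "L1"), SectionEntity("S2", "Errands", "L2")]


def query_list(name: str) -> ListEntity:
    """Stand-in for ListEntityQuery: resolve a spoken name to one concrete list."""
    matches = [l for l in LISTS if l.name.casefold() == name.casefold()]
    if len(matches) != 1:
        raise LookupError(f"cannot resolve list {name!r}")
    return matches[0]


def query_section(name: str, parent: ListEntity) -> SectionEntity:
    """Stand-in for SectionEntityQuery: the parent list disambiguates the section."""
    matches = [s for s in SECTIONS
               if s.name.casefold() == name.casefold() and s.list_id == parent.id]
    if len(matches) != 1:
        raise LookupError(f"cannot resolve section {name!r} in {parent.name!r}")
    return matches[0]


def build_call(title, list_name, section_name, location_trigger, authentication_policy=0):
    """Resolve entities, fill arguments, and decide whether a confirm step is needed."""
    target_list = query_list(list_name)
    target_section = query_section(section_name, target_list)
    return {
        "tool": "reminders/CreateReminderIntent",
        "arguments": {
            "title": title,
            "list": target_list.id,
            "section": target_section.id,
            "locationTrigger": location_trigger,
        },
        # Assumption: policy 0 means "none", anything else asks first.
        "needs_confirmation": authentication_policy != 0,
    }


call = build_call("milk", "Shopping", "Errands", "home")
print(call)
```

The resolution order matters in this sketch. "Errands" is ambiguous on its own and only becomes a single entity once the list is known, which is why the section query takes the resolved list as input.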
A detail that falls out of reading the signature: yes, a reminder can be created directly under a subheading. section: SectionEntity is a first-class parameter, and SectionEntity is itself defined as belonging to a ListEntity and holding child ReminderEntitys, so a Section is modelled as a subheading within a list. A Group, by contrast, is a folder of lists (GroupEntity); there’s no group parameter on the create intent, so a reminder attaches to a list (or a section), never to a group directly. (The signature is all fact from the metadata. But a captured planner transcript, Biome’s IntelligenceFlow.Transcript stream, complicates it: the live planner is handed a simplified manage_reminder wrapper exposing titles and a plain-text target_list_name, no section anywhere. So on this build the planner most likely can’t bind a SectionEntity even though shortcuts run can. Absence in one capture isn’t proof of impossibility, though.)
This is also, incidentally, why there’s a “CLI-ish” way to drive Reminders now where there wasn’t before: these App Intents are discoverable (isDiscoverable: true) and runnable through the shortcuts command-line tool. The same declarations that let Siri call them let shortcuts run call them.
Why a catalog and retrieval, not a tool list
Here’s the first real divergence from how you’d wire up MCP. Claude Code puts your tools in the model’s context. Siri can’t: the ToolKit catalog aggregates App Intents, Shortcuts actions, legacy SiriKit intents, and Apple’s own first-party “flowTools” across every installed app, 2,141 entries from 96 containers on this machine (1,591 App Intents, 387 Shortcuts actions, 111 legacy SiriKit intents, 52 flowTools; System Settings alone contributes 540, and third-party apps like Ghostty and Tailscale sit in the same catalog). You can’t put two thousand tool schemas in a prompt.
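A quick arithmetic check on those numbers, plus a rough sense of why the full catalog cannot live in a prompt. The four category counts and the System Settings figure come from the catalog on this machine. The per-schema token cost and the number of retrieved tools are hypothetical round numbers I picked only to show the order of magnitude.

```python
# Category counts as read from the ToolKit catalog on this machine.
CATALOG = {
    "App Intents": 1591,
    "Shortcuts actions": 387,
    "legacy SiriKit intents": 111,
    "flowTools": 52,
}

total_tools = sum(CATALOG.values())
system_settings_share = round(100 * 540 / total_tools, 1)

# Assumption: an average tool schema costs about 150 tokens. This is a guess.
TOKENS_PER_SCHEMA = 150


def prompt_tokens(n_tools: int, tokens_per_schema: int = TOKENS_PER_SCHEMA) -> int:
    """Tokens spent on tool schemas alone if n_tools are placed in the prompt."""
    return n_tools * tokens_per_schema


print(total_tools)                 # 2141, matching the stated total
print(system_settings_share)       # 25.2 (percent of the catalog)
print(prompt_tokens(total_tools))  # 321150 under the assumed cost
print(prompt_tokens(8))            # 1200 if only 8 retrieved tools are shown
```

Under that assumed cost the full catalog would need a few hundred thousand tokens before the user has said anything, while a handful of retrieved tools costs about a thousand.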
So the planner does retrieval-augmented generation over a vector database instead.
Figure 2. Tool retrieval. The catalog is written, read, and executed by three separate processes; the planner embeds the query string server-side, searches the ToolRetrieval vector DB, and is hard-constrained to real catalog tool names. The store’s metadata table declares dimension = 384, fp16, cosine, matching the materialised embedding model’s [1,384] fp16 output.
The stores live in group.com.apple.intelligenceflow: an EnumRetrieval index (638,976 bytes) and the larger, newer ToolRetrieval/v1_3 (950,272 bytes), both backed by a private VectorSearch.framework that exposes float16/float32 vectors, cosine/dot/L2 metrics, and an IVF index template instantiated for 256, 512, and 768 dimensions. The query path runs over the com.apple.intelligenceflow.toolbox XPC endpoint; the demangled retrieval method is ToolboxClient.enumQuery(plannerType:query:enumType:appBundleID:), alongside listTools(plannerType:toolType:registryType:).
The query is embedded server-side, meaning on the service end of the XPC connection rather than in the calling client: the utterance crosses as a string and is turned into a vector inside intelligenceflowd, using the one embedding model materialised on disk, SbertQuantizedEmbeddingModel.mlpackage, whose output is [1, 384] float16. (That the stored vectors are 384-dimensional is fact, read straight from the store’s dimension column; the sibling EnumRetrieval index uses 512, so the IVF template’s widths are genuinely instantiated per-store.) The retrieved tools are then handed to the model with hard guardrails baked into the strings: ' tool not in toolbox, aborting' and 'do not retry with alternative names. Check the Toolbox Catalog for correct tool names.' The model is constrained to the catalog’s real tool names. It’s the same problem every function-calling system has with hallucinated tool names, solved by retrieval plus a name check.
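Retrieval plus a name check is simple enough to sketch end to end. This is a toy: the three-dimensional vectors stand in for the real 384-dimensional fp16 embeddings, `messages/SendMessageIntent` is a made-up tool name, the brute-force scan stands in for the private vector index, and the exact way the quoted guardrail string is prefixed with the tool name is my assumption.

```python
import math


class ToolNotInToolbox(LookupError):
    """Raised when the planner emits a tool name that is not in the catalog."""


def cosine(a, b):
    """Cosine similarity, the metric the store's metadata declares."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy index: tool name -> embedding (3 dimensions instead of 384).
INDEX = {
    "reminders/CreateReminderIntent": [0.9, 0.1, 0.0],
    "reminders/CreateListIntent": [0.6, 0.4, 0.1],
    "messages/SendMessageIntent": [0.0, 0.2, 0.9],  # hypothetical tool
}


def top_k(query_vec, k=2, index=INDEX):
    """Return the k tool names whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


def check_tool_call(name, catalog):
    """Reject hallucinated names instead of guessing a near match."""
    if name not in catalog:
        raise ToolNotInToolbox(f"{name} tool not in toolbox, aborting")
    return name


# Pretend this vector is the embedded utterance "add milk to my list".
candidates = top_k([1.0, 0.2, 0.0], k=2)
print(candidates)
print(check_tool_call("reminders/CreateReminderIntent", INDEX))
```

The guard deliberately does no fuzzy matching. That mirrors the "do not retry with alternative names" instruction: a wrong name is a hard failure, not a prompt to guess.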
The host, split across daemons
In Claude Code, one process is the host. Here the “host” is spread across several LaunchAgents, talking over XPC.
Figure 3. The live daemon spine on macOS 27 Golden Gate (build 26A5353q). assistantd brokers the session; intelligenceflowd is the agent loop; siriappintentsd executes. Solid edges were traced live or are entitlement-backed; dashed edges remain inferred. The assistantd → intelligenceflowd and intelligenceflowd → intelligencecontextd connections were captured in the unified log’s XPC connection events.
assistantd (pid 1050) is the front door. dyld_info shows it linking 104 dylibs (SiriMessageBus, DialogEngine, SiriKitInvocation) but not IntelligenceFlowRuntime. Its vmmap is the giveaway: the IntelligenceFlow client framework is mapped executable (r-x), while IntelligenceFlowRuntime appears only as a 6 KB unused copy-on-write page. It holds the client API and brokers; it does not host the agent.
intelligenceflowd (pid 1064) is the agent. Here IntelligenceFlowRuntime is genuinely code-mapped (~12.3 MB r-x), alongside FoundationModels. The loop is a class called IntelligenceFlowRunner, whose Options read exactly like an agent harness (maxTurns, maxToolCallsPerTurn, processTimeoutInMilliseconds, planOnly, autoConfirmationEnabled) and which emits 'Planner generated tool call ' and guards 'Hit max repeated tool calls for '. It builds its executor with makeExecutor(sessionId:toolbox:plannerToolbox:actionValidator:…). (SiriOrchestration.framework, despite the promising name, is an empty stub, __text size 0, so the orchestration really does live here.)
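Those option names map onto a very familiar harness shape. The sketch below is my own reconstruction of what such a loop generally looks like, not Apple's code. `max_turns`, `max_tool_calls_per_turn`, and `plan_only` mirror the option names above. `max_repeated_tool_calls`, the `planner` / `execute` callables, and the return values are hypothetical, and the timeout and auto-confirmation options are left out.

```python
from dataclasses import dataclass


@dataclass
class Options:
    max_turns: int = 4
    max_tool_calls_per_turn: int = 2
    max_repeated_tool_calls: int = 2  # hypothetical name for the repeat guard's limit
    plan_only: bool = False           # plan without executing anything


def run(planner, execute, options):
    """Drive a plan/execute loop.

    planner(turn, history) returns a list of (tool, args) pairs; an empty list
    means the planner is finished. execute(tool, args) performs one call.
    """
    history = []
    repeats = {}
    for turn in range(options.max_turns):
        calls = planner(turn, history)
        if not calls:
            return "done", history
        # Cap how many calls one turn may issue.
        for tool, args in calls[: options.max_tool_calls_per_turn]:
            key = (tool, repr(sorted(args.items())))
            repeats[key] = repeats.get(key, 0) + 1
            if repeats[key] > options.max_repeated_tool_calls:
                return f"Hit max repeated tool calls for {tool}", history
            result = None if options.plan_only else execute(tool, args)
            history.append((tool, args, result))
    return "max turns reached", history


def stuck_planner(turn, history):
    """A planner that keeps asking for the identical call, to trip the guard."""
    return [("reminders/CreateReminderIntent", {"title": "milk"})]


status, history = run(stuck_planner, lambda tool, args: "ok", Options())
print(status, len(history))
```

With the defaults, the stuck planner gets two executions and is cut off on the third identical request, well before the turn limit is reached.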
Two more processes round out the spine: intelligencecontextd vends on-screen and personal context, and siriappintentsd (pid 1315) is the App Intents executor. The edges between them are drawn three ways in the diagram: amber where the connection itself was captured in the unified log’s com.apple.xpc:connection events (assistantd → intelligenceflowd, and intelligenceflowd → intelligencecontextd), solid where a vend/client entitlement pair exists, dashed where the call is only inferred. The intelligenceflowd → siriappintentsd edge never produced a connection event in 24 hours of looking, so it stays entitlement-only; message contents on every edge remain untraced.
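The evidence rule behind those three edge styles is small enough to write down. The sketch below encodes it with the three edges named in the paragraph. The precedence (a logged connection beats an entitlement pair) is how I read the diagram's legend. For the two captured edges the text does not say whether an entitlement pair also exists, so that field is left as `None`.

```python
def edge_style(connection_logged, entitlement_pair):
    """Map the strongest available evidence for an edge to its drawing style."""
    if connection_logged:
        return "amber"   # connection event captured in the unified log
    if entitlement_pair:
        return "solid"   # vend/client entitlement pair exists
    return "dashed"      # inferred only


# (connection_logged, entitlement_pair); None means "not stated in the text".
EDGES = {
    ("assistantd", "intelligenceflowd"): (True, None),
    ("intelligenceflowd", "intelligencecontextd"): (True, None),
    ("intelligenceflowd", "siriappintentsd"): (False, True),
}

for (src, dst), evidence in EDGES.items():
    print(f"{src} -> {dst}: {edge_style(*evidence)}")
```

None of the styles say anything about message contents, which stay untraced on every edge.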
One thing the MCP analogy can mislead you on: there’s real safety machinery in the loop that a local MCP setup usually lacks. intelligenceflowd carries feature flags for an ActionPoisoningClassifier and ActionPoisoningThrowOnDetection (prompt-injection / poisoned-action defence), plus BudgetAwareTokenManagement and an actionValidator wired into the executor. When the tools can send messages and place calls, the agent has to assume its own inputs are hostile.
Personal context: the other two stores
Two further stores supply the “personal context” pillar, the rough equivalent of resources an agent can read.
Figure 4. The two personal-context stores. Biome events materialise into the intelligenceplatformd triple store (graph.db); globalKnowledge.db is fed by a server-side Apple KG proxy (Parsec/Pegasus). Separately, spotlightknowledged.updater builds a Kuzu HNSW semantic index over content with 512-dim FLOAT16 cosine embeddings.
The first is a personal knowledge graph in intelligenceplatformd (pid 1602). Its store under ~/Library/IntelligencePlatform/ turned out to be plain user-readable. It is a triple store: graph.db (18.7 MB) plus a larger globalKnowledge.db (50.5 MB) and a fan of artifact views, ~126.7 MB in all, fed from Biome event streams (Contacts, Photos, Siri, location). The API confirms the triple-store shape (GraphStore.tripleInsertingTransaction, EntityTriple/EventTriple). One nuance: globalKnowledge.db is server-backed. A DataActions.callPegasusProxy reaches an Apple knowledge-graph proxy returning Apple_Parsec_Kg_* results, so it’s not purely on-device personal data. Reading it confirms as much (fact): its static_graph holds ~6,600 Apple-KG world-knowledge triples (Wikidata Q-IDs, media and sports entities) fed by downloaded subgraph assets, while live_graph and idsearch_no_result are TTL-stamped caches for live Pegasus IDSearch proxy results, pruned on a GlobalKnowledgeTwoHourPrune schedule.
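A TTL-stamped triple cache of this kind is easy to picture with an in-memory SQLite table. This is only an illustration of the pattern. The table name `live_graph` is borrowed from the store, but the column layout, the `expires_at` field, and the helper functions are hypothetical, since I am describing the real schema only at the level of "triples with a TTL".

```python
import sqlite3


def open_cache():
    """Create an in-memory stand-in for a TTL-stamped triple cache."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE live_graph ("
        "subject TEXT, predicate TEXT, object TEXT, expires_at REAL)"
    )
    return db


def insert_triple(db, subject, predicate, obj, now, ttl_seconds):
    """Store one triple together with the time at which it goes stale."""
    db.execute(
        "INSERT INTO live_graph VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, now + ttl_seconds),
    )


def prune(db, now):
    """Delete every triple whose TTL has passed; return how many were removed."""
    cursor = db.execute("DELETE FROM live_graph WHERE expires_at <= ?", (now,))
    return cursor.rowcount


db = open_cache()
insert_triple(db, "Q42", "instance_of", "human", now=0, ttl_seconds=7200)
insert_triple(db, "Q42", "search_rank", "1", now=0, ttl_seconds=60)

removed = prune(db, now=3600)  # a prune pass one hour later
remaining = db.execute("SELECT COUNT(*) FROM live_graph").fetchone()[0]
print(removed, remaining)
```

The prune pass only needs a timestamp comparison, so it can run on a fixed schedule without knowing anything about what the triples mean.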
The second is the semantic Spotlight index built by spotlightknowledged.updater. I went in expecting an “ivf-512xfp16” file from an earlier lead; that lead is disproven, no such files exist. What’s actually there is an HNSW index built on Kuzu (an embedded graph DB wrapped as libhybriddatabase), whose per-class chunk tables declare embedding FLOAT16[512] indexed metric := 'cosine', across five content classes (calendar, general, mail, messages, siriTranscript). So 512-dimension fp16 cosine vectors are real, and they belong to this content index, not the planner’s tool retrieval. This is the substrate WWDC’s session 246, “LLM search using Core Spotlight,” exposes to developers.
The model substrate
Underneath sits the inference stack. The public entry is FoundationModels.framework (SystemLanguageModel), and the static link chain runs app → FoundationModels → TokenGeneration → ModelManagerServices → modelmanagerd (pid 789) → an on-device inference provider extension → {ODIE, Espresso, BNNS, MPSGraph, ANE}. modelmanagerd brokers those provider extensions rather than letting clients touch them directly. CoreAI.framework (WWDC’s “Meet Core AI”) turns out to be the public face of a private ODIE.framework (BuildAliasOf=OnDeviceInferenceEngine). That chain is a static link graph, but a live run closes the loop: during a Siri request on this machine, the unified log shows TGOnDeviceInferenceProviderService loading com.apple.fm.language.instruct_300m.action_validator in the same instant the kernel powered on the Neural Engine (AppleH16ANEInterface: ANE_HWDevicePowerOn) and aned created an ANE program for a client task named TGOnDeviceInfere…. The small on-device models land on the ANE, not GPU or CPU. The instruct_server_v1/v2 models never touch local silicon at all: PrivateMLClientInferenceProviderService ships those requests over XPC to com.apple.privatecloudcompute.
The models themselves are 35 generative-model assets under the FM manifest tree, including 3b_lw_planner_v1_draft_generic, the planner head named in the table up top. The weights are cryptex-locked (each manifest dir holds only a SecureMobileAssetCryptex1Ticket.img4), so parameter counts and quantization are unverified from disk. Foundation Models evaluate as eligible on this machine (OS_ELIGIBILITY_DOMAIN_FOUNDATION_MODELS = ELIGIBLE).
And the load-bearing point on Gemini, since the press ran hard with it: a case-insensitive search of the entire FM manifest tree for gemini|google|gpt|openai returns zero matches. Every shipped model asset is Apple com_apple_fm_*, the AFM 3 family. Apple says the models were “custom-built in collaboration with Google” and that Private Cloud Compute was extended onto Google Cloud hardware: that is training collaboration and infrastructure, not inference. There is no Gemini code or model in the on-device path. The “$1B Gemini-powered Siri” / “1.2-trillion-parameter” framing is press, contradicted by the inference layer and stated by no Apple primary source.
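For anyone who wants to repeat that check on their own machine, a search like this can be sketched in Python. The `search_tree` helper is mine, and the root directory is left as a parameter because the manifest tree's path is not something to guess. The sketch checks both file and directory names and raw file contents, which is one reasonable reading of "the entire tree".

```python
import os
import re

# Same alternation as in the text, compiled case-insensitively over bytes.
PATTERN = re.compile(rb"gemini|google|gpt|openai", re.IGNORECASE)


def search_tree(root):
    """Return (path, kind) hits where kind is 'name' or 'content'."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Names of directories and files directly under this directory.
        for name in dirnames + filenames:
            if PATTERN.search(name.encode("utf-8", "replace")):
                hits.append((os.path.join(dirpath, name), "name"))
        # Raw bytes of each file; unreadable files are skipped, not fatal.
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if PATTERN.search(handle.read()):
                        hits.append((path, "content"))
            except OSError:
                continue
    return hits


if __name__ == "__main__":
    import sys
    for path, kind in search_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(kind, path)
```

Two caveats apply. A three-letter pattern like gpt can match by accident inside binary data, so any hit needs a manual look. The sketch also reads each file whole, which is fine for manifests but not for multi-gigabyte assets.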
So, is it all just App Intents?
Half yes, and the half that matters is “no.” For developers, App Intents (App Schemas) is the single public on-ramp: SiriKit is deprecated, there’s no MCP and no separate “Siri tool” API, and even Apple’s own first-party tools are normalised into the same App-Intent-shaped catalog. App Intents is the one tool contract, and the RemindersAppIntents.framework with its 39 intents and 5 schema-typed actions is exactly what that looks like up close.
But the thing consuming those intents is entirely Apple’s: an on-device LLM planner (IntelligenceFlowRunner on AFM 3 + lw_planner_v1), RAG tool retrieval over a vector database, a unified ToolKit registry, a first-party FlowTools layer, an action-poisoning safety pass, and the modelmanagerd/ODIE inference broker. Some of that plumbing has been quietly accreting: the macOS 26.5 SDK already stubs intelligenceflowd, modelmanagerd, and most of the Toolbox symbols. But the retrieval path (ToolRetrieval) and the conversation runtime are new in 27, and none of it is a developer SDK. So Apple didn’t just lean harder on Intents. They built a tool-calling agent and made App Intents the one contract for everything it’s allowed to call.
If you use Claude Code, almost none of this will feel new. It’s the loop you already run every day, down to the approval prompt before a risky tool call. The differences are the ones Apple’s constraints force: a static on-device catalog instead of remote servers, retrieval instead of a context dump, XPC instead of a wire protocol, and a safety classifier because the tools can touch your real messages and phone.
All of the above is a first-pass, black-box poke at one macOS 27 beta: no root, no special access, and where a seam stayed shut I’ve said so inline.
Sources
- All system specifics (pids, binary versions, vmmap/dyld_info, entitlements, SQLite stores, the Reminders App Intents metadata, cryptex tickets, eligibility) are first-hand from one machine running macOS 27 Golden Gate, build 26A5353q, inspected June 2026.
- Apple Newsroom, WWDC 2026 (June 8, 2026): the “27” OS generation and “Siri AI.”
- Apple WWDC 2026 developer sessions 240 (“Build intelligent Siri experiences with App Schemas”), 241, 242, 246 (“LLM search using Core Spotlight”), and 324 (“Meet Core AI”).
- Apple Machine Learning Research, AFM 3 family overview (June 8, 2026): “custom-built in collaboration with Google” and the Private Cloud Compute infrastructure statements.
- Secondary press, flagged inline: AppleInsider (Gemini not in the inference path). The “$1B / 1.2-trillion-parameter Gemini-powered Siri” framing is cited only to mark it refuted.