Issue #1035
Apple Intelligence is not just a brand. Since macOS 26 (Tahoe), the OS ships a developer-accessible framework, Foundation Models, that gives your code direct access to the on-device language model powering Writing Tools and Siri. No API key, no cloud endpoint, no usage bill — inference runs entirely on Apple Silicon via the Neural Engine.
The catch is that Foundation Models is a Swift-only framework. If you work in TypeScript, Python, or any non-Swift stack, you have no direct path to it. That is the gap apple-intelligence-cli fills.
How it works
apple-intelligence-cli is a thin TypeScript wrapper around a compiled Swift binary called AppleBridge. The Swift binary owns all model interaction through Foundation Models. The two processes communicate via newline-delimited JSON over stdin/stdout.
apple-intelligence (TypeScript/Bun CLI)
        │ JSON over stdin/stdout
        ▼
AppleBridge (Swift binary)
        │ FoundationModels framework
        ▼
SystemLanguageModel (Apple Neural Engine — on-device, private)
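The framing itself is simple: each side writes one JSON object per line, so the reader can split the byte stream on newlines without any length prefixes. A minimal sketch of that pattern (the message shapes here are illustrative, not AppleBridge's actual schema):

```typescript
// Hypothetical message shapes; the real AppleBridge schema may differ.
interface BridgeRequest { id: number; prompt: string; }
interface BridgeResponse { id: number; text: string; }

// Each message is one JSON object terminated by a newline.
function encodeMessage(msg: BridgeRequest | BridgeResponse): string {
  return JSON.stringify(msg) + "\n";
}

// Accumulate raw stdout bytes and yield every complete message,
// returning any trailing partial line so it can be retried later.
function decodeMessages(buffer: string): { messages: BridgeResponse[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? ""; // incomplete tail, if any
  const messages = parts
    .filter((p) => p.length > 0)
    .map((p) => JSON.parse(p) as BridgeResponse);
  return { messages, rest };
}
```

Splitting on newlines lets the TypeScript side process partial stdout reads as they arrive instead of waiting for the whole response.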
You get two modes: prompt for quick terminal queries, and serve for a persistent OpenAI-compatible local server.
The prompt command is useful for one-off tasks: summarizing a file, asking a question, drafting copy. Because it streams by default, you see the response immediately.
apple-intelligence prompt "Explain async/await in Swift in two sentences"
Piping works well for code review or document processing:
cat MyFile.swift | apple-intelligence prompt "What does this code do?"
git diff | apple-intelligence prompt "Summarize these changes as a commit message"
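The same piping pattern works from a script. A minimal sketch using Node's child_process to feed a string to a command's stdin and collect its stdout (the runner is generic; the apple-intelligence invocation in the comment is the intended use, not something this sketch assumes is installed):

```typescript
import { spawn } from "node:child_process";

// Pipe `input` to a command's stdin and resolve with its full stdout,
// mirroring `cat file | apple-intelligence prompt "..."` from code.
function run(cmd: string, args: string[], input: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let out = "";
    child.stdout.on("data", (chunk: Buffer) => { out += chunk.toString(); });
    child.on("error", reject);
    child.on("close", () => resolve(out));
    child.stdin.write(input);
    child.stdin.end();
  });
}

// e.g. await run("apple-intelligence", ["prompt", "What does this code do?"], source)
```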
As a local API service
The more powerful use case is running Apple Intelligence as a local server. Start it once:
apple-intelligence serve
# Listening on http://127.0.0.1:11434
It exposes POST /v1/chat/completions following the OpenAI API contract. Any code already using the OpenAI SDK works immediately — you only change the baseURL.
bun add openai

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "not-needed",
});

const response = await client.chat.completions.create({
  model: "apple-intelligence",
  messages: [
    { role: "user", content: "Summarize the history of computing in three sentences" },
  ],
});

console.log(response.choices[0].message.content);
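If the local server also implements the streaming half of the contract (stream: true), responses arrive as server-sent events: each data: line carries one JSON chunk with a text delta, terminated by a data: [DONE] sentinel. A minimal sketch of unpacking those deltas, assuming standard OpenAI-style chunk shapes (whether apple-intelligence serve supports streaming is not something this sketch verifies):

```typescript
// Extract the text delta from one SSE line of an OpenAI-style stream.
// Returns null for non-data lines and for the [DONE] sentinel.
function parseSseDelta(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice(5).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? null;
}

const sample = [
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  'data: {"choices":[{"delta":{"content":" world"}}]}',
  "data: [DONE]",
];

const text = sample
  .map(parseSseDelta)
  .filter((s): s is string => s !== null)
  .join("");
console.log(text); // "Hello world"
```

In practice the OpenAI SDK handles this framing for you when you pass stream: true; the parser above just shows what travels over the wire.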
This pattern is particularly useful for building private internal tools — a Slack bot that summarizes meeting notes, a CLI that reviews pull requests, or a local writing assistant — without any data leaving the machine.
Privacy
Every inference call stays on your machine. No prompts or responses are transmitted to Apple’s servers. This makes it practical for tasks involving sensitive content where cloud LLMs are not an option, at the cost of a smaller, less capable model.