AI Assistant API
API specification for the AI assistant chat endpoint.
Endpoint
POST /api/ai-chat
Content-Type: application/json
Request Body
interface AiChatRequest {
message: string;
history?: ChatMessage[];
}
interface ChatMessage {
role: "user" | "assistant";
content: string;
}
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The user's current message. Must be non-empty. |
| history | ChatMessage[] | No | Previous conversation messages. Malformed history (non-array, more than 50 entries, entry content over 8192 chars, or injection match) is rejected with HTTP 400. |
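
The constraints in the table can be expressed as a small validator. This is a minimal sketch, not the endpoint's actual code: the function name `validateChatRequest` and the error strings are hypothetical, the 4000-character message cap is taken from the error table in this document, and injection screening is left out.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Limits as stated in this document.
const MAX_MESSAGE_CHARS = 4000;
const MAX_HISTORY_ENTRIES = 50;
const MAX_ENTRY_CHARS = 8192;

// Returns an error string suitable for an HTTP 400 reply, or null when the
// body is acceptable. The strings are illustrative, not the real messages.
function validateChatRequest(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return "Invalid JSON body";
  const { message, history } = body as { message?: unknown; history?: unknown };

  if (typeof message !== "string" || message.length === 0) {
    return "message must be a non-empty string";
  }
  if (message.length > MAX_MESSAGE_CHARS) return "message exceeds 4000 characters";

  if (history === undefined) return null; // history is optional
  if (!Array.isArray(history) || history.length > MAX_HISTORY_ENTRIES) {
    return "history is malformed";
  }
  for (const entry of history) {
    const e = entry as Partial<ChatMessage> | null;
    const roleOk = e?.role === "user" || e?.role === "assistant";
    if (!e || !roleOk || typeof e.content !== "string" || e.content.length > MAX_ENTRY_CHARS) {
      return "history is malformed";
    }
  }
  return null;
}
```

A handler built along these lines would return HTTP 400 with `{ "error": <string> }` whenever the function returns a non-null value.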
Success Response (200)
interface AiChatResponse {
response: string;
}
The response field contains the assistant's reply as a markdown string.
Example:
// Request
{
"message": "How do I add a new page?",
"history": []
}
// Response
{
"response": "Create an MDX file in `src/content/docs/`:\n\n1. Add frontmatter with `title`\n2. Write your content in MDX\n3. The page appears in the sidebar automatically"
}
Error Response (4xx / 500)
interface AiChatErrorResponse {
error: string;
}
| Status | Condition |
|---|---|
| 400 | Invalid JSON body |
| 400 | message is not a non-empty string |
| 400 | message exceeds 4000 character limit |
| 400 | Message rejected by input screening (prompt injection guard) |
| 400 | history is malformed (see field description) |
| 405 | Request method is not POST or OPTIONS |
| 415 | Content-Type is not application/json |
| 429 | Rate limit exceeded (includes Retry-After header) |
| 500 | Anthropic API call failed |
The endpoint accepts POST (chat) and OPTIONS (CORS preflight) only; every other method returns 405 with { "error": "Method not allowed" }.
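
On the client side, the status table maps naturally onto a small result type. The sketch below is one possible way to interpret a reply; `interpretChatResponse` and `ChatResult` are hypothetical names, and it assumes `Retry-After` carries a number of seconds (the header may also legally carry an HTTP date, in which case this sketch reports `null`).

```typescript
type ChatResult =
  | { kind: "ok"; reply: string }
  | { kind: "rate_limited"; retryAfterSeconds: number | null; error: string }
  | { kind: "error"; status: number; error: string };

// Maps an HTTP status, parsed JSON body, and Retry-After header value
// onto a ChatResult. Pure function: no network access happens here.
function interpretChatResponse(
  status: number,
  body: unknown,
  retryAfterHeader: string | null,
): ChatResult {
  const fields = (typeof body === "object" && body !== null ? body : {}) as {
    response?: unknown;
    error?: unknown;
  };

  if (status === 200 && typeof fields.response === "string") {
    return { kind: "ok", reply: fields.response };
  }

  const error = typeof fields.error === "string" ? fields.error : "Unknown error";

  if (status === 429) {
    // Assumption: Retry-After is in delay-seconds form.
    const seconds = Number.parseInt(retryAfterHeader ?? "", 10);
    return {
      kind: "rate_limited",
      retryAfterSeconds: Number.isFinite(seconds) ? seconds : null,
      error,
    };
  }
  return { kind: "error", status, error };
}
```

With the Fetch API, a caller could pass `res.status`, `await res.json()`, and `res.headers.get("Retry-After")` to this function after a POST to `/api/ai-chat`.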
CORS
This endpoint uses a per-origin allowlist. When aiChatDemoMode is false, Access-Control-Allow-Origin is echoed back only for request origins listed in the aiChatAllowedOrigins setting — any other origin receives no allow-origin header and is blocked by the browser. (In demo mode, * is always returned for back-compat.) This is intentionally stricter than the Search Worker, which uses wildcard CORS (*) — the AI chat endpoint gates by origin because each call carries a real Anthropic API cost, whereas search is an unmetered, opt-in service. Do not assume the two endpoints share a CORS policy.
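
The allowlist rule above reduces to a three-way decision. A minimal sketch, assuming exact string matching against the allowlist (the document does not say how origins are compared) and using the hypothetical helper name `resolveAllowOrigin`:

```typescript
// Returns the value to send in Access-Control-Allow-Origin,
// or null when the header should be omitted entirely.
function resolveAllowOrigin(
  requestOrigin: string | null,
  demoMode: boolean,
  allowedOrigins: readonly string[],
): string | null {
  // Demo mode always answers with the wildcard.
  if (demoMode) return "*";
  // Non-demo: echo the origin only if it is listed (exact match assumed).
  if (requestOrigin !== null && allowedOrigins.includes(requestOrigin)) {
    return requestOrigin;
  }
  // Unlisted or missing origin: no header, so the browser blocks the response.
  return null;
}
```

With an empty allowlist and demo mode off, the function returns `null` for every origin, which matches the "all cross-origin requests blocked" default.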
CF Env Bindings
| Binding | Kind | Required | Description |
|---|---|---|---|
| ANTHROPIC_API_KEY | secret | Yes | Anthropic API key |
| DOCS_SITE_URL | var | Yes | Deployed docs URL (used to fetch llms-full.txt) |
| RATE_LIMIT | KV namespace | Yes | Stores rate-limit counters and audit logs |
| RATE_LIMIT_PER_MINUTE | var | No | Max requests per IP per minute (default 10) |
| RATE_LIMIT_PER_DAY | var | No | Max requests per IP per day (default 100) |
| PUBLIC_ENABLE_MOCKS | var | No | Set to "true" to enable MSW mock responses (dev mode only) |
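
For orientation, the bindings above could be typed roughly as follows. This is a sketch under assumptions: `KvLike` is a hand-written stand-in for the Workers KV binding type, the optional vars are assumed to arrive as plain-text strings, and `parseLimit` / `readRateLimits` are hypothetical helpers that apply the documented defaults.

```typescript
// Minimal stand-in for a Workers KV binding (only the calls relevant here).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

interface Env {
  ANTHROPIC_API_KEY: string;
  DOCS_SITE_URL: string;
  RATE_LIMIT: KvLike;
  RATE_LIMIT_PER_MINUTE?: string; // assumed plain-text var
  RATE_LIMIT_PER_DAY?: string; // assumed plain-text var
  PUBLIC_ENABLE_MOCKS?: string;
}

// Parses a positive integer var, falling back when it is unset or invalid.
function parseLimit(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Applies the defaults from the table: 10 per minute, 100 per day.
function readRateLimits(env: Pick<Env, "RATE_LIMIT_PER_MINUTE" | "RATE_LIMIT_PER_DAY">) {
  return {
    perMinute: parseLimit(env.RATE_LIMIT_PER_MINUTE, 10),
    perDay: parseLimit(env.RATE_LIMIT_PER_DAY, 100),
  };
}
```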
Settings
The following src/ fields control endpoint behavior (distinct from the CF env
vars above, which are Cloudflare-side runtime configuration).
| Setting | Type | Default | Description |
|---|---|---|---|
| aiChatDemoMode | boolean | true | Short-circuits the endpoint with a fixed reply; no API key or KV accessed |
| aiChatAllowedOrigins | string[] | [] | CORS origin allowlist (non-demo only). Empty = all cross-origin requests blocked |
| aiChatGlobalDailyLimit | number \| false | false | Global daily request ceiling across all IPs; false = no ceiling |
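
As an illustration of how these three settings fit together, the sketch below types them and shows a hypothetical production override. The interface name `AiChatSettings`, the origin `https://docs.example.com`, the ceiling of 5000, and the helper `globalCeilingReached` are all assumptions made for the example; only the field names, types, and defaults come from the table.

```typescript
interface AiChatSettings {
  aiChatDemoMode: boolean;
  aiChatAllowedOrigins: string[];
  aiChatGlobalDailyLimit: number | false;
}

// Defaults as listed in the table above.
const defaultAiChatSettings: AiChatSettings = {
  aiChatDemoMode: true,
  aiChatAllowedOrigins: [],
  aiChatGlobalDailyLimit: false,
};

// Hypothetical live configuration: demo mode off, one allowed origin,
// and a global ceiling as a backstop.
const productionSettings: AiChatSettings = {
  ...defaultAiChatSettings,
  aiChatDemoMode: false,
  aiChatAllowedOrigins: ["https://docs.example.com"],
  aiChatGlobalDailyLimit: 5000,
};

// True when a ceiling is configured and today's request count has reached it.
function globalCeilingReached(settings: AiChatSettings, requestsToday: number): boolean {
  const limit = settings.aiChatGlobalDailyLimit;
  return limit !== false && requestsToday >= limit;
}
```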
Security
The endpoint includes layered defenses ported from the legacy standalone worker:
- Hardened system prompt: XML-tagged context with explicit guardrails prevents the model from leaking configuration or following off-topic instructions. The prompt also instructs the model to treat all prior conversation turns as untrusted client input (see Chat-history trust model below).
- Input screening: regex pre-filter rejects common prompt injection patterns before the Claude API is called. Screening runs after the rate-limit check, so a request rejected by the injection guard still consumes the caller's rate-limit quota (the limiter gates KV-write amplification, so it must run first).
- Rate limiting: per-IP limits via RATE_LIMIT KV; fail-closed (KV outage → HTTP 429) when aiChatDemoMode is false; fail-open in demo mode (demo short-circuit is first, so the limiter is never reached in practice).
- CORS allowlist: when not in demo mode, Access-Control-Allow-Origin is echoed only for origins in aiChatAllowedOrigins; cross-origin requests from unlisted origins are blocked by the browser. Demo mode always sends * for back-compat.
- Global daily ceiling: optional aiChatGlobalDailyLimit backstop against IP rotation and botnets; off by default.
- Audit logging: every interaction is logged to RATE_LIMIT KV with audit: prefix (7-day TTL, fire-and-forget).
- Message length cap: messages over 4000 characters are rejected before reaching the API.
- cf-connecting-ip caveat: per-IP rate limiting uses this header, which is only trustworthy when the Worker is deployed behind Cloudflare's network.
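
The fail-closed behaviour of the rate limiter can be sketched as a fixed-window counter. Several things here are assumptions rather than the endpoint's real design: the fixed-window algorithm itself, the key layout, the synchronous `CounterStore` interface (real KV calls are asynchronous), and the function name `checkPerMinuteLimit`.

```typescript
// Synchronous stand-in for the KV namespace, to keep the sketch short.
interface CounterStore {
  get(key: string): number | undefined;
  set(key: string, value: number): void;
}

type LimitDecision =
  | { allowed: true }
  | { allowed: false; status: 429; retryAfterSeconds: number };

const WINDOW_MS = 60000; // one-minute window

function checkPerMinuteLimit(
  store: CounterStore,
  ip: string,
  limit: number,
  nowMs: number,
  demoMode: boolean,
): LimitDecision {
  const windowIndex = Math.floor(nowMs / WINDOW_MS);
  const key = `rl:minute:${ip}:${windowIndex}`; // hypothetical key layout
  const secondsLeft = Math.ceil((WINDOW_MS - (nowMs % WINDOW_MS)) / 1000);
  try {
    const count = store.get(key) ?? 0;
    if (count >= limit) {
      return { allowed: false, status: 429, retryAfterSeconds: secondsLeft };
    }
    store.set(key, count + 1);
    return { allowed: true };
  } catch {
    // Store outage: fail-open in demo mode, fail-closed (429) otherwise.
    return demoMode
      ? { allowed: true }
      : { allowed: false, status: 429, retryAfterSeconds: secondsLeft };
  }
}
```

A `Map<string, number>` satisfies `CounterStore`, which makes the sketch easy to exercise locally. In a handler that follows the order described above, this check would run before input screening, so a screened-out request still increments the counter.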
Chat-history trust model
The history array is client-supplied and stateless — the server keeps no session record, so
it cannot verify that an assistant-role turn was actually produced by a previous model response.
Each entry is still hardened: a strict user/assistant role whitelist, the entry-count and
per-entry length caps above, and a rebuild to { role, content } that strips any smuggled extra
fields. user-role turns are injection-screened; assistant-role turns are not (a real
assistant reply may legitimately quote injection-shaped text).
Because role is not verifiable, a caller can forge an assistant turn containing hostile
instructions and bypass user-turn screening. This residual risk is accepted by design: the
chat is a documentation assistant with a low blast radius, and the system prompt instructs the
model to treat every prior turn as untrusted input that cannot override its rules. A robust fix
(server-issued signed history) would require provisioning a secret and changing the client/server
payload contract, which is not warranted for this feature. See
issue #2036 for the full decision record.
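
The per-entry hardening described in this section can be sketched as follows. The single regex in `INJECTION_PATTERNS` is a placeholder for illustration (the endpoint's real pattern list is not shown in this document), and `sanitizeHistory`, `isRole`, and `looksLikeInjection` are hypothetical names.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Placeholder pattern list, for illustration only.
const INJECTION_PATTERNS: RegExp[] = [/ignore (all )?previous instructions/i];

function looksLikeInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(text));
}

// Strict role whitelist.
function isRole(value: unknown): value is ChatMessage["role"] {
  return value === "user" || value === "assistant";
}

// Returns the rebuilt history, or null if any entry should be rejected (HTTP 400).
function sanitizeHistory(raw: ReadonlyArray<Record<string, unknown>>): ChatMessage[] | null {
  const clean: ChatMessage[] = [];
  for (const entry of raw) {
    const { role, content } = entry;
    if (!isRole(role) || typeof content !== "string") return null;
    // Only user turns are screened; assistant turns may quote such text.
    if (role === "user" && looksLikeInjection(content)) return null;
    // Rebuilding the object drops any smuggled extra fields.
    clean.push({ role, content });
  }
  return clean;
}
```

The sketch also makes the accepted residual risk visible: an entry labelled `assistant` passes through unscreened regardless of who actually wrote it.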
Documentation Context
The endpoint fetches llms-full.txt (generated by the llms.txt integration) from DOCS_SITE_URL and caches it in memory for the CF Workers isolate lifespan (best-effort, ~1 hour). The content is included in the system prompt as <documentation> XML context.
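
A minimal sketch of the caching and prompt assembly, under stated assumptions: the class `DocsCache` and the function `buildSystemPrompt` are hypothetical, and the explicit one-hour TTL is an interpretation of "~1 hour" (the cache may in practice simply live as long as the isolate does). Module-level state in a Worker persists only within one isolate, which is why the cache is best-effort.

```typescript
// In-memory cache for the documentation text, with a time-to-live.
class DocsCache {
  private entry: { text: string; storedAtMs: number } | null = null;
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60 * 60 * 1000) {
    this.ttlMs = ttlMs; // assumed TTL of about one hour
  }

  // Returns the cached text, or null if nothing is cached or it has expired.
  get(nowMs: number): string | null {
    if (this.entry === null) return null;
    if (nowMs - this.entry.storedAtMs >= this.ttlMs) return null;
    return this.entry.text;
  }

  set(text: string, nowMs: number): void {
    this.entry = { text, storedAtMs: nowMs };
  }
}

// Wraps the documentation in <documentation> tags after the rules text.
function buildSystemPrompt(rules: string, docs: string): string {
  return `${rules}\n\n<documentation>\n${docs}\n</documentation>`;
}
```

On a cache miss, a handler would fetch llms-full.txt from the site at DOCS_SITE_URL, call `set`, and then pass the text to `buildSystemPrompt`.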