AI Assistant API

Created Mar 17, 2026 | Updated Jun 11, 2026 | Takeshi Takatsudo

Tags: #ai

API specification for the AI assistant chat endpoint.

Endpoint

POST /api/ai-chat
Content-Type: application/json

Request Body

interface AiChatRequest {
  message: string;
  history?: ChatMessage[];
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

Field	Type	Required	Description
`message`	`string`	Yes	The user's current message. Must be non-empty.
`history`	`ChatMessage[]`	No	Previous conversation messages. Malformed history (non-array, more than 50 entries, entry content over 8192 chars, or injection match) is rejected with HTTP 400.
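The request rules above can be sketched as a single shape check. This is a minimal illustration, not the endpoint's actual code: the function name, constant names, and error strings are hypothetical, lengths are measured with `String.length` (an assumption about what "character" means here), and injection screening is left out.

```typescript
// Limits quoted on this page; the constant and function names are hypothetical.
const MAX_MESSAGE_CHARS = 4000;
const MAX_HISTORY_ENTRIES = 50;
const MAX_ENTRY_CHARS = 8192;

type BodyCheck = { ok: true } | { ok: false; status: 400; error: string };

function checkAiChatBody(body: unknown): BodyCheck {
  const reject = (error: string): BodyCheck => ({ ok: false, status: 400, error });

  if (typeof body !== "object" || body === null) return reject("Invalid JSON body");
  const { message, history } = body as Record<string, unknown>;

  // `message` is required and must be a non-empty string within the length cap.
  if (typeof message !== "string" || message.length === 0) {
    return reject("message must be a non-empty string");
  }
  if (message.length > MAX_MESSAGE_CHARS) {
    return reject("message exceeds 4000 character limit");
  }

  // `history` is optional; when present it must be a bounded array of bounded entries.
  if (history !== undefined) {
    if (!Array.isArray(history) || history.length > MAX_HISTORY_ENTRIES) {
      return reject("history is malformed");
    }
    const hasBadEntry = history.some((entry: unknown) => {
      const content =
        typeof entry === "object" && entry !== null
          ? (entry as Record<string, unknown>).content
          : undefined;
      return typeof content !== "string" || content.length > MAX_ENTRY_CHARS;
    });
    if (hasBadEntry) return reject("history is malformed");
  }
  return { ok: true };
}
```

Returning a tagged result instead of throwing keeps the caller's mapping to HTTP 400 explicit.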

Success Response (200)

interface AiChatResponse {
  response: string;
}

The response field contains the assistant's reply as a markdown string.

Example:

// Request
{
  "message": "How do I add a new page?",
  "history": []
}

// Response
{
  "response": "Create an MDX file in `src/content/docs/`:\n\n1. Add frontmatter with `title`\n2. Write your content in MDX\n3. The page appears in the sidebar automatically"
}
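For callers, the exchange above might be wrapped as follows. This is a sketch under stated assumptions: `buildAiChatRequest` and `readAiChatReply` are hypothetical helper names, and the relative URL assumes the client is served from the same origin as the endpoint.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Builds the URL and fetch options for one chat turn (hypothetical helper).
function buildAiChatRequest(message: string, history: ChatMessage[] = []) {
  return {
    url: "/api/ai-chat",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message, history }),
    },
  };
}

// Extracts the markdown reply, or throws with the server's `error` string.
function readAiChatReply(status: number, body: unknown): string {
  const record = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  if (status === 200 && typeof record.response === "string") return record.response;
  const detail = typeof record.error === "string" ? record.error : "Unexpected response";
  throw new Error(`AI chat failed (${status}): ${detail}`);
}

// Usage inside an async function:
//   const { url, init } = buildAiChatRequest("How do I add a new page?");
//   const res = await fetch(url, init);
//   const reply = readAiChatReply(res.status, await res.json());
```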

Error Response (400 / 405 / 415 / 429 / 500)

interface AiChatErrorResponse {
  error: string;
}

Status	Condition
400	Invalid JSON body
400	`message` is not a non-empty string
400	`message` exceeds 4000 character limit
400	Message rejected by input screening (prompt injection guard)
400	`history` is malformed (see field description)
405	Request method is not `POST` or `OPTIONS`
415	Content-Type is not application/json
429	Rate limit exceeded (includes `Retry-After` header)
500	Anthropic API call failed

The endpoint accepts POST (chat) and OPTIONS (CORS preflight) only; every other method returns 405 with { "error": "Method not allowed" }.
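The method and Content-Type rows of the table could be enforced by a small guard like the one below. It is illustrative only: `guardRequest` is a hypothetical name, the 415 error string is invented, and tolerating parameters such as `; charset=utf-8` is an assumption this page does not confirm.

```typescript
interface GuardFailure {
  status: 405 | 415;
  error: string;
}

// Returns null when the request may proceed, or the error to send back.
function guardRequest(method: string, contentType: string | null): GuardFailure | null {
  // CORS preflight carries no body, so no Content-Type check applies.
  if (method === "OPTIONS") return null;
  if (method !== "POST") return { status: 405, error: "Method not allowed" };

  // Assumption: compare only the media type, ignoring parameters like charset.
  const [mediaType = ""] = (contentType ?? "").split(";");
  if (mediaType.trim().toLowerCase() !== "application/json") {
    return { status: 415, error: "Unsupported media type" }; // hypothetical message
  }
  return null;
}
```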

CORS

This endpoint uses a per-origin allowlist. When aiChatDemoMode is false, Access-Control-Allow-Origin is echoed back only for request origins listed in the aiChatAllowedOrigins setting — any other origin receives no allow-origin header and is blocked by the browser. (In demo mode, * is always returned for back-compat.) This is intentionally stricter than the Search Worker, which uses wildcard CORS (*) — the AI chat endpoint gates by origin because each call carries a real Anthropic API cost, whereas search is an unmetered, opt-in service. Do not assume the two endpoints share a CORS policy.
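The allowlist behavior described above can be sketched as a pure function. `corsHeaders` is a hypothetical name, exact string matching of origins is an assumption, and the `Vary: Origin` header is a common practice when echoing origins rather than something this page states.

```typescript
interface CorsSettings {
  aiChatDemoMode: boolean;
  aiChatAllowedOrigins: string[];
}

// Computes the CORS response headers for a given request Origin.
function corsHeaders(origin: string | null, settings: CorsSettings): Record<string, string> {
  // Demo mode always answers with the wildcard for back-compat.
  if (settings.aiChatDemoMode) return { "Access-Control-Allow-Origin": "*" };

  // Non-demo: echo the origin only when it is on the allowlist.
  if (origin !== null && settings.aiChatAllowedOrigins.indexOf(origin) !== -1) {
    return { "Access-Control-Allow-Origin": origin, Vary: "Origin" }; // Vary is an assumption
  }

  // Unlisted origin: no allow-origin header, so the browser blocks the response.
  return {};
}
```

With the default empty `aiChatAllowedOrigins`, the non-demo branch always returns an empty header set, which matches the "all cross-origin requests blocked" default in the Settings table.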

CF Env Bindings

Binding	Kind	Required	Description
`ANTHROPIC_API_KEY`	secret	Yes	Anthropic API key
`DOCS_SITE_URL`	var	Yes	Deployed docs URL (used to fetch `llms-full.txt`)
`RATE_LIMIT`	KV namespace	Yes	Stores rate-limit counters and audit logs
`RATE_LIMIT_PER_MINUTE`	var	No	Max requests per IP per minute (default `10`)
`RATE_LIMIT_PER_DAY`	var	No	Max requests per IP per day (default `100`)
`PUBLIC_ENABLE_MOCKS`	var	No	Set to `"true"` to enable MSW mock responses (dev mode only)

Settings

The following src/config/settings.ts fields control endpoint behavior (distinct from the CF env vars above, which are Cloudflare-side runtime configuration).

Setting	Type	Default	Description
`aiChatDemoMode`	`boolean`	`true`	Short-circuits the endpoint with a fixed reply; no API key or KV accessed
`aiChatAllowedOrigins`	`string[]`	`[]`	CORS origin allowlist (non-demo only). Empty = all cross-origin requests blocked
`aiChatGlobalDailyLimit`	`number \| false`	`false`	Global daily request ceiling across all IPs; `false` = no ceiling

Security

The endpoint includes layered defenses ported from the legacy standalone worker:

Hardened system prompt — XML-tagged context with explicit guardrails prevents the model from leaking configuration or following off-topic instructions. The prompt also instructs the model to treat all prior conversation turns as untrusted client input (see Chat-history trust model below)
Input screening — regex pre-filter rejects common prompt injection patterns before the Claude API is called. Screening runs after the rate-limit check, so a request rejected by the injection guard still consumes the caller's rate-limit quota (the limiter gates KV-write amplification, so it must run first)
Rate limiting — per-IP limits via RATE_LIMIT KV; fail-closed (KV outage → HTTP 429) when aiChatDemoMode is false; fail-open in demo mode (demo short-circuit is first, so the limiter is never reached in practice)
CORS allowlist — when not in demo mode, Access-Control-Allow-Origin is echoed only for origins in aiChatAllowedOrigins; cross-origin requests from unlisted origins are blocked by the browser. Demo mode always sends * for back-compat.
Global daily ceiling — optional aiChatGlobalDailyLimit backstop against IP rotation and botnets; off by default
Audit logging — every interaction is logged to RATE_LIMIT KV with audit: prefix (7-day TTL, fire-and-forget)
Message length cap — messages over 4000 characters are rejected before reaching the API
cf-connecting-ip caveat — per-IP rate limiting uses this header, which is only trustworthy when the Worker is deployed behind Cloudflare's network
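The rate-limiting item above (per-IP limits, fail-closed outside demo mode) might reduce to a decision function like this one. It is a sketch, not the real limiter: the counter shape, the function names, and the `Retry-After` values are hypothetical, and the KV read itself is left to the caller.

```typescript
// Defaults from the bindings table on this page.
const DEFAULT_PER_MINUTE = 10;
const DEFAULT_PER_DAY = 100;

// Hypothetical shape: counters already read from KV, or a marker that the read failed.
type CounterRead = { ok: true; minuteCount: number; dayCount: number } | { ok: false };
type Decision = { allowed: true } | { allowed: false; status: 429; retryAfterSeconds: number };

interface LimitEnv {
  RATE_LIMIT_PER_MINUTE?: string;
  RATE_LIMIT_PER_DAY?: string;
}

// Env vars arrive as strings; fall back to the default on anything unusable.
function parseLimit(raw: string | undefined, fallback: number): number {
  const n = parseInt(raw ?? "", 10);
  return isFinite(n) && n > 0 ? n : fallback;
}

function decideRateLimit(read: CounterRead, env: LimitEnv, demoMode: boolean): Decision {
  if (!read.ok) {
    // KV outage: fail-open in demo mode, fail-closed (429) otherwise.
    if (demoMode) return { allowed: true };
    return { allowed: false, status: 429, retryAfterSeconds: 60 }; // value is an assumption
  }
  const perMinute = parseLimit(env.RATE_LIMIT_PER_MINUTE, DEFAULT_PER_MINUTE);
  const perDay = parseLimit(env.RATE_LIMIT_PER_DAY, DEFAULT_PER_DAY);

  // Counts are requests already recorded in the current window.
  if (read.dayCount >= perDay) return { allowed: false, status: 429, retryAfterSeconds: 86400 };
  if (read.minuteCount >= perMinute) return { allowed: false, status: 429, retryAfterSeconds: 60 };
  return { allowed: true };
}
```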

Chat-history trust model

The history array is client-supplied and stateless — the server keeps no session record, so it cannot verify that an assistant-role turn was actually produced by a previous model response. Each entry is still hardened: a strict user/assistant role whitelist, the entry-count and per-entry length caps above, and a rebuild to { role, content } that strips any smuggled extra fields. user-role turns are injection-screened; assistant-role turns are not (a real assistant reply may legitimately quote injection-shaped text).

Because role is not verifiable, a caller can forge an assistant turn containing hostile instructions and bypass user-turn screening. This residual risk is accepted by design: the chat is a documentation assistant with a low blast radius, and the system prompt instructs the model to treat every prior turn as untrusted input that cannot override its rules. A robust fix (server-issued signed history) would require provisioning a secret and changing the client/server payload contract, which is not warranted for this feature. See issue #2036 for the full decision record.
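The per-entry hardening described in this section can be sketched as follows. `sanitizeHistory` and `asRole` are hypothetical names, and `looksLikeInjection` stands in for the regex pre-filter, whose patterns this page does not list. Note that the sketch rejects the whole history on any bad entry, in line with the HTTP 400 behavior in the field description.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY_ENTRIES = 50;
const MAX_ENTRY_CHARS = 8192;

// Strict role whitelist: anything else (e.g. "system") is rejected.
function asRole(value: unknown): ChatMessage["role"] | null {
  if (value === "user") return "user";
  if (value === "assistant") return "assistant";
  return null;
}

// Returns the rebuilt history, or null when it should be rejected with HTTP 400.
function sanitizeHistory(
  raw: unknown,
  looksLikeInjection: (text: string) => boolean, // placeholder for the real screen
): ChatMessage[] | null {
  if (!Array.isArray(raw) || raw.length > MAX_HISTORY_ENTRIES) return null;

  const clean: ChatMessage[] = [];
  for (let i = 0; i < raw.length; i++) {
    const entry: unknown = raw[i];
    if (typeof entry !== "object" || entry === null) return null;
    const record = entry as Record<string, unknown>;

    const role = asRole(record.role);
    const content = record.content;
    if (role === null) return null;
    if (typeof content !== "string" || content.length > MAX_ENTRY_CHARS) return null;

    // Only user turns are screened; assistant turns may quote injection-shaped text.
    if (role === "user" && looksLikeInjection(content)) return null;

    // Rebuild to { role, content } so smuggled extra fields are dropped.
    clean.push({ role, content });
  }
  return clean;
}
```

Because assistant turns skip screening, a forged assistant entry passes this function unchanged, which is exactly the residual risk the section accepts.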

Documentation Context

The endpoint fetches llms-full.txt (generated by the llms.txt integration) from DOCS_SITE_URL and caches it in memory for the CF Workers isolate lifespan (best-effort, ~1 hour). The content is included in the system prompt as <documentation> XML context.
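A minimal sketch of isolate-scoped caching, assuming that "~1 hour" is applied as a time-to-live (the page may instead be describing typical isolate lifetime). All names here are hypothetical, and the fetch itself is shown only in the usage comment.

```typescript
// Assumption: treat "~1 hour" as a TTL.
const DOCS_TTL_MS = 60 * 60 * 1000;

interface DocsCacheEntry {
  text: string;
  fetchedAt: number;
}

// Module scope is shared by requests served from the same isolate
// and is lost whenever the isolate is evicted (hence "best-effort").
let docsCache: DocsCacheEntry | null = null;

// Returns the cached text, or null when nothing is cached or the entry is stale.
function readCachedDocs(now: number): string | null {
  const entry = docsCache;
  if (entry === null) return null;
  if (now - entry.fetchedAt >= DOCS_TTL_MS) return null;
  return entry.text;
}

function storeDocs(text: string, now: number): void {
  docsCache = { text, fetchedAt: now };
}

// Wraps the docs in the XML context tag named on this page.
function buildDocumentationContext(docs: string): string {
  return "<documentation>\n" + docs + "\n</documentation>";
}

// Usage inside the async request handler (the URL path is an assumption):
//   let docs = readCachedDocs(Date.now());
//   if (docs === null) {
//     const res = await fetch(`${env.DOCS_SITE_URL}/llms-full.txt`);
//     docs = await res.text();
//     storeDocs(docs, Date.now());
//   }
```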
