AI Assistant API

Created Mar 17, 2026 · Updated Jun 11, 2026 · Takeshi Takatsudo
Tags: #ai

API specification for the AI assistant chat endpoint.

Endpoint

POST /api/ai-chat
Content-Type: application/json

Request Body

interface AiChatRequest {
  message: string;
  history?: ChatMessage[];
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| message | string | Yes | The user's current message. Must be non-empty. |
| history | ChatMessage[] | No | Previous conversation messages. Malformed history (non-array, more than 50 entries, entry content over 8192 chars, or injection match) is rejected with HTTP 400. |
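
The constraints on message can be checked before any further processing. The following is a minimal sketch, not the endpoint's actual code: the name `validateMessage` and the error strings are hypothetical, the 4000-character cap comes from the error table on this page, and whether whitespace-only messages count as empty is not stated here.

```typescript
// Hypothetical helper; the name and error strings are illustrative only.
const MAX_MESSAGE_CHARS = 4000; // cap listed in the error table

type ValidationResult =
  | { ok: true; message: string }
  | { ok: false; status: 400; error: string };

function validateMessage(body: unknown): ValidationResult {
  // Read `message` defensively: the parsed body may be null or a non-object.
  const message = (body as { message?: unknown } | null)?.message;
  if (typeof message !== "string" || message.length === 0) {
    return { ok: false, status: 400, error: "message must be a non-empty string" };
  }
  if (message.length > MAX_MESSAGE_CHARS) {
    return { ok: false, status: 400, error: "message exceeds 4000 character limit" };
  }
  return { ok: true, message };
}
```

Returning a tagged result rather than throwing keeps the HTTP status next to the reason, which maps directly onto the error shape described later on this page.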

Success Response (200)

interface AiChatResponse {
  response: string;
}

The response field contains the assistant's reply as a markdown string.

Example:

// Request
{
  "message": "How do I add a new page?",
  "history": []
}

// Response
{
  "response": "Create an MDX file in `src/content/docs/`:\n\n1. Add frontmatter with `title`\n2. Write your content in MDX\n3. The page appears in the sidebar automatically"
}
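
A client call matching the example above might look like the sketch below. It is an illustration under assumptions: `askAssistant` and `FetchLike` are hypothetical names, and the fetch function is passed in explicitly so the sketch does not depend on a particular runtime.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Minimal shape of the fetch function this sketch needs (assumption: the
// caller supplies the platform `fetch` or a test double).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function askAssistant(
  fetchImpl: FetchLike,
  message: string,
  history: ChatMessage[] = [],
): Promise<string> {
  const res = await fetchImpl("/api/ai-chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, history }),
  });
  // Success bodies carry `response`; error bodies carry `error`.
  const data = (await res.json()) as { response?: string; error?: string };
  if (!res.ok) {
    throw new Error(`ai-chat failed (${res.status}): ${data.error ?? "unknown error"}`);
  }
  return data.response ?? "";
}
```

Because the server is stateless, the caller is responsible for appending each user message and assistant reply to history before the next call.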

Error Response (400 / 405 / 415 / 429 / 500)

interface AiChatErrorResponse {
  error: string;
}

| Status | Condition |
| --- | --- |
| 400 | Invalid JSON body |
| 400 | message is not a non-empty string |
| 400 | message exceeds 4000 character limit |
| 400 | Message rejected by input screening (prompt injection guard) |
| 400 | history is malformed (see field description) |
| 405 | Request method is not POST or OPTIONS |
| 415 | Content-Type is not application/json |
| 429 | Rate limit exceeded (includes Retry-After header) |
| 500 | Anthropic API call failed |

The endpoint accepts POST (chat) and OPTIONS (CORS preflight) only; every other method returns 405 with { "error": "Method not allowed" }.
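
On the client side, the status codes above fall into three groups: request errors that will not succeed if retried unchanged, rate limiting that should wait, and upstream failures. The sketch below shows one way to branch on them. `classifyError` and `fallbackMs` are hypothetical, and it assumes Retry-After carries a number of seconds; this page does not say which of the two HTTP formats (seconds or date) the endpoint sends, so anything non-numeric falls back to a default delay.

```typescript
type ErrorAction =
  | { kind: "fix-request" } // 400, 405, 415: retrying unchanged will not help
  | { kind: "retry-after"; delayMs: number } // 429
  | { kind: "server-error" }; // 500

function classifyError(
  status: number,
  retryAfter: string | null,
  fallbackMs: number = 60000, // hypothetical default when the header is unusable
): ErrorAction {
  if (status === 429) {
    // Assumption: the header value is delta-seconds, e.g. "30".
    const seconds =
      retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);
    const delayMs =
      Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
    return { kind: "retry-after", delayMs };
  }
  if (status >= 500) return { kind: "server-error" };
  return { kind: "fix-request" };
}
```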

CORS

This endpoint uses a per-origin allowlist. When aiChatDemoMode is false, Access-Control-Allow-Origin is echoed back only for request origins listed in the aiChatAllowedOrigins setting — any other origin receives no allow-origin header and is blocked by the browser. (In demo mode, * is always returned for back-compat.) This is intentionally stricter than the Search Worker, which uses wildcard CORS (*) — the AI chat endpoint gates by origin because each call carries a real Anthropic API cost, whereas search is an unmetered, opt-in service. Do not assume the two endpoints share a CORS policy.
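
The allowlist rule can be expressed as a small pure function. This is a sketch of the behavior described above, not the endpoint's source: `allowOriginHeaders` is a hypothetical name, exact string comparison of origins is an assumption, and the `Vary: Origin` header is a common practice when echoing origins that this page does not mention.

```typescript
interface CorsSettings {
  aiChatDemoMode: boolean;
  aiChatAllowedOrigins: string[];
}

function allowOriginHeaders(
  requestOrigin: string | null,
  settings: CorsSettings,
): Record<string, string> {
  if (settings.aiChatDemoMode) {
    // Demo mode: wildcard, kept for backward compatibility.
    return { "Access-Control-Allow-Origin": "*" };
  }
  if (requestOrigin !== null && settings.aiChatAllowedOrigins.includes(requestOrigin)) {
    // Echo the listed origin. `Vary: Origin` is an addition not stated in
    // the spec above; it keeps shared caches from reusing the header.
    return { "Access-Control-Allow-Origin": requestOrigin, Vary: "Origin" };
  }
  // Unlisted origin: send no allow-origin header, so the browser blocks it.
  return {};
}
```

With an empty allowlist and demo mode off, the function returns no header for every origin, which matches the "all cross-origin requests blocked" default.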

CF Env Bindings

| Binding | Kind | Required | Description |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | secret | Yes | Anthropic API key |
| DOCS_SITE_URL | var | Yes | Deployed docs URL (used to fetch llms-full.txt) |
| RATE_LIMIT | KV namespace | Yes | Stores rate-limit counters and audit logs |
| RATE_LIMIT_PER_MINUTE | var | No | Max requests per IP per minute (default 10) |
| RATE_LIMIT_PER_DAY | var | No | Max requests per IP per day (default 100) |
| PUBLIC_ENABLE_MOCKS | var | No | Set to "true" to enable MSW mock responses (dev mode only) |
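
In TypeScript the bindings could be typed roughly as follows. This is a sketch under assumptions: `KvLike` is a minimal stand-in for the Workers KV binding type, vars are assumed to arrive as strings, and `readLimits` with its fall-back-on-invalid behavior is hypothetical.

```typescript
// Minimal stand-in for a Workers KV namespace; only the methods needed
// for rate-limit counters and audit logs are declared.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

interface Env {
  ANTHROPIC_API_KEY: string;
  DOCS_SITE_URL: string;
  RATE_LIMIT: KvLike;
  RATE_LIMIT_PER_MINUTE?: string; // assumption: vars are plain strings
  RATE_LIMIT_PER_DAY?: string;
  PUBLIC_ENABLE_MOCKS?: string;
}

// Hypothetical helper: apply the documented defaults (10 per minute,
// 100 per day) when a var is missing or not a positive integer.
function readLimits(env: Pick<Env, "RATE_LIMIT_PER_MINUTE" | "RATE_LIMIT_PER_DAY">) {
  const parse = (raw: string | undefined, fallback: number): number => {
    const n = Number.parseInt(raw ?? "", 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    perMinute: parse(env.RATE_LIMIT_PER_MINUTE, 10),
    perDay: parse(env.RATE_LIMIT_PER_DAY, 100),
  };
}
```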

Settings

The following src/config/settings.ts fields control endpoint behavior (distinct from the CF env vars above, which are Cloudflare-side runtime configuration).

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| aiChatDemoMode | boolean | true | Short-circuits the endpoint with a fixed reply; no API key or KV accessed |
| aiChatAllowedOrigins | string[] | [] | CORS origin allowlist (non-demo only). Empty = all cross-origin requests blocked |
| aiChatGlobalDailyLimit | number \| false | false | Global daily request ceiling across all IPs; false = no ceiling |
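
As an illustration, the three fields might be set as shown below when moving from the demo default to a live deployment. The surrounding shape of src/config/settings.ts is not shown on this page, so the `AiChatSettings` interface is an assumption, and the origin and the ceiling of 5000 are placeholder values.

```typescript
// Assumed shape covering only the three documented fields.
interface AiChatSettings {
  aiChatDemoMode: boolean;
  aiChatAllowedOrigins: string[];
  aiChatGlobalDailyLimit: number | false;
}

// Documented defaults: demo mode on, empty allowlist, no global ceiling.
const defaults: AiChatSettings = {
  aiChatDemoMode: true,
  aiChatAllowedOrigins: [],
  aiChatGlobalDailyLimit: false,
};

// Hypothetical live configuration; the origin and ceiling are placeholders.
const production: AiChatSettings = {
  ...defaults,
  aiChatDemoMode: false,
  aiChatAllowedOrigins: ["https://docs.example.com"],
  aiChatGlobalDailyLimit: 5000,
};
```

Turning demo mode off without listing an origin leaves every cross-origin request blocked, so the two fields are normally changed together.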

Security

The endpoint includes layered defenses ported from the legacy standalone worker:

  • Hardened system prompt — XML-tagged context with explicit guardrails prevents the model from leaking configuration or following off-topic instructions. The prompt also instructs the model to treat all prior conversation turns as untrusted client input (see Chat-history trust model below)

  • Input screening — regex pre-filter rejects common prompt injection patterns before the Claude API is called. Screening runs after the rate-limit check, so a request rejected by the injection guard still consumes the caller's rate-limit quota (the limiter gates KV-write amplification, so it must run first)

  • Rate limiting — per-IP limits via RATE_LIMIT KV; fail-closed (KV outage → HTTP 429) when aiChatDemoMode is false; fail-open in demo mode (demo short-circuit is first, so the limiter is never reached in practice)

  • CORS allowlist — when not in demo mode, Access-Control-Allow-Origin is echoed only for origins in aiChatAllowedOrigins; cross-origin requests from unlisted origins are blocked by the browser. Demo mode always sends * for back-compat.

  • Global daily ceiling — optional aiChatGlobalDailyLimit backstop against IP rotation and botnets; off by default

  • Audit logging — every interaction is logged to RATE_LIMIT KV with audit: prefix (7-day TTL, fire-and-forget)

  • Message length cap — messages over 4000 characters are rejected before reaching the API

  • cf-connecting-ip caveat — per-IP rate limiting uses this header, which is only trustworthy when the Worker is deployed behind Cloudflare's network
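
The fail-closed behavior of the rate limiter can be sketched as follows. This is not the endpoint's implementation: the key scheme `rl:min:<ip>:<window>`, the fixed one-minute window, and the names `KvLike` and `checkPerMinuteLimit` are assumptions made for illustration. KV reads followed by writes are not atomic, so a counter like this is approximate under concurrency.

```typescript
// Minimal stand-in for the RATE_LIMIT KV binding.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type LimitDecision =
  | { allowed: true }
  | { allowed: false; status: 429; retryAfterSeconds: number };

async function checkPerMinuteLimit(
  kv: KvLike,
  ip: string,
  perMinute: number,
  nowMs: number,
): Promise<LimitDecision> {
  const windowIndex = Math.floor(nowMs / 60000);
  const key = `rl:min:${ip}:${windowIndex}`; // hypothetical key scheme
  // Seconds until the current one-minute window ends.
  const retryAfterSeconds = 60 - Math.floor((nowMs % 60000) / 1000);
  try {
    const count = Number.parseInt((await kv.get(key)) ?? "0", 10) || 0;
    if (count >= perMinute) {
      return { allowed: false, status: 429, retryAfterSeconds };
    }
    // Let the counter expire on its own shortly after the window closes.
    await kv.put(key, String(count + 1), { expirationTtl: 120 });
    return { allowed: true };
  } catch {
    // Fail closed: a KV outage is treated as "limit exceeded" (HTTP 429).
    return { allowed: false, status: 429, retryAfterSeconds };
  }
}
```

The `ip` argument would come from the cf-connecting-ip header, subject to the caveat in the last bullet above.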

Chat-history trust model

The history array is client-supplied and stateless — the server keeps no session record, so it cannot verify that an assistant-role turn was actually produced by a previous model response. Each entry is still hardened: a strict user/assistant role whitelist, the entry-count and per-entry length caps above, and a rebuild to { role, content } that strips any smuggled extra fields. user-role turns are injection-screened; assistant-role turns are not (a real assistant reply may legitimately quote injection-shaped text).

Because role is not verifiable, a caller can forge an assistant turn containing hostile instructions and bypass user-turn screening. This residual risk is accepted by design: the chat is a documentation assistant with a low blast radius, and the system prompt instructs the model to treat every prior turn as untrusted input that cannot override its rules. A robust fix (server-issued signed history) would require provisioning a secret and changing the client/server payload contract, which is not warranted for this feature. See issue #2036 for the full decision record.
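
The per-entry hardening described in this section can be sketched as one pass over the array. The code is illustrative only: `sanitizeHistory` is a hypothetical name, returning null stands in for the HTTP 400 rejection, treating a missing history as an empty list is an assumption, and `INJECTION_PATTERN` is a single placeholder regex because the real screening rules are not listed on this page.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY_ENTRIES = 50;
const MAX_ENTRY_CHARS = 8192;

// Placeholder pattern; the actual screening rules are not documented here.
const INJECTION_PATTERN = /ignore (all )?previous instructions/i;

// Returns the rebuilt history, or null when the input should be rejected.
function sanitizeHistory(raw: unknown): ChatMessage[] | null {
  if (raw === undefined) return []; // history is optional
  if (!Array.isArray(raw) || raw.length > MAX_HISTORY_ENTRIES) return null;
  const clean: ChatMessage[] = [];
  for (const entry of raw) {
    if (typeof entry !== "object" || entry === null) return null;
    const { role, content } = entry as { role?: unknown; content?: unknown };
    // Strict role whitelist.
    if (role !== "user" && role !== "assistant") return null;
    if (typeof content !== "string" || content.length > MAX_ENTRY_CHARS) return null;
    // Only user turns are screened; assistant turns may quote such text.
    if (role === "user" && INJECTION_PATTERN.test(content)) return null;
    // Rebuild the entry so smuggled extra fields are dropped.
    clean.push({ role, content });
  }
  return clean;
}
```

Note that a forged assistant turn passes this function unchanged, which is exactly the residual risk the section accepts.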

Documentation Context

The endpoint fetches llms-full.txt (generated by the llms.txt integration) from DOCS_SITE_URL and caches it in memory for the CF Workers isolate lifespan (best-effort, ~1 hour). The content is included in the system prompt as <documentation> XML context.
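
A best-effort in-memory cache of this kind can be sketched with a module-level variable, which lives as long as the isolate does. The names below are hypothetical, the file is assumed to sit at the root of DOCS_SITE_URL, and the fetcher and clock are passed in so the sketch stays independent of the runtime.

```typescript
type TextFetcher = (url: string) => Promise<string>;

const CACHE_TTL_MS = 60 * 60 * 1000; // roughly one hour, per the text above

// Module-level state: survives across requests within one isolate and is
// lost whenever the isolate is recycled, hence "best-effort".
let cached: { text: string; fetchedAt: number } | null = null;

async function getDocsContext(
  docsSiteUrl: string,
  fetchText: TextFetcher,
  nowMs: number = Date.now(),
): Promise<string> {
  if (cached !== null && nowMs - cached.fetchedAt < CACHE_TTL_MS) {
    return cached.text;
  }
  // Assumption: llms-full.txt is served from the site root.
  const url = `${docsSiteUrl.replace(/\/+$/, "")}/llms-full.txt`;
  const text = await fetchText(url);
  cached = { text, fetchedAt: nowMs };
  return text;
}

// Wrap the docs in the XML tag that the system prompt uses for context.
function buildDocumentationBlock(docs: string): string {
  return `<documentation>\n${docs}\n</documentation>`;
}
```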

Revision History

Takeshi Takatsudo · Created: 2026-03-18T02:25:36+09:00 · Updated: 2026-06-11T17:53:54+09:00
