llm402.ai API
Pay-per-request LLM inference. No accounts. No API keys. Just pay and prompt.
llm402.ai provides OpenAI-compatible endpoints gated by HTTP 402 micropayments. Send a request, get an invoice, pay it, re-send with proof of payment. Your prompt is processed by one of 400+ models across multiple providers.
Four payment rails are supported. Every 402 response includes all available options -- pick whichever works for your client:
| Protocol | Currency | Network | Header |
|---|---|---|---|
| L402 | Bitcoin (sats) | Lightning Network | WWW-Authenticate |
| x402 | USDC (stablecoin) | Base L2 (EIP-3009) | Payment-Required |
| Cashu | Bitcoin (sats) | Ecash tokens | X-Cashu |
| Balance | Bitcoin (sats) | Prepaid account | Authorization: Bearer |
Model Naming
- Short names work for all models -- no provider prefix needed: `deepseek-v3.2`, `claude-sonnet-4.6`, `gpt-5.4`
- Full IDs also work: `deepseek/deepseek-v3.2`, `anthropic/claude-sonnet-4.6`
- For auto-routing, use `"model": "auto"` and read the `X-Route-Model` header from the 402 response
Model in URL Path
All inference endpoints support specifying the model in the URL path instead of the request body:
/v1/chat/completions/deepseek-v3.2
/v1/images/generations/FLUX.1-schnell
/v1/videos/generations/kling-2.1-pro
If both URL path and body contain a model, the body model takes priority. The /v1/models endpoint returns all available model IDs.
Quick Start
x402 (USDC on Base)
Pay with USDC stablecoins. No BTC needed. No gas for the payer.
L402 (Bitcoin Lightning)
Pay with Bitcoin over the Lightning Network. Instant settlement, 21-sat minimum.
Endpoints
All inference endpoints are OpenAI-compatible. Base URL: https://llm402.ai
OpenAI-Compatible
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /v1/chat/completions or /v1/chat/completions/{model} | Chat completions (streaming + buffered). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/embeddings | Text embeddings (max 128 strings per batch, no streaming) | L402 / x402 / Balance / Cashu |
| POST | /v1/images/generations or /v1/images/generations/{model} | Image generation (synchronous, one image per request). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/videos or /v1/videos/generations/{model} | Create a video generation job (async, returns job ID). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| GET | /v1/videos/{job_id} | Poll video job status (no auth required) | None |
| POST | /v1/balance | Prepaid balance: create, top up, check status | None / Balance |
| GET | /v1/models | List all available models (OpenAI-compatible) | None (free) |
Ollama-Compatible
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/generate/{model} | Text generation | L402 / x402 / Balance / Cashu |
| POST | /api/chat/{model} | Chat | L402 / x402 / Balance / Cashu |
| GET | /api/tags | Model catalog with pricing | None (free) |
Ollama Examples
# Chat via Ollama-compatible endpoint (model in path)
curl -s -X POST https://llm402.ai/api/chat/deepseek-v3.2 \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hello"}],"stream":false}'
# Text generation via Ollama-compatible endpoint
curl -s -X POST https://llm402.ai/api/generate/deepseek-v3.2 \
-H "Content-Type: application/json" \
-d '{"prompt":"Explain Bitcoin in one sentence","stream":false}'
# List all models with pricing and endpoints
curl -s https://llm402.ai/api/tags | jq '.models[] | {name, price_sats}'
Utility
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /health | Service health and status | None (free) |
| POST | /v1/estimate-cost | Pre-authorization cost estimation | None (free) |
| POST | /api/invoice/status | Poll Lightning invoice payment status | None (free) |
| GET | /.well-known/l402 | L402 service discovery (agent-readable) | None (free) |
| GET | /.well-known/openapi.json | OpenAPI 3.1.0 specification | None (free) |
Estimate Cost
Pre-authorize requests by checking the cost before paying. This endpoint is free and requires no authentication. Useful for MCP clients, agents, and budgeting.
curl -s -X POST https://llm402.ai/v1/estimate-cost \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"max_tokens": 500,
"pref": "balanced"
}'
{
"model": "deepseek-v3.2",
"shortName": "deepseek-v3.2",
"category": "general_knowledge",
"confidence": 0.82,
"rc": 100,
"estimatedInputTokens": 8,
"estimatedOutputTokens": 500,
"costSats": 21,
"costUsd": 0.000152,
"btcPrice": 68000
}
Parameters
| Field | Required | Description |
|---|---|---|
| messages | Yes | Array of message objects (same format as chat completions) |
| model | No | Model name (short or full ID). If omitted or "auto", the server auto-routes. |
| max_tokens | No | Requested output tokens (defaults to the model's default output token count) |
| pref | No | Routing preference: quality, balanced, cost, speed |
| max_cost | No | Maximum cost in sats (routes only to models within budget) |
Invoice Status
Poll the payment status of a Lightning invoice. Useful for wallet integrations that need to confirm payment before re-sending with the L402 header.
curl -s -X POST https://llm402.ai/api/invoice/status \
-H "Content-Type: application/json" \
-d '{
"payment_hash": "a1b2c3d4...64hex",
"macaroon": "AgEJ..."
}'
# Before payment: { "paid": false }
# After payment: { "paid": true, "preimage": "e5f6a7b8...64hex" }
Security: The macaroon field is required and must match the payment_hash. This prevents preimage theft by ensuring only the original invoice requester can poll for the preimage.
x402 Protocol (USDC)
x402 uses EIP-3009 TransferWithAuthorization for gasless USDC payments on Base. The payer signs an off-chain authorization; the server settles it on-chain.
Network and Asset
| Field | Value |
|---|---|
| Network | eip155:8453 (Base mainnet, chain ID 8453) |
| Asset | USDC 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 |
| Denomination | Atomic USDC (6 decimals: 1000000 = $1.00) |
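The 6-decimal atomic denomination trips up many integrations, so here is a minimal conversion sketch (helper names are illustrative, not from any SDK). `Decimal` avoids the float rounding that would corrupt on-chain amounts:

```python
from decimal import Decimal

USDC_DECIMALS = 6

def atomic_to_usd(amount: str) -> str:
    """Convert an atomic USDC string (6 decimals) to a USD amount."""
    return str(Decimal(amount) / Decimal(10 ** USDC_DECIMALS))

def usd_to_atomic(usd: str) -> str:
    """Convert a USD amount to the atomic USDC string the protocol expects."""
    return str(int(Decimal(usd) * Decimal(10 ** USDC_DECIMALS)))
```

For example, the amount "3150" seen later in the x402 envelope is $0.00315.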
Payment Flow
Step 1: Get the 402 Challenge
Send a normal inference request with no auth headers:
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4.6",
"messages": [{"role": "user", "content": "Say hello."}],
"max_tokens": 50
}'
The server responds with HTTP 402. The response body contains all payment information:
{
"error": "Payment Required",
"description": "claude-sonnet-4.6 inference, pay-per-request over Lightning, USDC, or Cashu",
"price": 42,
"model": "claude-sonnet-4.6",
"provider": "llm402.ai",
"max_tokens": 50,
"estimated_input_tokens": 12,
"invoice": "lnbc420n...",
"macaroon": "AgEJ...",
"paymentHash": "a1b2c3d4e5f6...64hex",
"x402": {
"price_usd": "0.000305",
"network": "eip155:8453",
"address": "0xe05cf38...",
"asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
"scheme": "exact"
},
"cashu": {
"price_sats": 42,
"unit": "sat",
"description": "Send sat-denominated Cashu tokens in X-Cashu header. Any public mint accepted."
}
}
The response headers include all payment options:
HTTP/2 402
WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
Payment-Required: eyJzY2hlbWUiOiJleGFjdCIsIm5ldH...
Cache-Control: no-store
Payment-Required Header
Base64-encoded JSON in x402 v2 envelope format. Decode it and use accepts[0] for payment details:
{
"x402Version": 2,
"error": "Payment required",
"accepts": [
{
"scheme": "exact",
"network": "eip155:8453",
"amount": "3150",
"asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
"payTo": "0x...",
"maxTimeoutSeconds": 120,
"extra": {
"name": "USD Coin",
"version": "2"
}
}
],
"resource": {
"url": "/v1/chat/completions",
"description": "LLM inference",
"mimeType": "application/json"
},
"price": "$0.003150"
}
| Field | Description |
|---|---|
| x402Version | Always 2 |
| accepts | Array of payment options. Always use accepts[0] |
| accepts[0].scheme | Always "exact" |
| accepts[0].network | Always "eip155:8453" (Base mainnet) |
| accepts[0].amount | Price in atomic USDC (6 decimals). "3150" = $0.003150 |
| accepts[0].asset | USDC contract address on Base |
| accepts[0].payTo | Server's wallet address (recipient) |
| accepts[0].maxTimeoutSeconds | Maximum settlement time (120s) |
| accepts[0].extra.name | EIP-712 domain name. Always "USD Coin" (not "USDC" -- that is testnet) |
| accepts[0].extra.version | EIP-712 domain version. Always "2" |
| price | Human-readable USD price (informational only; use accepts[0].amount for signing) |
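Decoding the header is plain base64 + JSON. A minimal sketch (the function name is illustrative; it assumes the v2 envelope shape shown above):

```python
import base64
import json

def parse_payment_required(header_value: str) -> dict:
    """Decode the base64 Payment-Required header and return accepts[0],
    the payment option to sign against."""
    envelope = json.loads(base64.b64decode(header_value))
    if envelope.get("x402Version") != 2:
        raise ValueError(f"unexpected x402 version: {envelope.get('x402Version')}")
    return envelope["accepts"][0]
```
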
Step 2: Sign the EIP-3009 Authorization
Build a TransferWithAuthorization signature using EIP-712 typed data.
EIP-712 Domain
const domain = {
name: "USD Coin", // from extra.name
version: "2", // from extra.version
chainId: 8453, // Base mainnet
verifyingContract: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"
};
EIP-712 Types
const types = {
TransferWithAuthorization: [
{ name: "from", type: "address" },
{ name: "to", type: "address" },
{ name: "value", type: "uint256" },
{ name: "validAfter", type: "uint256" },
{ name: "validBefore", type: "uint256" },
{ name: "nonce", type: "bytes32" },
]
};
Authorization Message
const crypto = require("crypto"); // Node.js builtin, needed for randomBytes
const now = Math.floor(Date.now() / 1000);
const nonce = "0x" + crypto.randomBytes(32).toString("hex");
const opt = paymentRequired.accepts[0]; // always use accepts[0]
const message = {
from: walletAddress, // your address (payer)
to: opt.payTo, // from accepts[0]
value: BigInt(opt.amount),
validAfter: BigInt(now - 600), // 10 min ago (clock skew buffer)
validBefore: BigInt(now + 120), // 2 min from now
nonce: nonce,
};
Step 3: Build the Payment Payload
Construct the V2 payment payload and base64-encode it:
const opt = paymentRequired.accepts[0]; // always use accepts[0]
const payload = {
x402Version: 2,
resource: paymentRequired.resource,
accepted: {
scheme: opt.scheme,
network: opt.network,
amount: opt.amount,
asset: opt.asset,
payTo: opt.payTo,
maxTimeoutSeconds: opt.maxTimeoutSeconds,
extra: opt.extra
},
payload: {
signature: signature,
authorization: {
from: walletAddress,
to: opt.payTo,
value: opt.amount,
validAfter: (now - 600).toString(),
validBefore: (now + 120).toString(),
nonce: nonce
}
}
};
const paymentSignature = Buffer.from(JSON.stringify(payload)).toString("base64");
Step 4: Send the Paid Request
Re-send the same inference request with the Payment-Signature header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
-d '{
"model": "claude-sonnet-4.6",
"messages": [{"role": "user", "content": "Say hello."}],
"max_tokens": 50
}'
Response (HTTP 200):
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "claude-sonnet-4.6",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 4,
"completion_tokens": 10,
"total_tokens": 14
}
}
Auto-routing gotcha: If you use model: "auto", the server routes to a specific model and returns it in the X-Route-Model header. On the retry with payment, you must use the specific model from the 402 response -- not "auto" again. Otherwise the router may pick a different model with a different price.
CORS
The server allows cross-origin x402 requests:
Access-Control-Allow-Headers: Content-Type, Authorization, Payment-Signature, X-Cashu
Access-Control-Expose-Headers: X-Route-Model, X-Route-Category, Payment-Required, WWW-Authenticate, X-Cashu-Change
Nonce Replay Protection
Each signed authorization can only be used once. Replay protection is enforced both server-side and on-chain via EIP-3009 nonces.
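A minimal sketch of what the server-side half of that check looks like (illustrative only; the real backstop is the USDC contract's own EIP-3009 authorization state on-chain):

```python
class NonceRegistry:
    """Track (payer, nonce) pairs so a signed authorization settles at most once."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def try_consume(self, payer: str, nonce: str) -> bool:
        # Addresses and nonces are hex; normalize case before comparing.
        key = (payer.lower(), nonce.lower())
        if key in self._seen:
            return False  # replay: reject the payment
        self._seen.add(key)
        return True
```
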
L402 Protocol (Lightning)
L402 (formerly LSAT) combines HTTP 402 status codes with Lightning Network payments and macaroon-based authentication. It is the original payment protocol supported by llm402.ai.
Payment Flow
Curl Example
# Send a request with no auth -- get back a 402 with an invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 50
}'
# Response includes:
# WWW-Authenticate: L402 macaroon="AgEJ...", invoice="lnbc210n1pn..."
# Body: { "error": "Payment Required", "price": 21, "invoice": "lnbc...", "macaroon": "AgEJ...", ... }
# Pay the Lightning invoice with your wallet and get the preimage.
# Then resend the exact same request with the L402 Authorization header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: L402 AgEJbGxtNDAyLmFp...:a1b2c3d4e5f67890..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 50
}'
# Response: HTTP 200 with chat completion
WWW-Authenticate Header
The 402 response includes a WWW-Authenticate header with two components:
WWW-Authenticate: L402 macaroon="AgELbGxt...", invoice="lnbc50n1pn..."
| Component | Description |
|---|---|
macaroon | Base64-encoded V2 TLV macaroon with embedded caveats. Bound to a specific payment hash. |
invoice | BOLT-11 Lightning invoice. Pay this to obtain the preimage. |
Macaroon Caveats
Each macaroon is bound with first-party caveats that restrict its use. The server verifies all caveats on the paid request and rejects any that fail (fail-closed):
| Caveat | Format | Description |
|---|---|---|
| RequestPath | RequestPath = /v1/chat/completions | Restricts the macaroon to a specific API endpoint |
| ExpiresAt | ExpiresAt = 1712345678 | Unix timestamp expiry (5 minutes from issuance) |
| MaxTokens | MaxTokens = 256 | Maximum output tokens the request may use |
| MaxInputChars | MaxInputChars = 1500 | Prevents input inflation after invoice issuance |
| MaxInputTokens | MaxInputTokens = 400 | Prevents token-count gaming (chars pass but tokens are higher) |
| NotBefore | NotBefore = 1712340000 | Prevents preimage replay after server restart |
| MaxInputItems | MaxInputItems = 5 | Limits batch size for /v1/embeddings requests (no tolerance) |
| Model | Model = claude-sonnet-4.6 | Binds the macaroon to a specific model (prevents cross-model bypass) |
Fail-closed design: Unrecognized caveats are rejected. This ensures future caveat additions don't accidentally pass on old server versions.
Authorization Header
After paying the Lightning invoice and receiving the preimage, send the authorization:
Authorization: L402 AgELbGxt...:abc123def456...
Format: L402 {base64_macaroon}:{hex_preimage}
The server verifies:
- The macaroon signature against the root key
- All caveats pass (path, expiry, tokens, model, etc.)
- The preimage hashes to the payment hash embedded in the macaroon identifier
- The preimage has not been used before (burn-on-success: preimage is only burned after successful inference)
Preimage preservation: If inference fails after payment verification, the preimage is NOT burned. You can retry with the same macaroon:preimage pair until the macaroon expires (5 minutes).
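The header assembly and the preimage check are easy to get right client-side before sending. A sketch (helper names are illustrative) using the Lightning invariant that the payment hash is the SHA-256 of the preimage:

```python
import hashlib

def build_l402_header(macaroon_b64: str, preimage_hex: str) -> str:
    """Assemble the paid-request header: L402 {base64_macaroon}:{hex_preimage}."""
    return f"Authorization: L402 {macaroon_b64}:{preimage_hex}"

def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """Verify locally what the server will verify: sha256(preimage) == payment_hash."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return digest == payment_hash_hex.lower()
```

Checking the preimage locally before the retry avoids burning a request on a wallet that returned a mangled preimage.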
Balance Tokens (Prepaid)
Balance tokens let you prepay for multiple requests with a single Lightning payment or USDC transfer. Fund a balance once, then use Authorization: Bearer bal_... on any gated endpoint without per-request payment flows.
How It Works
Endpoints
Create Balance (Lightning)
Request a Lightning invoice to fund a new balance:
# Step 1: Request an invoice for 1000 sats
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"sats": 1000}'
# Response (402):
# { "payment_hash": "a1b2c3...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }
Poll for Payment
After paying the invoice, poll with the payment hash to get your token:
# Step 2: Poll until paid
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"payment_hash": "a1b2c3d4e5f6...64hex"}'
# Before payment: { "paid": false }
# After payment: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "2026-05-03T..." }
Check Balance
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"action": "status"}'
# Response:
# { "sats": 850, "expires_at": "2026-05-03T...", "total_spent": 150, "requests": 7 }
Top Up (Lightning)
Add sats to an existing balance:
# Get a top-up invoice
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"sats": 500}'
# Returns 402 with a new invoice. Pay it, then poll with payment_hash as above.
Fund with USDC (x402)
Fund a balance using USDC instead of Lightning. Include a Payment-Signature header with the signed x402 payload:
# Fund with USDC (sign an x402 payment for the equivalent USD amount)
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
-d '{"sats": 500}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 500 }
Use Balance Token
Include the token as a Bearer auth on any gated endpoint:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "Explain Lightning Network"}],
"max_tokens": 200
}'
Token Lifecycle
| Rule | Value |
|---|---|
| Inactivity TTL | 30 days (resets on each use) |
| Max lifetime | 90 days from creation |
| Max balance | 50,000 sats |
| Min deposit | 100 sats |
Top-ups reset the inactivity timer but do not extend the 90-day max lifetime. Plan deposits accordingly.
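The interaction of the two timers can be sketched as follows (a client-side planning helper under the rules in the table; not server code):

```python
from datetime import datetime, timedelta

INACTIVITY_TTL = timedelta(days=30)   # resets on each use
MAX_LIFETIME = timedelta(days=90)     # hard cap from creation

def balance_expiry(created: datetime, last_used: datetime) -> datetime:
    """Effective expiry: 30 days after last use, never later than
    90 days after creation."""
    return min(last_used + INACTIVITY_TTL, created + MAX_LIFETIME)
```
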
Cashu Tokens (Ecash)
Pay with Cashu ecash tokens -- instant, private Bitcoin micropayments with no Lightning channel required. Send tokens directly in the request header. If you overpay, the server returns change tokens.
How It Works
Request
Send a cashuB (v4) token in the X-Cashu header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Cashu: cashuBo2F0gaJha..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hello"}],
"max_tokens": 100
}'
Change Tokens
If the token value exceeds the model's price by 2 or more sats, the server returns change in the X-Cashu-Change response header:
HTTP/2 200
X-Cashu-Consumed: true
X-Cashu-Change: cashuBo2F0gaJha... # remaining sats as ecash token
The change token is a cashuB token you can use for future requests.
Constraints
| Rule | Value |
|---|---|
| Token format | cashuB (v4) only. Deprecated cashuA (v3) tokens are rejected. |
| Unit | Sat-denominated only (no USD or other units) |
| Max proofs | 20 per token (DoS prevention) |
| Streaming | Not supported. Cashu requires buffered responses to calculate change. Use "stream": false. |
| Change threshold | 2 sats minimum. Overpayment of 1 sat is absorbed (not worth the mint round-trip). |
| Change size limit | 8 KB. If the change token exceeds 8 KB, it is absorbed by the server. |
| Mint | Server-configured allowlist. HTTPS-only, no private IPs. The 402 response body's cashu.description field indicates the current policy. |
No 402 dance needed: Unlike L402 and x402, you can skip the initial 402 request if you already know the price. Just send the Cashu token directly -- the server verifies the token value covers the model's price.
MCP Server
llm402.ai provides a hosted Model Context Protocol (MCP) server. Connect from any MCP client — Claude Code, Claude Desktop, Cursor, or any tool that supports MCP. Six tools are available: text inference, image generation, video generation, model discovery, balance management, and funding.
Setup
1. Create a balance token (see Balance Tokens above) and copy the bal_ token from the balance display.
2. Add it to your MCP client config. Claude Code (~/.claude.json):
{
"mcpServers": {
"llm402": {
"url": "https://llm402.ai/mcp",
"headers": {
"Authorization": "Bearer bal_YOUR_TOKEN_HERE"
}
}
}
}
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"llm402": {
"url": "https://llm402.ai/mcp",
"headers": {
"Authorization": "Bearer bal_YOUR_TOKEN_HERE"
}
}
}
}
Replace bal_YOUR_TOKEN_HERE with your actual balance token. That's it — the MCP client discovers all tools automatically.
Available Tools
| Tool | Auth | Description |
|---|---|---|
| llm402_inference | Required | Text inference. 335+ models, auto-routed by default. Supports system prompts, model selection, temperature, max_tokens, and routing preference (quality/balanced/cost/speed). |
| llm402_image | Required | Image generation. Requires a specific model ID (e.g. black-forest-labs/FLUX.1-schnell). Supports width, height, steps, seed, negative prompt. |
| llm402_video | Required | Video generation (async). Requires a specific model ID (e.g. Wan-AI/wan2.7-t2v). Supports seconds, width, height, fps. Polls for completion up to 90s, then returns job URL for manual polling. |
| llm402_models | None | List available models. Optional substring filter (e.g. "deepseek", "flux"). Free, no balance required. |
| llm402_balance | Required | Check your prepaid balance: remaining sats, total deposited, total spent, request count. |
| llm402_fund | Required | Generate a Lightning invoice to top up your balance. Default 5,000 sats. Polls for payment confirmation up to 45 seconds. |
Example: Text Inference
# Using curl against the MCP endpoint directly (JSON-RPC format)
curl -X POST https://llm402.ai/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "llm402_inference",
"arguments": {
"prompt": "Explain quantum computing in one sentence.",
"max_tokens": 100
}
},
"id": 1
}'
Example: Image Generation
curl -X POST https://llm402.ai/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "llm402_image",
"arguments": {
"prompt": "A cyberpunk cityscape at night with neon lights",
"model": "black-forest-labs/FLUX.1-schnell"
}
},
"id": 1
}'
OpenAI-Compatible Alternative
If your tool doesn't support MCP but accepts OPENAI_BASE_URL, the same balance token works directly:
export OPENAI_BASE_URL=https://llm402.ai/v1
export OPENAI_API_KEY=bal_YOUR_TOKEN
This works with Cursor, Aider, LangChain, the OpenAI Python SDK, and any tool that accepts a custom base URL.
Endpoint
MCP endpoint: https://llm402.ai/mcp
Protocol: Streamable HTTP (POST only, stateless). Responses are Server-Sent Events containing JSON-RPC results.
Streaming
Add "stream": true to your request body to receive Server-Sent Events (SSE) as tokens are generated. The format follows the OpenAI streaming specification.
Request
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "Count to 5"}],
"max_tokens": 100,
"stream": true
}'
Response Format
The server sends a series of data: lines. Each line is a JSON chunk with a delta object containing the next token(s):
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
| Field | Description |
|---|---|
| choices[0].delta.role | Sent on the first chunk only ("assistant") |
| choices[0].delta.content | The next token(s) of the response |
| choices[0].finish_reason | null while generating, "stop" on the final chunk |
| data: [DONE] | End-of-stream marker. Close the connection after this line. |
Heartbeat: During long-running inferences, the server sends : heartbeat SSE comments every 15 seconds to keep the connection alive. These are not data lines and should be ignored by your parser.
Consuming the Stream
# Stream tokens to stdout (use -N to disable buffering)
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hello"}],"max_tokens":100,"stream":true}' \
| while IFS= read -r line; do
echo "$line"
done
const res = await fetch('https://llm402.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...' },
body: JSON.stringify({
model: 'deepseek-v3.2', messages: [{ role: 'user', content: 'hello' }],
max_tokens: 100, stream: true
})
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const text = decoder.decode(value, { stream: true });
for (const line of text.split('\n')) {
if (line.startsWith('data: ') && line !== 'data: [DONE]') {
const chunk = JSON.parse(line.slice(6));
const token = chunk.choices[0]?.delta?.content || '';
process.stdout.write(token);
}
}
}
import requests, json
res = requests.post('https://llm402.ai/v1/chat/completions',
headers={'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...'},
json={'model': 'deepseek-v3.2', 'messages': [{'role': 'user', 'content': 'hello'}],
'max_tokens': 100, 'stream': True},
stream=True)
for line in res.iter_lines():
line = line.decode('utf-8')
if line.startswith('data: ') and line != 'data: [DONE]':
chunk = json.loads(line[6:])
token = chunk['choices'][0].get('delta', {}).get('content', '')
print(token, end='', flush=True)
Payment first: Streaming requires the same payment flow as buffered requests. Pay via L402, x402, or Balance token before sending a stream request. You cannot begin streaming before payment is verified. Cashu does not support streaming -- use buffered mode ("stream": false) with Cashu tokens.
Request Deduplication
Non-streaming responses are cached for 30 seconds. If you retry an identical request (same model, messages, max_tokens, and IP), the server returns the cached response immediately without re-running inference or re-charging you.
| Parameter | Value |
|---|---|
| TTL | 30 seconds |
| Max entries | 100 |
| Max entry size | 1 MB |
| Scope | Per-IP (different IPs get separate caches) |
The response includes an X-Dedup header indicating whether the response was served from cache:
X-Dedup: hit # served from cache (no charge)
X-Dedup: miss # fresh inference
To bypass the cache, send the X-No-Cache: true request header.
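A sketch of how such a per-IP cache key could be derived (illustrative only; the field choice mirrors the text above, but the server's actual hashing scheme is not documented):

```python
import hashlib
import json

def dedup_key(ip: str, body: dict) -> str:
    """Hash the documented comparison fields (model, messages, max_tokens)
    plus the client IP into a stable cache key."""
    material = json.dumps(
        {"ip": ip,
         "model": body.get("model"),
         "messages": body.get("messages"),
         "max_tokens": body.get("max_tokens")},
        sort_keys=True, separators=(",", ":"))  # canonical form: key order must not matter
    return hashlib.sha256(material.encode()).hexdigest()
```
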
Web Search
Ground model responses with real-time web data. When enabled, the model searches the web before responding and includes citations to sources. Available on most models via auto-routing.
Usage
Add web_search: true to any /v1/chat/completions request:
curl -X POST https://llm402.ai/v1/chat/completions \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "What is the current price of Bitcoin?"}],
"web_search": true,
"stream": true
}'
Parameters
| Parameter | Type | Description |
|---|---|---|
| web_search | boolean | Set to true to enable web search. Default: false. |
Response
The model embeds citation markers (e.g. [1]) in its response text with links to sources. Structured annotations may also be included in the response as annotations on the message object.
Pricing
Web search adds a small surcharge per request (approximately 35–50 sats) on top of the base inference cost. The surcharge covers up to 5 web page lookups per request. The exact cost is included in the 402 challenge invoice or deducted from your balance.
Compatibility
- model: "auto" -- always supported. The router selects a search-capable model.
- Explicit model -- supported on most models. Returns 400 if the model does not support web search.
- The tools parameter is not supported. Use web_search: true instead.
Estimate Cost
Use /v1/estimate-cost with web_search: true to see the price before sending a paid request:
curl -X POST https://llm402.ai/v1/estimate-cost \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Latest AI news"}],
"web_search": true
}'
The response includes webSearchEnabled: true and the total cost with the search surcharge included.
Image Generation
Generate images from text prompts. 40+ models across three providers, all behind a unified OpenAI-compatible endpoint. All four payment rails are supported (L402, x402, Balance, Cashu).
Endpoint
POST /v1/images/generations or /v1/images/generations/{model}
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes* | Image model ID (e.g. FLUX.1-schnell). *Not required if model is in URL path. |
| prompt | string | Yes | Text description of the image (2-4096 chars) |
| size | string | No | Dimensions as "WxH" string (e.g. "1024x1792"). Use "auto" for model default. Overrides width/height. |
| width | integer | No | Image width in pixels (64–2048). Must provide both width and height together. |
| height | integer | No | Image height in pixels (64–2048). Must provide both width and height together. |
| steps | integer | No | Diffusion steps (1-50, default model-dependent) |
| response_format | string | No | url (default) or b64_json |
| seed | integer | No | Deterministic seed for reproducibility |
Response
{
"created": 1234567890,
"model": "black-forest-labs/FLUX.1-schnell",
"data": [
{ "url": "https://..." }
]
}
The url field contains either an HTTPS URL or a data: URI depending on the model. Both are valid image sources for <img> tags. HTTPS URLs expire after ~7 days.
Key Differences from Chat Completions
- No streaming -- response is synchronous
- model is required (no auto-routing)
- One image per request (n is always 1)
- Pricing is per-image, not per-token
- All payments are non-refundable
- Request deduplication is disabled
- Generation time varies: 1–10s for FLUX/diffusion models, 30–90s for GPT-5 Image models
- Dimensions are automatically rounded to the nearest multiple of 16 for compatibility
- Both size string and width/height integer formats are accepted
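The multiple-of-16 rounding means the dimensions you request are not always the dimensions you get. A sketch of the adjustment (illustrative; exact midpoints here round up, and the server's tie-breaking behavior is not documented):

```python
def round_to_multiple_of_16(px: int) -> int:
    """Round a requested dimension to the nearest multiple of 16."""
    return ((px + 8) // 16) * 16
```

So a requested width of 1023 becomes 1024, and 100 becomes 96.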
Example
curl -X POST https://llm402.ai/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_your_token_here" \
-d '{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "A serene mountain landscape at sunset"
}'
Available Models
40+ image models from three providers. Use /v1/models or the Models page for the full live list. Selected highlights:
| Model | Price | Notes |
|---|---|---|
| black-forest-labs/FLUX.1-schnell | 21 sats | Fast, cheapest FLUX |
| black-forest-labs/FLUX.2-pro | 63 sats | Professional quality |
| black-forest-labs/FLUX.2-max | 147 sats | Maximum quality |
| google/flash-image-2.5 | 82 sats | Nano Banana -- Gemini image gen |
| google/flash-image-3.1 | 105 sats | Nano Banana 2 -- latest Gemini |
| google/imagen-4.0-fast | 42 sats | Google Imagen |
| openai/gpt-5-image-mini | 21 sats | GPT-5 image gen (compact) |
| openai/gpt-5-image | 105 sats | GPT-5 image gen (full, slow ~60s) |
| Bria/fibo | 84 sats | JSON-native, enterprise-safe |
| ideogram/ideogram-3.0 | 126 sats | Strong text rendering |
Video Generation
Generate videos from text prompts. Unlike image generation, video generation is asynchronous: you create a job, then poll for completion. All four payment rails are supported (L402, x402, Balance, Cashu). Payment is collected when the job is created.
Workflow
1. Create job — `POST /v1/videos` with your prompt and model. Returns `202 Accepted` with a job ID and poll URL.
2. Poll for status — `GET /v1/videos/{job_id}` (no auth required). Returns `queued`, `processing`, `completed`, or `failed`.
3. Download — When status is `completed`, the response includes a `video_url`.
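The steps above can be sketched as a polling loop. `fetch` is a hypothetical injected helper so the loop stays transport-agnostic:

```python
import time

def wait_for_video(job: dict, fetch, timeout: float = 600.0) -> str:
    """Poll a video job until it finishes, honoring poll_interval_ms.

    `fetch` is any callable that GETs a path and returns the parsed JSON
    status object, e.g. lambda p: requests.get("https://llm402.ai" + p).json()
    """
    deadline = time.monotonic() + timeout
    status = job
    while status["status"] in ("queued", "processing"):
        if time.monotonic() > deadline:
            raise TimeoutError("video job did not complete in time")
        time.sleep(status.get("poll_interval_ms", 5000) / 1000.0)
        status = fetch(job["poll_url"])
    if status["status"] == "failed":
        raise RuntimeError(status.get("error", "generation failed"))
    return status["video_url"]
```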
Create Job
POST /v1/videos or /v1/videos/generations/{model}
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes* | Video model ID (e.g. `kling-2.1-pro`). *Not required if model is in URL path. |
| `prompt` | string | Yes | Text description of the video (2–4096 chars) |
| `seconds` | integer | No | Video duration in seconds (default varies by model). Supported values are model-specific; use `/v1/models` to check. |
| `width` | integer | No | Video width in pixels. Supported sizes are model-specific; invalid sizes are rejected with 400. |
| `height` | integer | No | Video height in pixels. Supported sizes are model-specific; invalid sizes are rejected with 400. |
| `fps` | integer | No | Frames per second. Model-specific; not all models support custom fps. |
| `steps` | integer | No | Diffusion steps (model-dependent) |
| `guidance_scale` | number | No | Classifier-free guidance scale |
| `seed` | integer | No | Deterministic seed for reproducibility |
| `negative_prompt` | string | No | What to avoid in the generated video |
Pricing
Video pricing varies by provider and model:
- Together.ai models (Kling, MiniMax, Seedance, etc.) — flat per-video pricing. The price is the same regardless of duration or resolution.
- OpenRouter models (Veo 3.1) — per-second pricing that scales with duration and resolution. Longer videos and higher resolutions cost more.
The 402 challenge always shows the exact price for the specific parameters you requested. If no optional parameters are specified (duration, resolution), the minimum price for that model is shown.
Model Capabilities
Each video model supports specific durations, sizes, and fps values. Sending unsupported parameters returns 400 Bad Request with the list of supported values for that model. Use GET /v1/models to discover per-model capabilities including supported durations, resolutions, and fps options.
Response (202 Accepted)
{
"id": "vj_abc123...",
"status": "queued",
"model": "minimax/video-01-director",
"poll_url": "/v1/videos/vj_abc123...",
"poll_interval_ms": 5000,
"created_at": 1234567890
}
Poll Job Status
GET /v1/videos/{job_id}
Response (completed)
{
"id": "vj_abc123...",
"status": "completed",
"model": "minimax/video-01-director",
"video_url": "https://...",
"done_at": 1234567890
}
Response (failed)
{
"id": "vj_abc123...",
"status": "failed",
"model": "minimax/video-01-director",
"error": "upstream provider timeout",
"poll_interval_ms": 5000
}
Key Differences from Image Generation
- Asynchronous — returns immediately with a job ID, not a finished result
- Polling required — use `poll_url` to check status; respect `poll_interval_ms`
- URL-only — no `b64_json` response format; videos are always returned as URLs
- Longer generation times — expect 30s–5min depending on model and duration
- Non-refundable — payment is collected at job creation, not on completion
- `model` is required (no auto-routing)
- One video per request
- Request deduplication is disabled
Example
# Create video job
curl -X POST https://llm402.ai/v1/videos \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "minimax/video-01-director", "prompt": "A cat walking through a garden", "seconds": 5}'
# Response (202 Accepted):
# {"id":"vj_abc...","status":"queued","model":"minimax/video-01-director","poll_url":"/v1/videos/vj_abc...","poll_interval_ms":5000}
# Poll for completion
curl https://llm402.ai/v1/videos/vj_abc123...
# Response (completed):
# {"id":"vj_abc...","status":"completed","model":"minimax/video-01-director","video_url":"https://...","done_at":1234567890}
Available Models
Video models from multiple providers. Use /v1/models or the Models page for the full live list with pricing and per-model capabilities. Available models include Sora 2, Veo 3.0, Veo 3.1 (via OpenRouter, per-second pricing), Kling 2.1, Seedance, PixVerse, MiniMax, Vidu, and Wan.
Code Examples
Complete examples for each payment method and language.
Uses viem for EIP-712 signing. Install: npm install viem
// assumes: walletClient (viem), address, nonce (32-byte hex), now (unix seconds)
const API_URL = 'https://llm402.ai/v1/chat/completions';
const body = JSON.stringify({
model: 'claude-sonnet-4.6',
messages: [{ role: 'user', content: 'Say hello.' }],
max_tokens: 50
});
// 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(Buffer.from(res402.headers.get('Payment-Required'), 'base64').toString());
const req = envelope.accepts[0]; // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';
// 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
const signature = await walletClient.signTypedData({
domain: { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
types: { TransferWithAuthorization: [
{ name: 'from', type: 'address' }, { name: 'to', type: 'address' },
{ name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
{ name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
]},
primaryType: 'TransferWithAuthorization',
message: { from: address, to: req.payTo, value: BigInt(req.amount),
validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce },
});
// 3. Send with Payment-Signature header (base64-encoded JSON payload)
const res = await fetch(API_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'Payment-Signature': paymentB64 },
body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});
Uses eth-account for EIP-712 signing. Install: pip install requests eth-account
import base64, json
import requests
from eth_account.messages import encode_typed_data
# assumes: account (eth_account LocalAccount), address, nonce, now, and the EIP-712 types dict
API_URL = 'https://llm402.ai/v1/chat/completions'
body = {'model': 'claude-sonnet-4.6', 'messages': [{'role': 'user', 'content': 'Say hello.'}], 'max_tokens': 50}
# 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
res402 = requests.post(API_URL, json=body)
envelope = json.loads(base64.b64decode(res402.headers["Payment-Required"]).decode())
req = envelope["accepts"][0] # always use accepts[0]
routed_model = res402.headers.get("X-Route-Model", "claude-sonnet-4.6")
# 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
domain = {"name": req["extra"]["name"], "version": req["extra"]["version"],
"chainId": 8453, "verifyingContract": req["asset"]}
message = {"from": address, "to": req["payTo"], "value": int(req["amount"]),
"validAfter": now - 600, "validBefore": now + 120,
"nonce": bytes.fromhex(nonce[2:])}
signable = encode_typed_data(domain, types, "TransferWithAuthorization", message)
signed = account.sign_message(signable)
# 3. Send with Payment-Signature header (base64-encoded JSON payload)
res = requests.post(API_URL, json={**body, "model": routed_model},
headers={"Payment-Signature": payment_b64})
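The `payment_b64` value above must be assembled by the client. The envelope below follows the x402 v2 payload shape as understood here; field names are assumptions to verify against the x402 spec:

```python
import base64, json

def build_payment_b64(req: dict, signature: str, authorization: dict) -> str:
    """Assemble the base64 Payment-Signature value from the accepts[0] entry,
    the EIP-712 signature, and the TransferWithAuthorization fields.
    (Envelope field names are assumptions -- check the x402 spec.)"""
    payload = {
        "x402Version": 2,
        "scheme": req["scheme"],
        "network": req["network"],
        "payload": {"signature": signature, "authorization": authorization},
    }
    return base64.b64encode(json.dumps(payload).encode()).decode()
```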
Requires an EIP-712 signing tool (e.g., Foundry's cast) for step 3.
# 1. Get 402 challenge
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 402 with Payment-Required header (base64 JSON) and WWW-Authenticate (L402)
# 2. Decode the Payment-Required header (x402 v2 envelope)
echo "$PAYMENT_REQ_HEADER" | base64 -d | jq .
# Returns: { x402Version: 2, accepts: [{ scheme, network, amount, asset, payTo, extra }], resource, price }
# Use accepts[0] for payment details: jq '.accepts[0]'
# 3. Sign EIP-3009 with cast, build payload, base64 encode (see x402 docs above for full flow)
# 4. Send with payment
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Payment-Signature: $PAYMENT_B64" \
-d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
Uses ethers.js v6 with MetaMask or Coinbase Wallet. No gas for the payer -- just a signing prompt.
// 1. Connect wallet + switch to Base
const provider = new ethers.BrowserProvider(window.ethereum);
const signer = await provider.getSigner();
await window.ethereum.request({ method: 'wallet_switchEthereumChain', params: [{ chainId: '0x2105' }] });
// 2. Get 402, parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(atob(res402.headers.get('Payment-Required')));
const req = envelope.accepts[0]; // always use accepts[0]
// 3. Sign EIP-3009 (wallet popup -- no gas, no approval tx)
const signature = await signer.signTypedData(
{ name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
{ TransferWithAuthorization: [/* from, to, value, validAfter, validBefore, nonce */] },
{ from: address, to: req.payTo, value: BigInt(req.amount), validAfter: BigInt(now-600), validBefore: BigInt(now+120), nonce }
);
// 4. Send with Payment-Signature header
const res = await fetch(API_URL, {
headers: { 'Content-Type': 'application/json', 'Payment-Signature': btoa(JSON.stringify(payload)) },
method: 'POST', body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});
Pay with Bitcoin Lightning. Three steps: get an invoice, pay it, resend with proof of payment.
# Step 1: Get 402 challenge with Lightning invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response body includes:
# "invoice": "lnbc210n1pn..." (pay this with your Lightning wallet)
# "macaroon": "AgEJbGxt..." (send this back with the preimage)
# "price": 21 (cost in sats)
# Step 2: Pay the Lightning invoice with your wallet.
# Your wallet will give you the preimage (64-char hex).
# Step 3: Resend the request with L402 authorization
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: L402 AgEJbGxt...:a1b2c3d4e5f67890..." \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 200 with the chat completion
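The `Authorization` value in step 3 is just the macaroon and preimage joined by a colon; a minimal helper with a sanity check on the preimage:

```python
def l402_header(macaroon: str, preimage: str) -> str:
    """Build the L402 Authorization value from the invoice macaroon and the
    payment preimage your wallet reveals after paying."""
    if len(preimage) != 64 or any(c not in "0123456789abcdefABCDEF" for c in preimage):
        raise ValueError("preimage must be 64 hex characters")
    return f"L402 {macaroon}:{preimage}"
```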
Prepay for a balance, then use it for multiple requests.
# Step 1: Create a prepaid balance (get Lightning invoice)
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"sats": 1000}'
# Response: { "payment_hash": "abc...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }
# Step 2: Pay the invoice, then poll for the token
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"payment_hash": "abc..."}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "..." }
# Step 3: Use the token for requests (no per-request payment needed)
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"model":"gpt-5.4","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'
# Step 4: Check remaining balance
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"action": "status"}'
# Response: { "sats": 850, "expires_at": "...", "total_spent": 150, "requests": 3 }
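The balance lifecycle above maps to a few small calls. In this sketch, `post` is a hypothetical injected helper (e.g. `lambda path, body: requests.post(BASE + path, json=body).json()`):

```python
def create_balance(post, sats: int) -> dict:
    """Step 1: request a Lightning funding invoice for `sats`."""
    return post("/v1/balance", {"sats": sats})

def redeem_balance(post, payment_hash: str) -> str:
    """Step 2: after paying the invoice, exchange the payment hash for a token."""
    res = post("/v1/balance", {"payment_hash": payment_hash})
    if not res.get("paid"):
        raise RuntimeError("invoice not settled yet -- pay it and retry")
    return res["token"]

def bearer(token: str) -> dict:
    """Step 3: the header to attach to every subsequent request."""
    return {"Authorization": f"Bearer {token}"}
```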
Pricing
Prices are computed dynamically per request based on the model, your estimated input tokens, and your requested `max_tokens`. Cheaper models cost as little as 21 sats. The exact price is returned in the 402 response.
Denomination by Protocol
| Protocol | Unit | Minimum | Notes |
|---|---|---|---|
| x402 (USDC) | Atomic USDC (6 decimals) | ~$0.001 | amount: "3150" = $0.003150. Native USD -- no BTC conversion. |
| L402 (Lightning) | Satoshis | 21 sats | BTC/USD converted at request time. 21-sat floor for all models. |
| Cashu (ecash) | Satoshis | 21 sats | Same denomination as L402. Send sat-denominated Cashu tokens. |
| Balance (prepaid) | Satoshis | 21 sats | Funded via Lightning or USDC. Deducted per-request in sats. |
Price verification: The server recalculates the price on the paid retry and verifies the signed/paid amount covers the minimum. A rounding tolerance of 5 atomic USDC is allowed for x402.
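A quick sanity check of the atomic-USDC denomination and the rounding tolerance; the direction of the tolerance in the server's check is an assumption:

```python
def usdc_atomic_to_usd(amount: str) -> float:
    """x402 amounts are atomic USDC: 6 decimal places."""
    return int(amount) / 1_000_000

def covers_price(signed_atomic: str, price_atomic: str, tolerance: int = 5) -> bool:
    """Sketch of the server-side check: the signed amount must cover the
    recalculated price within 5 atomic USDC (tolerance direction assumed)."""
    return int(signed_atomic) + tolerance >= int(price_atomic)
```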
Errors
All errors on /v1/* endpoints follow the OpenAI error format:
{
"error": {
"message": "description",
"type": "error_type",
"code": "error_code"
}
}
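Because every error shares this envelope, one defensive parser covers all payment rails:

```python
def parse_api_error(status: int, body: dict) -> tuple:
    """Extract (status, type, code, message) from the OpenAI-style error envelope."""
    err = body.get("error") or {}
    return status, err.get("type", ""), err.get("code", ""), err.get("message", "")
```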
x402-Specific Errors
| Code | HTTP | Type | Description |
|---|---|---|---|
| `x402_bad_payload` | 400 | `invalid_request_error` | `Payment-Signature` header is not valid base64 or not valid JSON |
| `x402_underpayment` | 402 | `payment_error` | Signed amount is less than the model's current price |
| `x402_settlement_failed` | 402 | `payment_error` | Payment rejected (bad signature, insufficient balance, expired authorization) |
| `ambiguous_payment` | 400 | `invalid_request_error` | Request has multiple payment headers (`Payment-Signature`, `Authorization`, `X-Cashu`). Use exactly one. |
Cashu-Specific Errors
| Code | HTTP | Type | Description |
|---|---|---|---|
| `cashu_no_stream` | 400 | `invalid_request_error` | Cashu tokens cannot be used with streaming (change requires a buffered response) |
| `cashu_too_many_proofs` | 400 | `invalid_request_error` | Token contains more than 20 proofs (DoS-prevention limit) |
| `cashu_wrong_unit` | 400 | `invalid_request_error` | Only sat-denominated Cashu tokens are accepted |
| `cashu_mint_not_allowed` | 400 | `invalid_request_error` | Token's mint is not in the server's allowlist |
| `cashu_underpayment` | 402 | `payment_error` | Token value is less than the model's price |
| `cashu_underpayment_after_fees` | 402 | `payment_error` | Token value is less than the model's price after mint swap fees |
L402-Specific Errors
| Reason | HTTP | Description |
|---|---|---|
| Invalid macaroon signature | 401 | Macaroon was tampered with or signed with wrong key |
| Macaroon expired | 401 | ExpiresAt caveat exceeded (macaroons valid for 5 min) |
| Path mismatch | 401 | Macaroon's RequestPath does not match the endpoint called |
| max_tokens exceeds paid amount | 401 | Request max_tokens exceeds the MaxTokens caveat. Get a new invoice. |
| Input exceeds paid amount | 401 | Input size grew since invoice was issued (MaxInputChars / MaxInputTokens) |
| Invoice expired (server restarted) | 401 | NotBefore caveat fails after container restart. Request a new invoice. |
| Model mismatch | 401 | Request model does not match the macaroon's Model caveat |
| Preimage does not match | 401 | Preimage does not hash to the macaroon's payment hash |
General Errors
| Code / Reason | HTTP | Description |
|---|---|---|
| Rate limit | 429 | Per-IP rate limit exceeded. Check Retry-After header for seconds to wait. |
| Concurrent stream limit | 429 | Too many concurrent streams from your IP. |
| Context window exceeded | 400 | Input + max_tokens exceeds the model's context window |
| Invalid model | 400 | Model name not found in the model catalog |
| Service unavailable | 503 | Backend provider temporarily unreachable. Try a different model or retry later. |
x402 + concurrent streams: The server checks stream capacity before settling USDC on-chain. If you hit the concurrent stream limit (429), your payment has NOT been settled and you can safely retry.
Rate Limits
Rate limits are applied per IP address across three tiers:
| Tier | Applied When |
|---|---|
| free | Unauthenticated requests (landing page, /v1/models, /health) |
| invoice | 402 challenge requests (getting an invoice) |
| paid | Requests with valid Payment-Signature, L402 auth, balance token, or X-Cashu token |
Limits per Tier
| Tier | Limit | Window |
|---|---|---|
| free | 60 requests | 1 minute |
| invoice | 30 requests | 1 minute |
| paid | 60 requests | 1 minute |
Concurrent Stream Limits
| Scope | Limit |
|---|---|
| Per IP | 5 concurrent streams |
| Global | 250 concurrent streams |
When rate limited, the response includes a Retry-After header indicating how many seconds to wait before retrying.
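A minimal retry wrapper that honors `Retry-After`; `send` is a hypothetical callable returning the response triple, so the pattern works with any HTTP client:

```python
import time

def send_with_retry(send, max_attempts: int = 3):
    """Retry on 429, sleeping for the Retry-After duration.

    `send` is any callable returning (status, headers, body); headers is a dict.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        time.sleep(int(headers.get("Retry-After", "1")))
    return status, headers, body
```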
Models & Auto-Routing
llm402.ai serves 400+ models across multiple providers. The full model list with pricing is available at the /v1/models endpoint:
curl -s https://llm402.ai/v1/models | jq '.data[].id'
Model Naming
You can use either short names or full provider-prefixed IDs:
| Short Name | Full ID |
|---|---|
| `deepseek-v3.2` | `deepseek/deepseek-v3.2` |
| `claude-sonnet-4.6` | `anthropic/claude-sonnet-4.6` |
| `gpt-5.4` | `openai/gpt-5.4` |
Auto-Routing
Send `model: "auto"` and the server picks the best model for your prompt, routing across 9 task categories:
- `code` -- programming, debugging, code generation
- `reasoning` -- logic, math, step-by-step analysis
- `general_knowledge` -- factual questions, definitions, Q&A
- `creative` -- writing, storytelling, brainstorming
- `summarization` -- condensing content, TL;DR
- `chat` -- casual conversation, general chat
- `multilingual` -- translation, cross-language tasks
- `agents` -- function calling, tool integration, structured output
- `vision` -- image understanding (multimodal models)
To skip the classifier and route within a specific category, use the `task` body parameter with `model: "auto"`:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","task":"code","messages":[{"role":"user","content":"Sort a list in Python"}],"max_tokens":200}'
Important: When using auto-routing, always use the X-Route-Model header from the 402 response for the paid retry. Do not send "auto" again -- the router may pick a different model with a different price.
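A small helper for the paid retry that pins the routed model (the function name is illustrative):

```python
def pin_routed_model(body: dict, headers_402: dict) -> dict:
    """For the paid retry, replace "auto" with the model the router chose,
    so the model and price stay consistent with what was quoted."""
    routed = headers_402.get("X-Route-Model")
    if body.get("model") == "auto" and routed:
        return {**body, "model": routed}
    return body
```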