llm402.ai API
Pay-per-request LLM inference. No accounts. No API keys. Just pay and prompt.
llm402.ai provides OpenAI-compatible endpoints gated by HTTP 402 micropayments. Send a request, get an invoice, pay it, re-send with proof of payment. Your prompt is processed by one of 400+ models across multiple providers.
Four payment rails are supported. Every 402 response includes all available options -- pick whichever works for your client:
| Protocol | Currency | Network | Header |
|---|---|---|---|
| L402 | Bitcoin (sats) | Lightning Network | WWW-Authenticate |
| x402 | USDC (stablecoin) | Base L2 (EIP-3009) | Payment-Required |
| Cashu | Bitcoin (sats) | Ecash tokens | X-Cashu |
| Balance | Bitcoin (sats) or USDC (stablecoin) | Prepaid account — fund via Lightning or x402 | Authorization: Bearer |
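As a rough illustration of "pick whichever works for your client", the sketch below lists the rails advertised by a single 402 response. The helper name `availableRails` and the plain-object shape of `headers` (lower-cased names) are assumptions made for this example, not part of the API.

```javascript
// Hypothetical helper: list the payment rails offered by one 402 response.
// `headers` is a plain object keyed by lower-cased header name (an assumption);
// `body` is the parsed JSON body of the 402 response.
function availableRails(headers, body) {
  const rails = [];
  const wwwAuth = headers["www-authenticate"] || "";
  if (wwwAuth.startsWith("L402 ")) rails.push("L402");
  if (headers["payment-required"]) rails.push("x402");
  if (body && body.cashu && typeof body.cashu.price_sats === "number") rails.push("Cashu");
  return rails;
}

const rails = availableRails(
  {
    "www-authenticate": 'L402 macaroon="AgEJ...", invoice="lnbc..."',
    "payment-required": "eyJ...",
  },
  { price: 42, cashu: { price_sats: 42, unit: "sat" } }
);
console.log(rails); // [ 'L402', 'x402', 'Cashu' ]
```

Balance is not listed because it is not negotiated per response: a client holding a bal_ token sends it up front.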
Model Naming
- Short names work for all models -- no provider prefix needed: deepseek-v3.2, claude-sonnet-4.6, gpt-5.4
- Full IDs also work: deepseek/deepseek-v3.2, anthropic/claude-sonnet-4.6
- For auto-routing: use "model": "auto" and read the X-Route-Model header from the 402 response
Model in URL Path
All inference endpoints support specifying the model in the URL path instead of the request body:
- /v1/chat/completions/deepseek-v3.2
- /v1/images/generations/FLUX.1-schnell
- /v1/videos/generations/kling-2.1-master
If both URL path and body contain a model, the body model takes priority. The /v1/models endpoint returns all available model IDs.
Quick Start
x402 (USDC on Base)
Pay with USDC stablecoins. No BTC needed. No gas for the payer.
1. POST /v1/chat/completions returns 402. Server responds with a Payment-Required header (base64 JSON: price, payTo, EIP-712 domain).
2. POST /v1/chat/completions with the Payment-Signature header returns 200. Server settles USDC on-chain, then returns inference.
L402 (Bitcoin Lightning)
Pay with Bitcoin over the Lightning Network. Instant settlement, 21-sat minimum.
1. POST /v1/chat/completions returns 402. Server responds with a WWW-Authenticate header (macaroon + Lightning invoice).
2. POST /v1/chat/completions with the Authorization: L402 header returns 200. Format: Authorization: L402 {macaroon}:{preimage}.
Endpoints
All inference endpoints are OpenAI-compatible. Base URL: https://llm402.ai
OpenAI-Compatible
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /v1/chat/completions, /v1/chat/completions/{model} | Chat completions (streaming + buffered). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/embeddings | Text embeddings (max 128 strings per batch, no streaming) | L402 / x402 / Balance / Cashu |
| POST | /v1/images/generations, /v1/images/generations/{model} | Image generation (synchronous, one image per request). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/videos, /v1/videos/generations/{model} | Create a video generation job (async, returns job ID). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu |
| GET | /v1/videos/{job_id} | Poll video job status (no auth required) | None |
| GET | /v1/videos/{job_id}/content | Stream finished video MP4 through llm402 proxy (provider URL never exposed). Returned as video_url on completed poll responses. | None |
| POST | /v1/balance | Prepaid balance: create, top up, check status | None / Balance |
| GET | /v1/models | List all available models (OpenAI-compatible) | None (free) |
Ollama-Compatible
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/generate/{model} | Text generation | L402 / x402 / Balance / Cashu |
| POST | /api/chat/{model} | Chat | L402 / x402 / Balance / Cashu |
| GET | /api/tags | Model catalog with pricing | None (free) |
Ollama Examples
# Chat via Ollama-compatible endpoint (model in path)
curl -s -X POST https://llm402.ai/api/chat/deepseek-v3.2 \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hello"}],"stream":false}'
# Text generation via Ollama-compatible endpoint
curl -s -X POST https://llm402.ai/api/generate/deepseek-v3.2 \
-H "Content-Type: application/json" \
-d '{"prompt":"Explain Bitcoin in one sentence","stream":false}'
# List all models with pricing and endpoints
curl -s https://llm402.ai/api/tags | jq '.models[] | {name, price_sats}'
Utility
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /health | Service health and status | None (free) |
| POST | /v1/estimate-cost | Pre-authorization cost estimation | None (free) |
| POST | /api/invoice/status | Poll Lightning invoice payment status | None (free) |
| GET | /.well-known/l402 | L402 service discovery (agent-readable) | None (free) |
| GET | /.well-known/openapi.json | OpenAPI 3.1.0 specification | None (free) |
| GET | /.well-known/x402-discovery.json | x402 v2 Bazaar discovery (resource catalog with route schemas + prices) | None (free) |
Estimate Cost
Pre-authorize requests by checking the cost before paying. This endpoint is free and requires no authentication. Useful for MCP clients, agents, and budgeting.
curl -s -X POST https://llm402.ai/v1/estimate-cost \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"max_tokens": 500,
"pref": "balanced"
}'
Response:
{
"model": "deepseek-v3.2",
"shortName": "deepseek-v3.2",
"category": "general_knowledge",
"confidence": 0.82,
"rc": 100,
"estimatedInputTokens": 8,
"estimatedOutputTokens": 500,
"costSats": 21,
"costUsd": 0.000152,
"btcPrice": 68000
}
Parameters
| Field | Required | Description |
|---|---|---|
| messages | Yes | Array of message objects (same format as chat completions) |
| model | No | Model name (short or full ID). If omitted or "auto", the server auto-routes. |
| max_tokens | No | Upper bound on output tokens. When omitted, the server applies a per-model default configured server-side; the global fallback is 2048 if no per-model default is set. Use POST /v1/estimate-cost to see the exact estimatedOutputTokens the server will use for your model. You are billed on this cap, not on actual consumption: set it tight for short replies, bump it up (8192, 16384, 32768+) for long-form generation. See Pricing. |
| pref | No | Routing preference: quality, balanced, cost, speed |
| max_cost | No | Maximum cost in sats (routes only to models within budget) |
Response fields
| Field | Description |
|---|---|
| model | Resolved model ID (full upstream form, e.g. deepseek/deepseek-v3.2) |
| shortName | Short model alias (e.g. deepseek-v3.2), the same form accepted in URL paths |
| category | Auto-routing category the prompt classified into (e.g. code, reasoning, general_knowledge) |
| confidence | Classifier confidence (0–1) for the chosen category |
| rc | Routing complexity tier (10–100): higher = more capable model required for the prompt |
| estimatedInputTokens | Estimated input tokens (used for billing; capped from prompt length) |
| estimatedOutputTokens | Estimated output tokens (taken from max_tokens cap) |
| costSats | Estimated invoice price in sats (will match the 402 challenge if sent now) |
| costUsd | Same estimate in USD (informational) |
| btcPrice | Current BTC price used for the conversion (refreshes per model-sync cycle) |
| webSearchEnabled | Boolean: true if the request specified web_search: true and the surcharge is included |
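One way an agent might use these fields for budgeting is sketched below. The helper `checkBudget` and its return shape are hypothetical; only `costSats` and `shortName` come from the response described above.

```javascript
// Hypothetical helper: decide whether an estimate fits a per-request budget.
// `estimate` is the parsed JSON from POST /v1/estimate-cost.
function checkBudget(estimate, maxSats) {
  if (!Number.isInteger(estimate.costSats) || estimate.costSats < 0) {
    throw new Error("estimate.costSats must be a non-negative integer");
  }
  return {
    ok: estimate.costSats <= maxSats,
    model: estimate.shortName,
    costSats: estimate.costSats,
    remainingSats: maxSats - estimate.costSats,
  };
}

// Using the sample response shown earlier in this section.
const verdict = checkBudget({ shortName: "deepseek-v3.2", costSats: 21 }, 100);
console.log(verdict.ok, verdict.remainingSats); // true 79
```

Alternatively, passing max_cost in the estimate request lets the server restrict routing to models within budget, which avoids a client-side check.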
Invoice Status
Poll the payment status of a Lightning invoice. Useful for wallet integrations that need to confirm payment before re-sending with the L402 header.
curl -s -X POST https://llm402.ai/api/invoice/status \
-H "Content-Type: application/json" \
-d '{
"payment_hash": "a1b2c3d4...64hex",
"macaroon": "AgEJ..."
}'
# Before payment: { "paid": false }
# After payment: { "paid": true, "preimage": "e5f6a7b8...64hex" }
Security: The macaroon field is required and must match the payment_hash. This prevents preimage theft by ensuring only the original invoice requester can poll for the preimage.
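A wallet integration could wrap this endpoint in a small polling loop, roughly as follows. The names `waitForPreimage` and `makeStatusCheck`, the one-second interval, and the attempt limit are all assumptions for illustration; the request and response fields are the ones shown above.

```javascript
// Hypothetical polling helper. `checkStatus` is any async function that
// resolves to the parsed JSON body of POST /api/invoice/status.
async function waitForPreimage(checkStatus, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.paid) return status.preimage;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("invoice not paid within the polling window");
}

// A status check built on fetch (defined here, not called in this sketch).
function makeStatusCheck(paymentHash, macaroon) {
  return async () => {
    const res = await fetch("https://llm402.ai/api/invoice/status", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ payment_hash: paymentHash, macaroon }),
    });
    return res.json();
  };
}
```

Typical use would be `await waitForPreimage(makeStatusCheck(paymentHash, macaroon))`, followed by building the L402 Authorization header from the returned preimage.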
x402 Protocol (USDC)
x402 uses EIP-3009 TransferWithAuthorization for gasless USDC payments on Base. The payer signs an off-chain authorization; the server settles it on-chain.
Network and Asset
| Field | Value |
|---|---|
| Network | eip155:8453 (Base mainnet, chain ID 8453) |
| Asset | USDC 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 |
| Denomination | Atomic USDC (6 decimals: 1000000 = $1.00) |
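The 6-decimal denomination is easy to get wrong with floating point, so a client may prefer string and BigInt arithmetic. The two helpers below are an illustrative sketch; their names are not part of the API.

```javascript
const ATOMIC_PER_USDC = 10n ** 6n; // 1000000 atomic units = $1.00

// Atomic USDC (string or bigint) -> decimal string with 6 fractional digits.
function atomicToUsd(atomic) {
  const value = BigInt(atomic);
  const whole = value / ATOMIC_PER_USDC;
  const fraction = (value % ATOMIC_PER_USDC).toString().padStart(6, "0");
  return `${whole}.${fraction}`;
}

// Decimal string (at most 6 fractional digits) -> atomic USDC as a string.
function usdToAtomic(usd) {
  const match = /^(\d+)(?:\.(\d{1,6}))?$/.exec(usd);
  if (!match) throw new Error("expected a decimal string with at most 6 fractional digits");
  const fraction = (match[2] || "").padEnd(6, "0");
  return (BigInt(match[1]) * ATOMIC_PER_USDC + BigInt(fraction)).toString();
}

console.log(atomicToUsd("3150")); // 0.003150
console.log(usdToAtomic("0.50")); // 500000
```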
Payment Flow
Send a normal inference request with no auth headers:
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4.6",
"messages": [{"role": "user", "content": "Say hello."}],
"max_tokens": 50
}'
The server responds with HTTP 402. The response body contains all payment information:
{
"error": "Payment Required",
"description": "claude-sonnet-4.6 inference, pay-per-request over Lightning, USDC, or Cashu",
"price": 42,
"model": "claude-sonnet-4.6",
"provider": "llm402.ai",
"max_tokens": 50,
"estimated_input_tokens": 12,
"invoice": "lnbc420n...",
"macaroon": "AgEJ...",
"paymentHash": "a1b2c3d4e5f6...64hex",
"x402": {
"price_usd": "0.000305",
"network": "eip155:8453",
"address": "0xe05cf38...",
"asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
"scheme": "exact"
},
"cashu": {
"price_sats": 42,
"unit": "sat",
"description": "Send sat-denominated Cashu tokens in X-Cashu header. Server-configured mint allowlist — see /llms.txt for the current list."
}
}
The response headers include all payment options:
HTTP/2 402
WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
Payment-Required: eyJzY2hlbWUiOiJleGFjdCIsIm5ldH...
Cache-Control: no-store
Payment-Required Header
Base64-encoded JSON in x402 v2 envelope format. Decode it and use accepts[0] for payment details:
{
"x402Version": 2,
"error": "Payment required",
"accepts": [
{
"scheme": "exact",
"network": "eip155:8453",
"amount": "3150",
"asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
"payTo": "0x...",
"maxTimeoutSeconds": 120,
"extra": {
"name": "USD Coin",
"version": "2"
}
}
],
"resource": {
"url": "https://llm402.ai/v1/chat/completions",
"description": "LLM inference",
"mimeType": "application/json"
},
"price": "$0.003150"
}
| Field | Description |
|---|---|
| x402Version | Always 2 |
| accepts | Array of payment options. Always use accepts[0] |
| accepts[0].scheme | Always "exact" |
| accepts[0].network | Always "eip155:8453" (Base mainnet) |
| accepts[0].amount | Price in atomic USDC (6 decimals). "3150" = $0.003150 |
| accepts[0].asset | USDC contract address on Base |
| accepts[0].payTo | Server's wallet address (recipient) |
| accepts[0].maxTimeoutSeconds | Maximum settlement time (120s) |
| accepts[0].extra.name | EIP-712 domain name. Always "USD Coin" (not "USDC" -- that is testnet) |
| accepts[0].extra.version | EIP-712 domain version. Always "2" |
| price | Human-readable USD price (informational only, use accepts[0].amount for signing) |
| extensions.bazaar | Optional. x402 Bazaar discovery metadata (route schema, input/output examples). Forward unmodified in your payment payload; spec-compliant clients should pass it through. |
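Decoding the header and sanity-checking it before signing might look like the sketch below. `decodePaymentRequired` is a hypothetical helper; the checks simply mirror the "Always ..." values in the table above, and a client is free to be more or less strict.

```javascript
const BASE_NETWORK = "eip155:8453";
const BASE_USDC = "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913";

// Hypothetical helper: decode a Payment-Required header value and return accepts[0].
function decodePaymentRequired(headerValue) {
  const envelope = JSON.parse(Buffer.from(headerValue, "base64").toString("utf8"));
  if (envelope.x402Version !== 2) throw new Error("unsupported x402 version");
  const option = Array.isArray(envelope.accepts) ? envelope.accepts[0] : undefined;
  if (!option) throw new Error("no payment option in accepts");
  if (option.scheme !== "exact") throw new Error("unexpected scheme");
  if (option.network !== BASE_NETWORK) throw new Error("unexpected network");
  if (String(option.asset).toLowerCase() !== BASE_USDC.toLowerCase()) {
    throw new Error("unexpected asset");
  }
  return { option, resource: envelope.resource };
}

// Round-trip a sample envelope shaped like the one documented above.
const sampleHeader = Buffer.from(JSON.stringify({
  x402Version: 2,
  accepts: [{
    scheme: "exact", network: BASE_NETWORK, amount: "3150", asset: BASE_USDC,
    payTo: "0x...", maxTimeoutSeconds: 120, extra: { name: "USD Coin", version: "2" },
  }],
  resource: { url: "https://llm402.ai/v1/chat/completions" },
})).toString("base64");

console.log(decodePaymentRequired(sampleHeader).option.amount); // 3150
```

Refusing to sign when the network or asset differs from the expected values protects the payer from authorizing a transfer of the wrong token.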
Build a TransferWithAuthorization signature using EIP-712 typed data.
EIP-712 Domain
const domain = {
name: "USD Coin", // from extra.name
version: "2", // from extra.version
chainId: 8453, // Base mainnet
verifyingContract: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"
};
EIP-712 Types
const types = {
TransferWithAuthorization: [
{ name: "from", type: "address" },
{ name: "to", type: "address" },
{ name: "value", type: "uint256" },
{ name: "validAfter", type: "uint256" },
{ name: "validBefore", type: "uint256" },
{ name: "nonce", type: "bytes32" },
]
};
Authorization Message
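import crypto from "node:crypto"; // randomBytes comes from node:crypto, not the global Web Crypto object
const now = Math.floor(Date.now() / 1000);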
const nonce = "0x" + crypto.randomBytes(32).toString("hex");
const opt = paymentRequired.accepts[0]; // always use accepts[0]
const message = {
from: walletAddress, // your address (payer)
to: opt.payTo, // from accepts[0]
value: BigInt(opt.amount),
validAfter: BigInt(now - 600), // 10 min ago (clock skew buffer)
validBefore: BigInt(now + 120), // 2 min from now
nonce: nonce,
};
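Sign the message with your wallet over the domain and types above (for example, with viem's signTypedData) to obtain signature, then construct the V2 payment payload and base64-encode it: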
const opt = paymentRequired.accepts[0]; // always use accepts[0]
const payload = {
x402Version: 2,
resource: paymentRequired.resource,
accepted: {
scheme: opt.scheme,
network: opt.network,
amount: opt.amount,
asset: opt.asset,
payTo: opt.payTo,
maxTimeoutSeconds: opt.maxTimeoutSeconds,
extra: opt.extra
},
payload: {
signature: signature,
authorization: {
from: walletAddress,
to: opt.payTo,
value: opt.amount,
validAfter: (now - 600).toString(),
validBefore: (now + 120).toString(),
nonce: nonce
}
}
};
const paymentSignature = Buffer.from(JSON.stringify(payload)).toString("base64");
Re-send the same inference request with the Payment-Signature header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
-d '{
"model": "claude-sonnet-4.6",
"messages": [{"role": "user", "content": "Say hello."}],
"max_tokens": 50
}'
Response (HTTP 200):
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1712345678,
"model": "claude-sonnet-4.6",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 4,
"completion_tokens": 10,
"total_tokens": 14
}
}
Auto-routing gotcha (x402 / Cashu): If you used model: "auto" on the 402 challenge, the server routed to a specific model and returned it in the X-Route-Model response header. On the x402 / Cashu retry with payment, you must echo that specific model back in the body — not "auto" again — because x402 / Cashu have no server-side memory of the original routing decision. (L402 differs: the macaroon binds the routed model in its Model caveat, so the L402 retry can keep "auto" in the body.)
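The rule above can be captured in a few lines. The function name `bodyForPaidRetry` and the rail labels passed to it are assumptions for this sketch.

```javascript
// Hypothetical helper: build the body for the paid retry.
// `rail` is "L402", "x402", or "Cashu"; `routedModel` is the X-Route-Model value
// from the 402 response (may be undefined if the request was not auto-routed).
function bodyForPaidRetry(originalBody, rail, routedModel) {
  // L402 keeps "auto": the macaroon's Model caveat already binds the routed model.
  if (originalBody.model !== "auto" || rail === "L402") return originalBody;
  if (!routedModel) {
    throw new Error("X-Route-Model is required to retry an auto-routed request");
  }
  return { ...originalBody, model: routedModel };
}

const original = { model: "auto", messages: [{ role: "user", content: "hi" }] };
console.log(bodyForPaidRetry(original, "x402", "deepseek-v3.2").model); // deepseek-v3.2
console.log(bodyForPaidRetry(original, "L402", "deepseek-v3.2").model); // auto
```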
CORS
The server allows cross-origin x402 requests:
Access-Control-Allow-Headers: Content-Type, Authorization, Payment-Signature, X-Cashu, Mcp-Session-Id
Access-Control-Expose-Headers: X-Route-Model, X-Route-Category, Payment-Required, WWW-Authenticate, X-Cashu-Change
Nonce Replay Protection
Each signed authorization can only be used once. Replay protection is enforced both server-side and on-chain via EIP-3009 nonces.
L402 Protocol (Lightning)
L402 (formerly LSAT) combines HTTP 402 status codes with Lightning Network payments and macaroon-based authentication. It is the original payment protocol supported by llm402.ai.
Payment Flow
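1. POST /v1/chat/completions with no auth.
2. The server returns 402 with WWW-Authenticate: L402 macaroon="...", invoice="lnbc...".
3. Pay the invoice to obtain the preimage, then re-send the request with Authorization: L402 {macaroon}:{preimage}.
4. The server returns 200.
Curl Example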
# Send a request with no auth -- get back a 402 with an invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 50
}'
# Response includes:
# WWW-Authenticate: L402 macaroon="AgEJ...", invoice="lnbc210n1pn..."
# Body: { "error": "Payment Required", "price": 21, "invoice": "lnbc...", "macaroon": "AgEJ...", ... }
# Pay the Lightning invoice with your wallet and get the preimage.
# Then resend the exact same request with the L402 Authorization header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: L402 AgEJbGxtNDAyLmFp...:a1b2c3d4e5f67890..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 50
}'
# Response: HTTP 200 with chat completion
WWW-Authenticate Header
The 402 response includes a WWW-Authenticate header with two components:
WWW-Authenticate: L402 macaroon="AgELbGxt...", invoice="lnbc50n1pn..."
| Component | Description |
|---|---|
| macaroon | Base64-encoded V2 TLV macaroon with embedded caveats. Bound to a specific payment hash. |
| invoice | BOLT-11 Lightning invoice. Pay this to obtain the preimage. |
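A client might pull the two components out of the header and later assemble the paid Authorization value as sketched below. Both function names are hypothetical, and the parser assumes the quoted key="value" layout shown in the example header.

```javascript
// Hypothetical helper: extract macaroon and invoice from a WWW-Authenticate value.
function parseL402Challenge(headerValue) {
  const value = headerValue.trim();
  if (!value.startsWith("L402 ")) throw new Error("not an L402 challenge");
  const params = {};
  for (const [, key, val] of value.slice(5).matchAll(/(\w+)="([^"]*)"/g)) {
    params[key] = val;
  }
  if (!params.macaroon || !params.invoice) {
    throw new Error("challenge is missing macaroon or invoice");
  }
  return { macaroon: params.macaroon, invoice: params.invoice };
}

// Hypothetical helper: build the Authorization value once the invoice is paid.
function l402Authorization(macaroon, preimageHex) {
  if (!/^[0-9a-f]{64}$/i.test(preimageHex)) {
    throw new Error("preimage must be 64 hex characters");
  }
  return `L402 ${macaroon}:${preimageHex}`;
}

const challenge = parseL402Challenge('L402 macaroon="AgELbGxt", invoice="lnbc210n1pn"');
console.log(challenge.invoice); // lnbc210n1pn
console.log(l402Authorization(challenge.macaroon, "ab".repeat(32)).startsWith("L402 AgELbGxt:")); // true
```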
Macaroon Caveats
Each macaroon is bound with first-party caveats that restrict its use. The server verifies all caveats on the paid request and rejects any that fail (fail-closed):
| Caveat | Format | Description |
|---|---|---|
| RequestPath | RequestPath = /v1/chat/completions | Restricts the macaroon to a specific API endpoint |
| ExpiresAt | ExpiresAt = 1712345678 | Unix timestamp expiry (5 minutes from issuance) |
| MaxTokens | MaxTokens = 256 | Maximum output tokens the request may use |
| MaxInputChars | MaxInputChars = 1500 | Prevents input inflation after invoice issuance |
| MaxInputTokens | MaxInputTokens = 400 | Prevents token-count gaming (chars pass but tokens are higher) |
| NotBefore | NotBefore = 1712340000 | Prevents preimage replay after server restart |
| MaxInputItems | MaxInputItems = 5 | Binds the macaroon to the actual batch size from the request (example shows 5 items; max accepted is 128 per /v1/embeddings batch). No tolerance: the retry must use the same item count. |
| Model | Model = claude-sonnet-4.6 | Binds the macaroon to a specific model (prevents cross-model bypass) |
| MediaType | MediaType = image | Emitted for /v1/images/generations and /v1/videos. Restricts a media macaroon to a specific media class (image or video). |
| MaxUnits | MaxUnits = 1 | Number of output units the macaroon covers (e.g. images or videos). Always 1 on media endpoints. |
| MaxDuration | MaxDuration = 8 | Maximum video duration in seconds (typical range 1–10). Emitted ONLY when the request specifies seconds; binds the macaroon to that duration to prevent post-invoice upsell. Default-discovery 402 challenges (no seconds) omit this caveat; the per-model duration cap from /v1/models capabilities.durations applies instead. Video only. |
| MaxDimension | MaxDimension = 1920 | Maximum video longest-side pixels. Emitted ONLY when the request specifies width/height; binds the macaroon to that resolution to prevent post-invoice upsell. Default-discovery 402 challenges omit this caveat. Video only. |
| WebSearch | WebSearch = true | Added to chat macaroons when the original request sent web_search: true. Binds the paid search surcharge to the flag. |
Fail-closed design: Unrecognized caveats are rejected. This ensures future caveat additions don't accidentally pass on old server versions.
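Because a failed caveat costs a burned preimage, a client may want to pre-check the retry against the caveats it knows about. The sketch below assumes the caveats have already been extracted from the macaroon as "Key = Value" strings (macaroon decoding is not shown), and the comparison rules are guesses at the server's behavior; the server remains authoritative.

```javascript
// Hypothetical helper: turn "Key = Value" caveat strings into an object.
function parseCaveats(caveatLines) {
  const caveats = {};
  for (const line of caveatLines) {
    const index = line.indexOf(" = ");
    if (index === -1) throw new Error(`malformed caveat: ${line}`);
    caveats[line.slice(0, index)] = line.slice(index + 3);
  }
  return caveats;
}

// Hypothetical client-side pre-check covering three of the documented caveats.
// Returns a list of problems; an empty list means nothing obvious is wrong.
function preflight(caveats, request, nowSeconds) {
  const problems = [];
  if (caveats.ExpiresAt && nowSeconds >= Number(caveats.ExpiresAt)) {
    problems.push("macaroon expired");
  }
  if (caveats.RequestPath && caveats.RequestPath !== request.path) {
    problems.push("path mismatch");
  }
  if (caveats.MaxTokens && request.max_tokens > Number(caveats.MaxTokens)) {
    problems.push("max_tokens exceeds cap");
  }
  return problems;
}

const caveats = parseCaveats([
  "RequestPath = /v1/chat/completions",
  "ExpiresAt = 1712345678",
  "MaxTokens = 256",
]);
console.log(preflight(caveats, { path: "/v1/chat/completions", max_tokens: 50 }, 1712345000)); // []
```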
Authorization Header
After paying the Lightning invoice and receiving the preimage, send the authorization:
Authorization: L402 AgELbGxt...:abc123def456...
Format: L402 {base64_macaroon}:{hex_preimage}
On the paid retry you can keep "model": "auto" in the body: the server extracts the routed model from the macaroon's Model caveat, so you can reuse the exact same body from 402 discovery without echoing the routed model back.
The server verifies:
- The macaroon signature against the root key
- All caveats pass (path, expiry, tokens, model, etc.)
- The preimage hashes to the payment hash embedded in the macaroon identifier
- The preimage has not been used before (atomic Redis SET NX, burned before inference begins)
Preimages are single-use and non-refundable on L402. The server claims the preimage atomically before calling the upstream model. If inference then fails (502, timeout, etc.), the preimage is already spent and cannot be retried. This is intentional — burning the preimage after inference would open a replay window where a concurrent request could reuse it. If you need automatic refund-on-failure semantics, use balance tokens or pay with Cashu instead.
Balance Tokens (Prepaid)
Balance tokens let you prepay for multiple requests with a single Lightning payment or USDC transfer. Fund a balance once, then use Authorization: Bearer bal_... on any gated endpoint without per-request payment flows.
How It Works
1. POST /v1/balance with { "sats": 1000 } returns 402 and a Lightning invoice.
2. Once the invoice is paid, POST /v1/balance with { "payment_hash": "hex64" } returns 200 and { "paid": true, "token": "bal_...", "sats": 1000 }.
3. Send Authorization: Bearer bal_... on any endpoint.
Endpoints
Request a Lightning invoice to fund a new balance:
# Step 1: Request an invoice for 1000 sats
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"sats": 1000}'
# Response (402):
# { "payment_hash": "a1b2c3...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }
After paying the invoice, poll with the payment hash to get your token. (An unknown payment_hash returns 404 Unknown payment_hash; pending invoices return 200 {"paid": false}.)
# Step 2: Poll until paid
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"payment_hash": "a1b2c3d4e5f6...64hex"}'
# Before payment: { "paid": false }
# After payment: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "2026-05-03T..." }
Check the status of an existing balance:
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"action": "status"}'
# Response:
# { "sats": 850, "expires_at": "2026-05-03T...", "total_spent": 150, "requests": 7 }
Add sats to an existing balance:
# Get a top-up invoice
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"sats": 500}'
# Returns 402 with a new invoice. Pay it, then poll with payment_hash as above.
Fund with USDC (x402)
Fund a balance using USDC on Base instead of Lightning. Send a single POST with a Payment-Signature header carrying an EIP-3009 TransferWithAuthorization envelope — no 402 challenge flow is required from this endpoint. The server settles the USDC on-chain via the CDP facilitator, derives the sats to credit from the signed amount at its current BTC price, and returns the balance token in one round trip.
Decide how many sats you want to buy and convert to USDC atomic units (6 decimals) at a BTC price you are willing to pay. The server will re-derive sats from the signed USDC amount at its own BTC price — your body.sats hint is advisory and never authoritative.
Sign an off-chain authorization to transfer USDC on Base from your wallet to the server’s receiving address. The signing domain, asset, network, and payTo are identical to the values served in Payment-Required envelopes on other llm402 endpoints.
// Illustrative. Uses viem. Install: npm install viem
import crypto from 'node:crypto'; // provides randomBytes for the nonce below
import { createWalletClient, http } from 'viem';
import { base } from 'viem/chains';
import { privateKeyToAccount } from 'viem/accounts';
const account = privateKeyToAccount(process.env.PRIV_KEY);
const client = createWalletClient({ account, chain: base, transport: http() });
// USDC on Base, 6 decimals. Example: $0.50 => 500000 atomic units.
const amountAtomic = '500000';
const payTo = '0x...'; // llm402 receiving address (from any x402 envelope)
const usdc = '0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913';
// Random 32-byte nonce (hex). Must not be reused.
const nonce = '0x' + crypto.randomBytes(32).toString('hex');
const validAfter = 0n;
const validBefore = BigInt(Math.floor(Date.now() / 1000) + 120);
const signature = await client.signTypedData({
domain: { name: 'USD Coin', version: '2', chainId: 8453, verifyingContract: usdc },
types: {
TransferWithAuthorization: [
{ name: 'from', type: 'address' },
{ name: 'to', type: 'address' },
{ name: 'value', type: 'uint256' },
{ name: 'validAfter', type: 'uint256' },
{ name: 'validBefore', type: 'uint256' },
{ name: 'nonce', type: 'bytes32' },
],
},
primaryType: 'TransferWithAuthorization',
message: {
from: account.address, to: payTo, value: BigInt(amountAtomic),
validAfter, validBefore, nonce,
},
});
// x402 v2 envelope — base64(JSON) for the Payment-Signature header
const envelope = {
x402Version: 2,
scheme: 'exact',
network: 'eip155:8453',
payload: {
signature,
authorization: {
from: account.address, to: payTo, value: amountAtomic,
validAfter: validAfter.toString(), validBefore: validBefore.toString(),
nonce,
},
},
};
const paymentSignature = Buffer.from(JSON.stringify(envelope)).toString('base64');
The body may be omitted entirely, or may contain {"sats": N} as a hint for your own UX. The server ignores body.sats for accounting and derives sats from the signed USDC amount.
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Payment-Signature: $PAYMENT_SIG" \
-d '{"sats": 500}'
# 200 OK on success:
# { "paid": true, "token": "bal_xxxx...", "sats": 500, "credited": 500 }
To top up an existing balance, add Authorization: Bearer bal_... to the same request. The server caps the top-up at 50000 sats total and returns 400 if the signed amount would exceed the cap.
Server derives sats, not client: the credited sats are computed from the signed USDC atomic amount at the server’s current BTC price, not from body.sats. If BTC moves between the moment you decide an amount and the moment the server settles, your credited sats may not match your body.sats hint. Always trust the sats field in the 200 response, not the request.
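To get a feel for what a given USDC amount might credit, a client can run the same conversion locally. This is only an estimate under stated assumptions: the function name is hypothetical, the BTC price is whatever the client believes it to be, and flooring to whole sats is a guess at the server's rounding.

```javascript
// Hypothetical estimate of credited sats for a signed USDC amount.
// sats = (atomic / 1e6 USD) / btcPriceUsd * 1e8, which simplifies to atomic * 100 / btcPriceUsd.
function estimateCreditedSats(amountAtomic, btcPriceUsd) {
  const atomic = Number(amountAtomic);
  if (!Number.isSafeInteger(atomic) || atomic < 0) throw new Error("invalid atomic amount");
  if (!(btcPriceUsd > 0)) throw new Error("invalid BTC price");
  return Math.floor((atomic * 100) / btcPriceUsd);
}

console.log(estimateCreditedSats("500000", 100000)); // 500
console.log(estimateCreditedSats("500000", 68000));  // 735
```

The two results show how the same $0.50 authorization credits a different number of sats as the BTC price moves, which is why the sats field in the 200 response is the value to trust.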
Once you have a bal_ token (from either the Lightning flow above or the USDC funding flow), send it as a Bearer token on any gated endpoint:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "Explain Lightning Network"}],
"max_tokens": 200
}'
Token Lifecycle
| Rule | Value |
|---|---|
| Inactivity TTL | 30 days (resets on each use) |
| Max lifetime | 90 days from creation |
| Max balance | 50,000 sats |
| Min deposit | 100 sats |
Top-ups reset the inactivity timer but do not extend the 90-day max lifetime. Plan deposits accordingly.
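The interaction of the two timers can be sketched as the earlier of two deadlines. The function below is illustrative only (its name is made up, and it assumes both rules are measured in exact 24-hour days); the expires_at value returned by the server is authoritative.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical estimate: a token expires at the earlier of
// (last use or top-up + 30 days) and (creation + 90 days).
function tokenExpiry(createdAt, lastUsedAt) {
  const inactivityDeadline = lastUsedAt.getTime() + 30 * DAY_MS;
  const lifetimeDeadline = createdAt.getTime() + 90 * DAY_MS;
  return new Date(Math.min(inactivityDeadline, lifetimeDeadline));
}

const created = new Date("2026-01-01T00:00:00Z");
// Early in the token's life, inactivity is the binding limit.
console.log(tokenExpiry(created, new Date("2026-01-10T00:00:00Z")).toISOString()); // 2026-02-09T00:00:00.000Z
// Late in the token's life, the 90-day cap wins regardless of recent use.
console.log(tokenExpiry(created, new Date("2026-03-20T00:00:00Z")).toISOString()); // 2026-04-01T00:00:00.000Z
```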
Cashu Tokens (Ecash)
Pay with Cashu ecash tokens -- instant, private Bitcoin micropayments with no Lightning channel required. Send tokens directly in the request header. If you overpay, the server returns change tokens.
How It Works
1. POST /v1/chat/completions (no auth) returns 402. The response body includes cashu.price_sats.
2. POST /v1/chat/completions with an X-Cashu header returns 200. The server swaps tokens at the mint, runs inference, and returns change if overpaid.
Request
Send a cashuB (v4) token in the X-Cashu header:
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Cashu: cashuBo2F0gaJha..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "hello"}],
"max_tokens": 100
}'
Response Headers
Every successful Cashu-paid call emits the same pair of response headers, and the server always lists them in Access-Control-Expose-Headers so browser clients can read them:
| Header | Value | Meaning |
|---|---|---|
| X-Cashu-Consumed | true or refunded | Status flag: true means the server swapped the proofs at the mint and consumed payment; refunded means the request failed after swap and the full amount was returned in X-Cashu-Change. |
| X-Cashu-Change | cashuB... token | Emitted when the presented token exceeded the price by at least 2 sats (MIN_CHANGE_SATS). Smaller overpayments are absorbed. The change token is capped at 8 KB; if the split would produce a larger header, change is absorbed. |
These headers appear on every endpoint that accepts X-Cashu payment — /v1/chat/completions, /v1/embeddings, /v1/images/generations, and /v1/videos.
HTTP/2 200
Content-Type: application/json
X-Cashu-Consumed: true
X-Cashu-Change: cashuBo2F0gaJha...
Access-Control-Expose-Headers: X-Cashu-Consumed, X-Cashu-Change
The change token is a standard cashuB token. Import it with any Cashu wallet or present it on a subsequent request.
Import change or lose it: X-Cashu-Change carries real bearer money. Wallet clients MUST read the header and import the proofs on every successful response. Discarding the header is equivalent to burning the overpayment — the server does not retain a copy.
Constraints
| Rule | Value |
|---|---|
| Token format | cashuB (v4) only. Deprecated cashuA (v3) tokens are rejected. |
| Unit | Sat-denominated only (no USD or other units) |
| Max proofs | 20 per token (DoS prevention) |
| Streaming | Not supported. Cashu requires buffered responses to calculate change. Use "stream": false. |
| Change threshold | 2 sats minimum. Overpayment of 1 sat is absorbed (not worth the mint round-trip). |
| Change size limit | 8 KB. If the change token exceeds 8 KB, it is absorbed by the server. |
| Mint | Server-configured allowlist. HTTPS-only, no private IPs. The 402 response body's cashu.description field indicates the current policy. |
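The change threshold can be expressed as a small calculation, paired with a reader for the two response headers. Both helpers are hypothetical, and the estimate ignores the 8 KB size cap and any mint fees, which the client cannot predict from the table alone.

```javascript
const MIN_CHANGE_SATS = 2;

// Hypothetical estimate of the change a client should expect back.
function expectedChangeSats(tokenSats, priceSats) {
  if (tokenSats < priceSats) throw new Error("token value does not cover the price");
  const overpayment = tokenSats - priceSats;
  return overpayment >= MIN_CHANGE_SATS ? overpayment : 0;
}

// Hypothetical reader for the Cashu response headers of a fetch Response.
function readCashuResult(headers) {
  return {
    status: headers.get("X-Cashu-Consumed"), // "true" or "refunded"
    changeToken: headers.get("X-Cashu-Change"), // null when no change was issued
  };
}

console.log(expectedChangeSats(50, 42)); // 8
console.log(expectedChangeSats(43, 42)); // 0
```

A wallet would import `changeToken` whenever it is non-null, since the server does not keep a copy.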
No 402 dance needed: Unlike L402, you can skip the initial 402 request if you already know the price — just send the Cashu token directly and the server verifies the token value covers the model’s price. (x402 has a similar shortcut on POST /v1/balance for funding a balance token, but inference endpoints still expect either a prior 402 or a known price.)
MCP Server
llm402.ai provides a hosted Model Context Protocol (MCP) server. Connect from any MCP client — Claude Code, Claude Desktop, Cursor, or any tool that supports MCP. Six tools are available: text inference, image generation, video generation, model discovery, balance management, and funding.
Setup
Copy the bal_ token from the balance display.
Claude Code (~/.claude.json):
{
"mcpServers": {
"llm402": {
"url": "https://llm402.ai/mcp",
"headers": {
"Authorization": "Bearer bal_YOUR_TOKEN_HERE"
}
}
}
}
Claude Desktop (claude_desktop_config.json) — Claude Desktop only supports stdio MCP transport, so use the third-party mcp-remote bridge:
{
"mcpServers": {
"llm402": {
"command": "npx",
"args": [
"mcp-remote",
"https://llm402.ai/mcp",
"--header",
"Authorization:Bearer bal_YOUR_TOKEN_HERE"
]
}
}
}
Replace bal_YOUR_TOKEN_HERE with your actual balance token. The MCP client then discovers all tools automatically. (mcp-remote is a community npm package; in production, pin an exact version using the mcp-remote@VERSION form.)
Available Tools
| Tool | Auth | Description |
|---|---|---|
| llm402_inference | Required | Text inference. 400+ models, auto-routed by default. Supports system prompts, model selection, temperature, max_tokens, and routing preference (quality/balanced/cost/speed). |
| llm402_image | Required | Image generation. Requires a specific model ID (e.g. black-forest-labs/FLUX.1-schnell). Supports width, height, steps, seed, negative prompt. |
| llm402_video | Required | Video generation (async). Requires a specific model ID (e.g. wan2.7-t2v). Supports seconds, width, height, fps. Polls for completion up to 90s, then returns job URL for manual polling. |
| llm402_models | None | List available models. Optional substring filter (e.g. "deepseek", "flux"). Free, no balance required. |
| llm402_balance | Required | Check your prepaid balance: remaining sats, total deposited, total spent, request count. |
| llm402_fund | Required | Generate a Lightning invoice to top up your balance. Default 5,000 sats. Polls for payment confirmation up to 45 seconds. |
Example: Text Inference
# Using curl against the MCP endpoint directly (JSON-RPC format)
curl -X POST https://llm402.ai/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "llm402_inference",
"arguments": {
"prompt": "Explain quantum computing in one sentence.",
"max_tokens": 100
}
},
"id": 1
}'
Example: Image Generation
curl -X POST https://llm402.ai/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "llm402_image",
"arguments": {
"prompt": "A cyberpunk cityscape at night with neon lights",
"model": "black-forest-labs/FLUX.1-schnell"
}
},
"id": 1
}'
OpenAI-Compatible Alternative
If your tool doesn't support MCP but accepts OPENAI_BASE_URL, the same balance token works directly:
export OPENAI_BASE_URL=https://llm402.ai/v1
export OPENAI_API_KEY=bal_YOUR_TOKEN
This works with Cursor, Aider, LangChain, the OpenAI Python SDK, and any tool that accepts a custom base URL.
Endpoint
MCP endpoint: https://llm402.ai/mcp
Protocol: Streamable HTTP (POST only). Each request is independently authenticated via the Authorization: Bearer bal_ header — no server-side session state is persisted between calls. (The Mcp-Session-Id CORS header is allowed for spec-compliant clients that send it, but the server doesn’t require or track it.) Responses are Server-Sent Events containing JSON-RPC results.
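Since responses arrive as Server-Sent Events, a client calling the endpoint without an MCP library has to unwrap the JSON-RPC payload itself. The parser below is a minimal sketch that assumes each event carries one complete JSON document in its data lines; the function name is hypothetical.

```javascript
// Hypothetical helper: extract JSON-RPC messages from an SSE response body.
function parseSseJsonRpc(text) {
  const messages = [];
  for (const event of text.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n");
    if (data) messages.push(JSON.parse(data));
  }
  return messages;
}

const sample = 'event: message\ndata: {"jsonrpc":"2.0","id":1,"result":{"ok":true}}\n\n';
console.log(parseSseJsonRpc(sample)[0].id); // 1
```

In practice the text would come from `await response.text()` on the POST to https://llm402.ai/mcp, and the caller would match each message's id against the request id it sent.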
OpenClaw Plugin
Use llm402.ai from inside OpenClaw via the official @llm402/openclaw-provider plugin. All four payment rails supported: prepaid balance, Cashu ecash, USDC on Base, and Lightning. 400+ models.
npm install @llm402/openclaw-provider
Requires Node.js 22+. Includes a wallet CLI and the OpenClaw plugin.
The package ships a CLI for wallet management. Skip this step if you only need balance mode (Bearer token).
# Create a wallet (generates Nostr nsec + EVM keypair)
npx llm402-openclaw init
# Fund it with Lightning (prints a BOLT11 invoice — pay from any wallet)
npx llm402-openclaw fund 5000
# Check balance
npx llm402-openclaw balance
Your sats are stored locally as Cashu ecash proofs at ~/.llm402/wallet.json. From your perspective: you pay a Lightning invoice, your inference calls deduct from the Cashu balance. Run npx llm402-openclaw --help for all commands.
Pick a payment mode:
Balance mode (simplest — Bearer token, zero latency, no wallet):
{
"paymentMode": "balance",
"balanceToken": "bal_YOUR_TOKEN_HERE"
}
Cashu mode (pay with ecash — requires wallet from step 2):
# Reveal your nsec for the config below
LLM402_SHOW_SECRETS=1 npx llm402-openclaw init
{
"paymentMode": "cashu",
"cashuNsec": "nsec1..."
}
x402 mode (pay with USDC on Base — gasless, no ETH needed):
{
"paymentMode": "x402",
"evmPrivateKey": "0x..."
}
Lightning mode (pay L402 invoices by melting Cashu proofs):
{
"paymentMode": "lightning",
"cashuNsec": "nsec1..."
}
Wallet modes (cashu / x402 / lightning) start a local HTTP proxy on 127.0.0.1 that transparently handles the 402-and-pay cycle. OpenClaw only ever sees the final 200 response. See the plugin README for all modes.
CLI commands
| Command | Description |
|---|---|
npx llm402-openclaw init | Create or load a wallet |
npx llm402-openclaw fund <sats> | Get a Lightning invoice, pay, mint Cashu proofs |
npx llm402-openclaw balance | Show Cashu balance (+ optional USDC with --check-usdc) |
npx llm402-openclaw check-funding | Resolve pending quotes from prior fund timeouts |
npx llm402-openclaw sync | Pull wallet state from Nostr relays (opt-in) |
Budget controls
Runaway cost protection. Both sats and USDC are tracked independently; either rail can reject a request before signing.
| Field | Default | Max | Rail |
|---|---|---|---|
maxRequestBudgetSats | 500 | 50,000 | sats |
sessionBudgetSats | 10,000 | 1,000,000 | sats |
sessionBudgetUsdcCents | 5,000 | 500,000 | USDC |
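To illustrate how a per-request limit and a session limit interact, here is a minimal sketch of a sats-side check using the defaults from the table. The `SatsBudget` class is hypothetical and is not the plugin's actual implementation.

```python
class SatsBudget:
    """Hypothetical tracker mirroring maxRequestBudgetSats / sessionBudgetSats."""

    def __init__(self, max_request_sats=500, session_sats=10_000):
        self.max_request_sats = max_request_sats
        self.session_sats = session_sats
        self.spent = 0

    def authorize(self, price_sats):
        """Record the spend and return True only if both limits allow it."""
        if price_sats > self.max_request_sats:
            return False  # a single request is too expensive
        if self.spent + price_sats > self.session_sats:
            return False  # would exceed the session budget
        self.spent += price_sats
        return True

budget = SatsBudget()
decisions = [budget.authorize(price) for price in (21, 500, 501)]
```

The check runs before any payment is signed, so a rejected request costs nothing.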
Security
This plugin runs locally and handles wallet keys. Do not install on shared systems, CI runners, or Codespaces. Wallet lives at ~/.llm402/wallet.json with 0600 permissions. Full threat model in the SECURITY.md.
Streaming
Add "stream": true to your request body to receive Server-Sent Events (SSE) as tokens are generated. The format follows the OpenAI streaming specification.
Request
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{
"model": "deepseek-v3.2",
"messages": [{"role": "user", "content": "Count to 5"}],
"max_tokens": 100,
"stream": true
}'
Response Format
The server sends a series of data: lines. Each line is a JSON chunk with a delta object containing the next token(s):
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36}}
data: [DONE]
| Field | Description |
|---|---|
choices[0].delta.role | Sent on the first chunk ("assistant"). Upstream providers may bundle the first token alongside the role in the same chunk — code defensively for both shapes. |
choices[0].delta.content | The next token(s) of the response |
choices[0].finish_reason | null while generating, "stop" on the final chunk |
usage | Optional. Some providers attach a token-count summary to the final chunk (the one with finish_reason). Treat as best-effort. |
data: [DONE] | End-of-stream marker. Close the connection after this line. |
Heartbeat: During long-running inferences, the server sends : heartbeat SSE comments every 15 seconds to keep the connection alive. These are not data lines and should be ignored by your parser.
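The chunk shapes and heartbeat comments described above can be handled defensively along these lines. This is a sketch only: `parse_sse_line` and `collect_stream` are illustrative names, and the sample lines are abbreviated versions of the chunks shown earlier.

```python
import json

def parse_sse_line(line):
    """Classify one SSE line as ("skip", None), ("done", None) or ("chunk", dict)."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(":"):
        return ("skip", None)  # blank separator or ": heartbeat" comment
    if not line.startswith("data:"):
        return ("skip", None)
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return ("done", None)
    return ("chunk", json.loads(payload))

def collect_stream(lines):
    """Accumulate text, finish_reason and optional usage from SSE lines."""
    text, finish_reason, usage = [], None, None
    for line in lines:
        kind, chunk = parse_sse_line(line)
        if kind == "done":
            break
        if kind != "chunk":
            continue
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        text.append(delta.get("content") or "")  # role-only chunks carry no content
        finish_reason = choices[0].get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage  # best-effort, may never appear
    return "".join(text), finish_reason, usage

sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}',
    ": heartbeat",
    'data: {"choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":36}}',
    "data: [DONE]",
]
result = collect_stream(sample_lines)
```

The first sample chunk bundles the role with the first token, which is the case the table warns about; a role-only first chunk works the same way because a missing `content` is treated as an empty string.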
Consuming the Stream
# Stream tokens to stdout (use -N to disable buffering)
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hello"}],"max_tokens":100,"stream":true}' \
| while IFS= read -r line; do
echo "$line"
done
const res = await fetch('https://llm402.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...' },
body: JSON.stringify({
model: 'deepseek-v3.2', messages: [{ role: 'user', content: 'hello' }],
max_tokens: 100, stream: true
})
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buf += decoder.decode(value, { stream: true });
const lines = buf.split('\n');
buf = lines.pop() || ''; // keep partial trailing line for the next read
for (const line of lines) {
if (line.startsWith('data: ') && line !== 'data: [DONE]') {
const chunk = JSON.parse(line.slice(6));
const token = chunk.choices[0]?.delta?.content || '';
process.stdout.write(token);
}
}
}
import requests, json
res = requests.post('https://llm402.ai/v1/chat/completions',
    headers={'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...'},
    json={'model': 'deepseek-v3.2', 'messages': [{'role': 'user', 'content': 'hello'}],
          'max_tokens': 100, 'stream': True},
    stream=True)
for line in res.iter_lines():
    line = line.decode('utf-8')
    # Skip blank lines, heartbeat comments, and the end-of-stream marker
    if line.startswith('data: ') and line != 'data: [DONE]':
        chunk = json.loads(line[6:])
        # content may be missing or null on role-only and final chunks
        token = chunk['choices'][0].get('delta', {}).get('content') or ''
        print(token, end='', flush=True)
Payment first: Streaming requires the same payment flow as buffered requests. Pay via L402, x402, or Balance token before sending a stream request. You cannot begin streaming before payment is verified. Cashu does not support streaming -- use buffered mode ("stream": false) with Cashu tokens.
Request Deduplication
Non-streaming responses are cached for 30 seconds. If you retry an identical request (same model, messages, max_tokens, and IP), the server returns the cached response immediately without re-running inference or re-charging you.
| Parameter | Value |
|---|---|
| TTL | 30 seconds |
| Max entries | 100 |
| Max entry size | 1 MB |
| Scope | Per-IP (different IPs get separate caches) |
The response includes an X-Dedup header indicating whether the response was served from cache:
X-Dedup: hit # served from cache (no charge)
X-Dedup: miss # fresh inference
To bypass the cache, send the X-No-Cache: true request header. (Streaming responses and cache-bypassed requests omit the X-Dedup header entirely — the absence of the header should be treated the same as miss.)
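A client can fold the "absent header means miss" rule into one small helper. The sketch below assumes response headers are available as a plain mapping; `served_from_cache` and `request_headers` are illustrative names.

```python
def served_from_cache(headers):
    """True only when X-Dedup is "hit"; a missing header counts as a miss."""
    value = next((v for k, v in headers.items() if k.lower() == "x-dedup"), None)
    return value is not None and value.strip().lower() == "hit"

def request_headers(token, bypass_cache=False):
    """Build request headers, optionally opting out of the dedup cache."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if bypass_cache:
        headers["X-No-Cache"] = "true"
    return headers
```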
Web Search
Ground model responses with real-time web data. When enabled, the model searches the web before responding and includes citations to sources. Available on most models via auto-routing.
Usage
Add web_search: true to any /v1/chat/completions request:
curl -X POST https://llm402.ai/v1/chat/completions \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "What is the current price of Bitcoin?"}],
"web_search": true,
"stream": true
}'
Parameters
| Parameter | Type | Description |
|---|---|---|
web_search | boolean | Set to true to enable web search. Default: false. |
Response
The model embeds citation markers (e.g. [1]) in its response text with links to sources. When the upstream provider returns structured citations, llm402 forwards them unmodified as an annotations array attached to the assistant message.
Non-streaming schema
The annotations array is attached to choices[0].message:
{
"choices": [{
"message": {
"role": "assistant",
"content": "Some answer with citations [1].",
"annotations": [
{
"type": "url_citation",
"url_citation": {
"url": "https://example.com/article",
"title": "Example article",
"content": "optional snippet",
"start_index": 14,
"end_index": 17
}
}
]
}
}]
}
Streaming
During an SSE stream with web_search: true, annotations do not arrive progressively. They are attached to the final stop chunk (the same chunk that carries finish_reason: "stop"), as a top-level annotations array on that chunk.
data: {"id":"...","choices":[{"index":0,"delta":{"content":"answer"}}]}
data: {"id":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"annotations":[{"type":"url_citation","url_citation":{"url":"https://example.com","title":"..."}}]}
data: [DONE]
Upstream pass-through: This schema follows the OpenAI / OpenRouter annotations.url_citation convention. llm402 forwards whatever the upstream provider returns without reshaping it. If the model or upstream changes the format, this shape may shift. Treat the schema as a best-effort surface and code defensively.
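Because annotations sit in different places for streaming and non-streaming responses, and the shape is best-effort, a tolerant extractor may be useful. The following is a sketch; `extract_citations` is an illustrative name and the sample payloads are trimmed versions of the examples above.

```python
def extract_citations(payload):
    """Collect url_citation entries from a full response or a final stream chunk."""
    found = list(payload.get("annotations") or [])  # streaming: top level of the chunk
    for choice in payload.get("choices") or []:
        message = choice.get("message") or {}
        found.extend(message.get("annotations") or [])  # non-streaming: on the message
    citations = []
    for item in found:
        if not isinstance(item, dict) or item.get("type") != "url_citation":
            continue  # ignore shapes we do not recognize
        detail = item.get("url_citation") or {}
        if detail.get("url"):
            citations.append({"url": detail["url"], "title": detail.get("title")})
    return citations

non_streaming = {"choices": [{"message": {
    "role": "assistant",
    "content": "Some answer with citations [1].",
    "annotations": [{"type": "url_citation",
                     "url_citation": {"url": "https://example.com/article",
                                      "title": "Example article"}}],
}}]}
final_chunk = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    "annotations": [{"type": "url_citation",
                     "url_citation": {"url": "https://example.com"}}],
}
```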
Pricing
Web search adds a small surcharge per request on top of the base inference cost. The surcharge varies with model and current rates — query POST /v1/estimate-cost with web_search: true for the exact number, or read the price field on the 402 challenge. Default coverage is up to 5 web page lookups per request.
Compatibility
- model: "auto" -- always supported. The router selects a search-capable model.
- Explicit model -- supported on most models. Returns 400 if the model does not support web search.
- The tools parameter is not supported. Use web_search: true instead.
Estimate Cost
Use /v1/estimate-cost with web_search: true to see the price before sending a paid request:
curl -X POST https://llm402.ai/v1/estimate-cost \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Latest AI news"}],
"web_search": true
}'
The response includes webSearchEnabled: true and the total cost with the search surcharge included.
Image Generation
Generate images from text prompts. 40+ models across multiple providers, all behind a unified OpenAI-compatible endpoint. All four payment rails are supported (L402, x402, Balance, Cashu).
Endpoint
POST /v1/images/generations or /v1/images/generations/{model}
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes* | Image model ID (e.g. FLUX.1-schnell). *Not required if model is in URL path. |
prompt | string | Yes | Text description of the image (2-4096 chars) |
size | string | No | Dimensions as "WxH" string (e.g. "1024x1792"). Use "auto" for model default. Overrides width/height. |
width | integer | No | Image width in pixels (64–2048). Must provide both width and height together. |
height | integer | No | Image height in pixels (64–2048). Must provide both width and height together. |
steps | integer | No | Diffusion steps (1-50, default model-dependent) |
response_format | string | No | url (default) or b64_json |
seed | integer | No | Deterministic seed for reproducibility |
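The constraints in the table can be checked client-side before paying for a request. This is a minimal sketch: `build_image_request` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def build_image_request(prompt, model=None, size=None, width=None, height=None,
                        steps=None, response_format=None, seed=None):
    """Validate fields against the documented ranges and return the JSON body."""
    if not 2 <= len(prompt) <= 4096:
        raise ValueError("prompt must be 2-4096 characters")
    if (width is None) != (height is None):
        raise ValueError("width and height must be provided together")
    for name, value in (("width", width), ("height", height)):
        if value is not None and not 64 <= value <= 2048:
            raise ValueError(f"{name} must be between 64 and 2048")
    if steps is not None and not 1 <= steps <= 50:
        raise ValueError("steps must be between 1 and 50")
    if response_format not in (None, "url", "b64_json"):
        raise ValueError("response_format must be 'url' or 'b64_json'")
    body = {"prompt": prompt}
    optional = {"model": model, "size": size, "width": width, "height": height,
                "steps": steps, "response_format": response_format, "seed": seed}
    # Only send the fields the caller actually set
    body.update({k: v for k, v in optional.items() if v is not None})
    return body

body = build_image_request("A serene mountain landscape at sunset",
                           model="black-forest-labs/FLUX.1-schnell",
                           width=1024, height=768)
```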
Response
{
"created": 1234567890,
"model": "black-forest-labs/FLUX.1-schnell",
"data": [
{ "url": "/v1/media/img_abc123..." }
]
}
The url field can take three forms:
- Relative proxy path (e.g. /v1/media/img_abc123...) -- the most common form. The image is served from llm402.ai with a 24h TTL and provider-agnostic CSP. Prepend https://llm402.ai to fetch.
- data: URI (e.g. data:image/png;base64,...) -- inline base64 for providers that return raw bytes (the image is embedded; no extra fetch).
- Direct HTTPS URL -- only as a fallback when the media-proxy token can't be created. HTTPS URLs from upstream providers may expire (~7 days).
Each data[i] entry may also include provider-specific extras: revised_prompt (image-prompt rewriting), or timings/index (provider diagnostics). These are pass-through — treat them as best-effort and code defensively.
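Since the url field can arrive in three forms, a client needs to branch before fetching. One possible way to do that is sketched below; `resolve_image_url` is an illustrative name, and the helper only classifies the value without performing any download.

```python
import base64

BASE_URL = "https://llm402.ai"

def resolve_image_url(url):
    """Return ("bytes", data) for data: URIs, else ("fetch", absolute_url)."""
    if url.startswith("data:"):
        header, _, encoded = url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64 data: URI")
        return ("bytes", base64.b64decode(encoded))  # image is already inline
    if url.startswith("/"):
        return ("fetch", BASE_URL + url)  # relative proxy path
    if url.startswith("https://"):
        return ("fetch", url)  # direct upstream URL (fallback form)
    raise ValueError(f"unrecognized url form: {url[:32]}")
```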
Key Differences from Chat Completions
- No streaming -- response is synchronous
- model is required (no auto-routing)
- One image per request (n is always 1)
- Pricing is per-image, not per-token
- All image payments are non-refundable on backend failure. This is intentional, following a pentester finding (it eliminates a refund oracle), and diverges from chat, where Balance refunds on upstream failure. Use /v1/estimate-cost and the /v1/models health signal before paying.
- Request deduplication is disabled
- Generation time varies: 1–10s for FLUX/diffusion models, 30–90s for GPT-5 Image models
- Dimensions are automatically rounded to the nearest multiple of 16 for compatibility
- Both size string and width/height integer formats are accepted
Example
curl -X POST https://llm402.ai/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_your_token_here" \
-d '{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "A serene mountain landscape at sunset"
}'
Available Models
40+ image models from multiple providers. Use /v1/models or the Models page for the full live list with current sat prices. Selected highlights:
| Model | Price | Notes |
|---|---|---|
flux.1-schnell | varies | Fast, cheapest FLUX |
flux.2-pro | varies | Professional quality |
flux.2-max | varies | Maximum quality |
flash-image-2.5 | varies | Nano Banana — Gemini image gen |
flash-image-3.1 | varies | Nano Banana 2 — latest Gemini |
imagen-4.0-fast | varies | Google Imagen |
gpt-5-image-mini | varies | GPT-5 image gen (compact) |
gpt-5-image | varies | GPT-5 image gen (full, slow ~60s) |
fibo | varies | JSON-native, enterprise-safe |
ideogram/ideogram-3.0 | 126 sats | Strong text rendering |
Video Generation
Generate videos from text prompts. Unlike image generation, video generation is asynchronous: you create a job, then poll for completion. All four payment rails are supported (L402, x402, Balance, Cashu). Payment is collected when the job is created.
Workflow
- Create job -- POST /v1/videos with your prompt and model. Returns 202 Accepted with a job ID and poll URL.
- Poll for status -- GET /v1/videos/{job_id} (no auth required). Returns queued, processing, completed, or failed.
- Download -- When status is completed, the response includes a video_url.
Create Job
POST /v1/videos or /v1/videos/generations/{model}
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes* | Video model ID (e.g. kling-2.1-master). *Not required if model is in URL path. |
prompt | string | Yes | Text description of the video (2-4096 chars) |
seconds | integer | No | Video duration in seconds. Valid values depend on the model (check /v1/models for each model’s capabilities.durations). If omitted, the model’s default is used. |
width | integer | No | Video width in pixels. Must be paired with height. Valid values depend on the model (check /v1/models for each model’s capabilities.sizes). If omitted, the model’s default is used. |
height | integer | No | Video height in pixels. Must be paired with width. Valid values depend on the model (check /v1/models for each model’s capabilities.sizes). If omitted, the model’s default is used. |
fps | integer | No | Frames per second (1–60). Only some models support this. Check capabilities in the /v1/models response. |
steps | integer | No | Diffusion steps (model-dependent) |
guidance_scale | number | No | Classifier-free guidance scale |
seed | integer | No | Deterministic seed for reproducibility |
negative_prompt | string | No | What to avoid in the generated video |
Pricing
Video pricing varies by provider and model:
- Together.ai models (Kling, MiniMax, Seedance, etc.) — flat per-video pricing. The price is the same regardless of duration or resolution.
- OpenRouter models (Veo 3.1) — per-second pricing that scales with duration and resolution. Longer videos and higher resolutions cost more.
The 402 challenge always shows the exact price for the specific parameters you requested. If no optional parameters are specified (duration, resolution), the minimum price for that model is shown.
Model Capabilities
Each video model supports specific durations, sizes, and fps values. Sending unsupported parameters returns 400 Bad Request with the list of supported values for that model. Use GET /v1/models to discover per-model capabilities.
Video models in the /v1/models response include these additional fields:
| Field | Type | Description |
|---|---|---|
model_type | string | "video" — identifies this as a video generation model |
capabilities.durations | array | null | Supported duration values in seconds (e.g. [5, 10]), or null if unconstrained |
capabilities.sizes | array | null | Supported WxH dimension strings (e.g. ["1920x1080", "1280x720"]), or null if unconstrained |
# Discover video model capabilities
curl -s https://llm402.ai/v1/models | jq '.data[] | select(.model_type=="video") | {id, capabilities}'
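Once a model's capabilities object has been fetched, parameters can be checked locally to avoid a 400 round trip. The sketch below assumes the `durations` and `sizes` shapes from the table; `check_video_params` is a hypothetical helper, and fps is left out because its capability field is not described here.

```python
def check_video_params(capabilities, seconds=None, width=None, height=None):
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    if (width is None) != (height is None):
        problems.append("width and height must be paired")
    durations = capabilities.get("durations")  # None means unconstrained
    if seconds is not None and durations is not None and seconds not in durations:
        problems.append(f"seconds must be one of {durations}")
    sizes = capabilities.get("sizes")  # None means unconstrained
    if width is not None and height is not None and sizes is not None:
        if f"{width}x{height}" not in sizes:
            problems.append(f"size must be one of {sizes}")
    return problems

caps = {"durations": [5, 10], "sizes": ["1920x1080", "1280x720"]}
```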
Response (202 Accepted)
{
"id": "vj_abc123...",
"status": "queued",
"model": "minimax/video-01-director",
"poll_url": "/v1/videos/vj_abc123...",
"poll_interval_ms": 5000,
"created_at": 1234567890
}
Poll Job Status
GET /v1/videos/{job_id}
Response (completed)
{
"id": "vj_abc123...",
"status": "completed",
"model": "minimax/video-01-director",
"video_url": "/v1/videos/vj_abc123.../content",
"done_at": 1234567890,
"created_at": 1234567000,
"poll_interval_ms": 5000
}
Response (failed)
{
"id": "vj_abc123...",
"status": "failed",
"model": "minimax/video-01-director",
"error": "upstream provider timeout",
"poll_interval_ms": 5000
}
Key Differences from Image Generation
- Asynchronous -- returns immediately with a job ID, not a finished result
- Polling required -- use poll_url to check status; respect poll_interval_ms
- URL-only -- no b64_json response format; videos are always returned as URLs
- Longer generation times -- expect 30s–5min depending on model and duration
- Non-refundable -- payment is collected at job creation, not on completion
- model is required (no auto-routing)
- One video per request
- Request deduplication is disabled
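The create-then-poll pattern can be wrapped in a small loop that honors poll_interval_ms. This is a sketch under a few assumptions: `wait_for_video` is an illustrative name, the HTTP call is supplied by the caller as `fetch_status`, and the 10-minute default timeout is an arbitrary choice. The demo uses canned responses and a shortened, made-up job ID.

```python
import time

def wait_for_video(fetch_status, poll_url, timeout_s=600, sleep=time.sleep):
    """Poll until the job completes or fails; return the final status dict.

    fetch_status(poll_url) is caller-supplied: it performs the GET and
    returns the decoded JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(poll_url)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(job.get("error", "video job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(job.get("poll_interval_ms", 5000) / 1000)  # respect the server hint

# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"status": "queued", "poll_interval_ms": 5000},
    {"status": "processing", "poll_interval_ms": 5000},
    {"status": "completed", "video_url": "/v1/videos/vj_abc/content"},
])
waits = []
final = wait_for_video(lambda url: next(canned), "/v1/videos/vj_abc", sleep=waits.append)
```

Injecting `sleep` keeps the loop testable; in real use, leave it at the default and pass a `fetch_status` that issues the GET request.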
Example
# Create video job
curl -X POST https://llm402.ai/v1/videos \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "minimax/video-01-director", "prompt": "A cat walking through a garden", "seconds": 5}'
# Response (202 Accepted):
# {"id":"vj_abc...","status":"queued","model":"minimax/video-01-director","poll_url":"/v1/videos/vj_abc...","poll_interval_ms":5000}
# Poll for completion
curl https://llm402.ai/v1/videos/vj_abc123...
# Response (completed):
# {"id":"vj_abc...","status":"completed","model":"minimax/video-01-director","video_url":"/v1/videos/vj_abc.../content","done_at":1234567890,"created_at":1234567000,"poll_interval_ms":5000}
# Generate a 16:9 HD video with specific duration
curl -X POST https://llm402.ai/v1/videos \
-H "Authorization: Bearer bal_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "google/veo-3.0", "prompt": "cat walking on beach", "seconds": 8, "width": 1920, "height": 1080}'
Available Models
Video models from multiple providers. Use /v1/models or the Models page for the full live list with pricing and per-model capabilities. Available models include Sora 2, Veo 3.0, Veo 3.1 (via OpenRouter, per-second pricing), Kling 2.1, Seedance, PixVerse, MiniMax, Vidu, and Wan.
Fetching Generated Video
GET /v1/videos/{job_id}/content
Completed video jobs expose their binary through an opaque proxy endpoint. When the poll response of GET /v1/videos/{job_id} returns status: "completed", the accompanying video_url field is a relative path of the form /v1/videos/vj_…/content pointing at this endpoint. The provider’s actual CDN URL is never exposed to the client.
No additional authentication is required — the job_id itself is a 128-bit capability token (vj_ + 32 hex characters).
End-to-end flow
- Create a job with POST /v1/videos and note the returned poll_url.
- Poll GET /v1/videos/{job_id} at poll_interval_ms until status === "completed".
- GET the video_url path to stream the video bytes.
curl -s -L -o out.mp4 "https://llm402.ai/v1/videos/vj_aaaaaaaabbbbbbbbccccccccdddddddd/content"
The placeholder job ID above will return 404 Video not available — that is expected, and documents the shape of the "unknown or expired job" error.
Response
On success the server streams the binary with the following headers:
| Header | Value |
|---|---|
Content-Type | One of video/mp4, video/webm, video/quicktime, video/x-msvideo. Anything outside this allowlist is coerced to video/mp4. |
Content-Length | Forwarded from the upstream provider when present. Responses larger than the server’s body cap are rejected with 502. |
Cache-Control | private, max-age=3600 |
Content-Security-Policy | default-src 'none'; sandbox (prevents script execution in proxied content) |
Status codes
| Status | Meaning |
|---|---|
200 | Video bytes streaming. |
404 | Unknown job, not yet completed, or the server no longer has a videoUrl for it (expired). |
403 | Upstream video URL host is not on the proxy allowlist, or the supplied job ID is malformed (must match vj_ + 32 hex chars). |
502 | Upstream download failed or body exceeds size cap. |
503 | Concurrent video-proxy capacity reached; retry with exponential backoff. A Retry-After: 10 header is returned. |
504 | Upstream download timed out (60s). |
Rate limit: this endpoint shares the video-polling class at 60 requests / minute per IP.
Provider URLs are never exposed: all completed video content is served via /v1/videos/{job_id}/content. Clients never see the upstream CDN or provider URL, and cannot reach the provider directly.
Handle 503 with backoff: in-flight content requests may be rejected with 503 Video proxy busy when the server reaches its concurrent-proxy cap. Clients MUST implement exponential backoff and retry; do not tight-loop on 503, or you will trip the 60/minute rate limit and receive 429.
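One way to implement the required backoff is sketched below. The names and defaults (`fetch_with_backoff`, six attempts, a 1-second base delay) are assumptions chosen for illustration, and the HTTP call is injected as `do_request` so the demo runs without network access.

```python
import random
import time

def fetch_with_backoff(do_request, max_attempts=6, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Retry on 503 with exponential backoff and jitter.

    do_request() is caller-supplied and returns (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        if status != 503:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        delay = min(max_delay, base_delay * 2 ** attempt)
        retry_after = headers.get("Retry-After")
        if retry_after is not None and str(retry_after).isdigit():
            delay = max(delay, float(retry_after))  # never retry sooner than asked
        sleep(delay + random.uniform(0, delay / 10))  # small jitter
    raise RuntimeError("video proxy still busy after retries")

# Demo: two busy responses, then success (no real network traffic).
responses = iter([
    (503, {"Retry-After": "10"}, b""),
    (503, {"Retry-After": "10"}, b""),
    (200, {"Content-Type": "video/mp4"}, b"\x00"),
])
delays = []
status, headers, body = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
```

With a 10-second Retry-After, even the full six attempts stay far below the 60 requests / minute limit.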
Code Examples
Complete examples for each payment method and language.
Uses viem for EIP-712 signing. Install: npm install viem
const API_URL = 'https://llm402.ai/v1/chat/completions';
const body = JSON.stringify({
model: 'claude-sonnet-4.6',
messages: [{ role: 'user', content: 'Say hello.' }],
max_tokens: 50
});
// 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(Buffer.from(res402.headers.get('Payment-Required'), 'base64').toString());
const req = envelope.accepts[0]; // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';
// 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
const signature = await walletClient.signTypedData({
domain: { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
types: { TransferWithAuthorization: [
{ name: 'from', type: 'address' }, { name: 'to', type: 'address' },
{ name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
{ name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
]},
primaryType: 'TransferWithAuthorization',
message: { from: address, to: req.payTo, value: BigInt(req.amount),
validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce },
});
// 3. Send with Payment-Signature header (base64-encoded JSON payload)
const res = await fetch(API_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'Payment-Signature': paymentB64 },
body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});
Uses eth-account for EIP-712 signing. Install: pip install requests eth-account
API_URL = 'https://llm402.ai/v1/chat/completions'
body = {'model': 'claude-sonnet-4.6', 'messages': [{'role': 'user', 'content': 'Say hello.'}], 'max_tokens': 50}
# 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
res402 = requests.post(API_URL, json=body)
envelope = json.loads(base64.b64decode(res402.headers["Payment-Required"]).decode())
req = envelope["accepts"][0] # always use accepts[0]
routed_model = res402.headers.get("X-Route-Model", "claude-sonnet-4.6")
# 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
domain = {"name": req["extra"]["name"], "version": req["extra"]["version"],
"chainId": 8453, "verifyingContract": req["asset"]}
message = {"from": address, "to": req["payTo"], "value": int(req["amount"]),
"validAfter": now - 600, "validBefore": now + 120,
"nonce": bytes.fromhex(nonce[2:])}
# eth-account's three-argument form: domain_data, message_types, message_data
signable = encode_typed_data(domain, types, message)
signed = account.sign_message(signable)
# 3. Send with Payment-Signature header (base64-encoded JSON payload)
res = requests.post(API_URL, json={**body, "model": routed_model},
headers={"Payment-Signature": payment_b64})
Requires an EIP-712 signing tool (e.g., Foundry's cast) for Step 3.
# 1. Get 402 challenge
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 402 with Payment-Required header (base64 JSON) and WWW-Authenticate (L402)
# 2. Decode the Payment-Required header (x402 v2 envelope)
echo "$PAYMENT_REQ_HEADER" | base64 -d | jq .
# Returns: { x402Version: 2, accepts: [{ scheme, network, amount, asset, payTo, extra }], resource, price }
# Use accepts[0] for payment details: jq '.accepts[0]'
# 3. Sign EIP-3009 with cast, build payload, base64 encode (see x402 docs above for full flow)
# 4. Send with payment
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Payment-Signature: $PAYMENT_B64" \
-d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
Uses ethers.js v6 with MetaMask or Coinbase Wallet. No gas for the payer -- just a signing prompt.
// 1. Connect wallet + switch to Base
const provider = new ethers.BrowserProvider(window.ethereum);
const signer = await provider.getSigner();
await window.ethereum.request({ method: 'wallet_switchEthereumChain', params: [{ chainId: '0x2105' }] });
// 2. Get 402, parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(atob(res402.headers.get('Payment-Required')));
const req = envelope.accepts[0]; // always use accepts[0]
// 3. Sign EIP-3009 (wallet popup -- no gas, no approval tx)
const signature = await signer.signTypedData(
{ name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
{ TransferWithAuthorization: [/* from, to, value, validAfter, validBefore, nonce */] },
{ from: address, to: req.payTo, value: BigInt(req.amount), validAfter: BigInt(now-600), validBefore: BigInt(now+120), nonce }
);
// 4. Send with Payment-Signature header
const res = await fetch(API_URL, {
headers: { 'Content-Type': 'application/json', 'Payment-Signature': btoa(JSON.stringify(payload)) },
method: 'POST', body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});
Pay with Bitcoin Lightning. Three steps: get invoice, pay, resend with proof.
# Step 1: Get 402 challenge with Lightning invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response body includes:
# "invoice": "lnbc210n1pn..." (pay this with your Lightning wallet)
# "macaroon": "AgEJbGxt..." (send this back with the preimage)
# "price": 21 (cost in sats)
# Step 2: Pay the Lightning invoice with your wallet.
# Your wallet will give you the preimage (64-char hex).
# Step 3: Resend the request with L402 authorization
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: L402 AgEJbGxt...:a1b2c3d4e5f67890..." \
-d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 200 with the chat completion
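The header assembly in Step 3 can be sketched in Python as follows. `l402_authorization` and `preimage_matches` are illustrative names; the hash check relies on the general Lightning property that the payment hash is the SHA-256 of the preimage, and the macaroon and preimage values are demo placeholders.

```python
import hashlib
import re

def preimage_matches(preimage_hex, payment_hash_hex):
    """A Lightning preimage is valid when SHA-256(preimage) equals the payment hash."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return digest == payment_hash_hex.lower()

def l402_authorization(macaroon, preimage_hex):
    """Build the Authorization header value: L402 {macaroon}:{preimage}."""
    if not re.fullmatch(r"[0-9a-fA-F]{64}", preimage_hex):
        raise ValueError("preimage must be 64 hex characters")
    return f"L402 {macaroon}:{preimage_hex}"

# Demo values only; a real preimage comes from your wallet after payment.
demo_preimage = "00" * 32
demo_hash = hashlib.sha256(bytes(32)).hexdigest()
header = l402_authorization("demo-macaroon", demo_preimage)
```

Checking the preimage against the invoice's payment hash before resending is optional, but it catches copy-paste mistakes without spending another request.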
Prepay for a balance, then use it for multiple requests.
# Step 1: Create a prepaid balance (get Lightning invoice)
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"sats": 1000}'
# Response: { "payment_hash": "abc...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }
# Step 2: Pay the invoice, then poll for the token
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-d '{"payment_hash": "abc..."}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "..." }
# Step 3: Use the token for requests (no per-request payment needed)
curl -s -X POST https://llm402.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"model":"gpt-5.4","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'
# Step 4: Check remaining balance
curl -s -X POST https://llm402.ai/v1/balance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bal_xxxx..." \
-d '{"action": "status"}'
# Response: { "sats": 850, "expires_at": "...", "total_spent": 150, "requests": 3 }
Pricing
Prices are computed dynamically per-request based on the model, your estimated input tokens, and your requested max_tokens. Cheaper models cost as little as 21 sats. The exact price is returned in the 402 response.
A uniform 10% markup is applied over upstream provider cost across every modality — chat, embeddings, images, video, and web search. Same markup regardless of which payment rail you use (L402, x402, Cashu, or prepaid balance). The 21-sat floor applies to all sats-denominated requests.
How max_tokens affects your bill
We charge upfront, once per request, based on input size plus your max_tokens cap — not on actual tokens returned. A 50-word answer at max_tokens: 16384 costs the same as a 5,000-word answer at the same cap. Sizing the cap tightly is the single biggest lever on your invoice.
Default: If you omit max_tokens, the server uses 2048 — enough for ~1,500 words or ~180 lines of code. Fits the vast majority of chat responses.
Guidance for larger outputs:
| Use case | Recommended max_tokens | Rough output size |
|---|---|---|
| Short chat / factual answer | 256 – 512 | 1–2 paragraphs |
| Standard reply, default | 2048 (omitted) | ~1,500 words / ~180 LOC |
| Long-form explanation, multi-step reasoning | 4096 – 8192 | ~3,000–6,000 words |
| Essays, full blog posts, long code files | 16384 | ~12,000 words |
| Book chapters, very long generation | 32768+ | ~24,000+ words |
Per-model upper bounds apply — check max_tokens in /v1/models (which returns each model's context_length along with per-token pricing in USD and sats). Requests that exceed a model's context window are rejected before payment with a 400, so you never pay for an impossible request.
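To make the effect of the cap concrete, here is a purely illustrative calculation. The formula, the per-1k-token rates, and the round-up rule are all assumptions made for this sketch and not the server's actual pricing code; only the 10% markup and the 21-sat floor come from the text above. Use /v1/estimate-cost or the 402 challenge for real prices.

```python
MARKUP_PERCENT = 110  # upstream cost plus the uniform 10% markup
FLOOR_SATS = 21       # minimum for sats-denominated requests

def estimate_sats(input_tokens, max_tokens, in_sats_per_1k, out_sats_per_1k):
    """Illustrative upfront price: input size plus the max_tokens cap, marked up."""
    # Upstream cost in thousandths of a sat, so all arithmetic stays in integers
    upstream_x1000 = input_tokens * in_sats_per_1k + max_tokens * out_sats_per_1k
    # Apply the markup and round up to a whole sat (ceiling division)
    marked_up = -((-upstream_x1000 * MARKUP_PERCENT) // 100_000)
    return max(FLOOR_SATS, marked_up)

# Hypothetical rates: 10 sats per 1k input tokens, 40 sats per 1k output tokens.
default_cap = estimate_sats(1000, 2048, 10, 40)
large_cap = estimate_sats(1000, 16384, 10, 40)
tiny = estimate_sats(10, 50, 10, 40)
```

Under these made-up rates, the same prompt costs roughly seven times more at a 16384 cap than at the default 2048, regardless of how many tokens are actually returned, and a very small request lands on the floor.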
Denomination by Protocol
| Protocol | Unit | Minimum | Notes |
|---|---|---|---|
| x402 (USDC) | Atomic USDC (6 decimals) | ~$0.001 | amount: "3150" = $0.003150. Native USD -- no BTC conversion. |
| L402 (Lightning) | Satoshis | 21 sats | BTC/USD converted at request time. 21-sat floor for all models. |
| Cashu (ecash) | Satoshis | 21 sats | Same denomination as L402. Send sat-denominated Cashu tokens. |
| Balance (prepaid) | Satoshis | 21 sats | Funded via Lightning or USDC. Deducted per-request in sats. |
Price verification: The server recalculates the price on the paid retry and verifies the signed/paid amount covers the minimum. A rounding tolerance of 5 atomic USDC is allowed for x402.
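The atomic-unit conversion and the tolerance check can be written out as below. This is a sketch: the helper names are illustrative, and applying the 5-unit tolerance as "signed amount may fall at most 5 atomic units short" is an assumed reading of the rule above.

```python
from decimal import Decimal

USDC_DECIMALS = 6
TOLERANCE_ATOMIC = 5  # rounding tolerance quoted above

def atomic_to_usd(amount):
    """Convert an atomic USDC amount string (6 decimals) to a Decimal in USD."""
    return Decimal(int(amount)).scaleb(-USDC_DECIMALS)

def covers_price(signed_amount, required_amount):
    """True when the signed amount covers the price within the tolerance."""
    return int(signed_amount) + TOLERANCE_ATOMIC >= int(required_amount)
```

Keeping amounts as integers or `Decimal` avoids the float rounding that would otherwise creep into six-decimal values.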
Errors
All errors on /v1/* endpoints follow the OpenAI error format:
{
  "error": {
    "message": "description",
    "type": "error_type",
    "code": "error_code"
  }
}
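A client can fold this envelope into one small type. The sketch below is an assumption about how you might structure it (ApiError and parse_error are hypothetical names); it falls back to the raw body when the response is not in the documented shape, and to "unknown" when a field such as code is absent.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    status: int
    message: str
    type: str = "unknown"
    code: str = "unknown"


def parse_error(status, body):
    """Parse an OpenAI-style error body; tolerate non-JSON or odd shapes."""
    try:
        err = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        err = None
    if not isinstance(err, dict):
        return ApiError(status, body)
    return ApiError(
        status,
        str(err.get("message", "")),
        str(err.get("type") or "unknown"),
        str(err.get("code") or "unknown"),
    )
```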
x402-Specific Errors
| Code | HTTP | Type | Description |
|---|---|---|---|
| x402_bad_payload | 400 | invalid_request_error | Payment-Signature header is not valid base64 or not valid JSON |
| x402_underpayment | 402 | payment_error | Signed amount is less than the model's current price |
| x402_settlement_failed | 402 | payment_error | Payment rejected (bad signature, insufficient balance, expired authorization) |
| ambiguous_payment | 400 | invalid_request_error | Request has multiple payment headers (Payment-Signature, Authorization, X-Cashu). Use exactly one. |
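The ambiguous_payment case is cheap to rule out before sending. This is a minimal client-side guard, assuming headers are held in a plain dict; the function names are hypothetical.

```python
PAYMENT_HEADERS = ("payment-signature", "authorization", "x-cashu")


def payment_headers_in(headers):
    """Return the payment headers present, compared case-insensitively."""
    present = {name.lower() for name in headers}
    return [h for h in PAYMENT_HEADERS if h in present]


def check_single_payment(headers):
    """Raise if more than one payment header is set; return the one in use."""
    found = payment_headers_in(headers)
    if len(found) > 1:
        raise ValueError("multiple payment headers set: " + ", ".join(found))
    return found[0] if found else None
```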
Cashu-Specific Errors
| Code | HTTP | Type | Description |
|---|---|---|---|
| cashu_no_stream | 400 | invalid_request_error | Cashu tokens cannot be used with streaming (change requires buffered response) |
| cashu_too_many_proofs | 400 | invalid_request_error | Token contains more than 20 proofs (DoS prevention limit) |
| cashu_wrong_unit | 400 | invalid_request_error | Only sat-denominated Cashu tokens are accepted |
| cashu_mint_not_allowed | 400 | invalid_request_error | Token's mint is not in the server's allowlist |
| cashu_underpayment | 402 | payment_error | Token value is less than the model's price |
| cashu_underpayment_after_fees | 402 | payment_error | Token value is less than model's price after mint swap fees |
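Several of these rejections can be predicted locally. The sketch below assumes the token has already been decoded into a proof count, unit, and total value by whatever Cashu library you use; cashu_preflight is a hypothetical helper, and the order of the checks is an assumption. The mint allowlist and mint swap fees are known only to the server, so they are not checked here.

```python
MAX_PROOFS = 20  # limit from the table above


def cashu_preflight(*, stream, proof_count, unit, amount_sats, price_sats):
    """Return the error code the request would likely hit, or None if it looks fine."""
    if stream:
        return "cashu_no_stream"
    if proof_count > MAX_PROOFS:
        return "cashu_too_many_proofs"
    if unit != "sat":
        return "cashu_wrong_unit"
    if amount_sats < price_sats:
        return "cashu_underpayment"
    return None
```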
L402-Specific Errors
| Reason | HTTP | Description |
|---|---|---|
| Invalid macaroon signature | 401 | Macaroon was tampered with or signed with wrong key |
| Macaroon expired | 401 | ExpiresAt caveat exceeded (macaroons valid for 5 min) |
| Path mismatch | 401 | Macaroon's RequestPath does not match the endpoint called |
| max_tokens exceeds paid amount | 401 | Request max_tokens exceeds the MaxTokens caveat. Get a new invoice. |
| Input exceeds paid amount | 401 | Input size grew since invoice was issued (MaxInputChars / MaxInputTokens) |
| Invoice expired (server restarted) | 401 | NotBefore caveat fails after container restart. Request a new invoice. |
| Model mismatch | 401 | Request model does not match the macaroon's Model caveat |
| Preimage does not match | 401 | Preimage does not hash to the macaroon's payment hash |
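Most of these 401s can be anticipated before the paid retry. The sketch below assumes the client has kept the caveat values it was quoted (the Caveats dataclass and its field names are hypothetical; the macaroon's wire format is not described here). The preimage check relies on the standard Lightning relationship that the payment hash is the SHA-256 of the preimage.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Caveats:
    expires_at: float    # unix seconds; macaroons are valid for 5 minutes
    request_path: str
    model: str
    max_tokens: int


def stale_reason(c, *, now, path, model, max_tokens):
    """Return the likely rejection reason, or None if the macaroon still fits."""
    if now >= c.expires_at:
        return "Macaroon expired"
    if path != c.request_path:
        return "Path mismatch"
    # "auto" is allowed on L402 retries; the Model caveat decides the model.
    if model != "auto" and model != c.model:
        return "Model mismatch"
    if max_tokens > c.max_tokens:
        return "max_tokens exceeds paid amount"
    return None


def preimage_matches(preimage_hex, payment_hash_hex):
    """True when SHA-256(preimage) equals the payment hash."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return hmac.compare_digest(digest, payment_hash_hex.lower())
```

When stale_reason returns anything other than None, requesting a fresh invoice is cheaper than sending a retry that will be rejected.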
General Errors
| Code / Reason | HTTP | Description |
|---|---|---|
| Rate limit | 429 | Per-IP rate limit exceeded. Check Retry-After header for seconds to wait. |
| Concurrent stream limit | 429 | Too many concurrent streams from your IP. |
| Context window exceeded | 400 | Input + max_tokens exceeds the model's context window |
| Invalid model | 400 | Model name not found in the model catalog |
| Service unavailable | 503 | Backend provider temporarily unreachable. Try a different model or retry later. |
x402 + concurrent streams: The server checks stream capacity before settling USDC on-chain. If you hit the concurrent stream limit (429), your payment has NOT been settled and you can safely retry.
Rate Limits
Rate limits apply per IP address (via cf-connecting-ip). Limits differ by endpoint class.
| Endpoint class | Example paths | Limit |
|---|---|---|
| Free endpoints | /health, /v1/models, /v1/estimate-cost, /api/tags, /.well-known/* | 60 requests / minute |
| Invoice requests (402 challenge) | POST /v1/balance (create / top-up) | 30 requests / minute |
| Polling endpoints | POST /api/invoice/status | 60 requests / minute (free tier; supports ~5s polling cadence) |
| Authenticated inference | POST /v1/chat/completions, POST /v1/embeddings, POST /api/chat/*, POST /api/generate/* | 60 requests / minute |
| Media generation | POST /v1/images/generations, POST /v1/videos | 10 requests / minute |
| Video job polling | GET /v1/videos/{id}, GET /v1/videos/{id}/content | 60 requests / minute |
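A client that wants to stay under these limits can throttle itself. The sketch below uses a sliding 60-second window, which is an assumption: the server's exact windowing is not specified here, so treat this as a conservative approximation. The class name and the endpoint-class keys are hypothetical.

```python
import time
from collections import deque

# Per-minute limits from the table above, keyed by hypothetical class labels.
LIMITS_PER_MINUTE = {
    "free": 60, "invoice": 30, "polling": 60,
    "inference": 60, "media": 10, "video_polling": 60,
}


class SlidingWindowLimiter:
    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before the next request is within the limit."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Call once per request actually sent."""
        self._sent.append(self.clock())
```

Typical use is one limiter per endpoint class, for example `SlidingWindowLimiter(LIMITS_PER_MINUTE["media"])`, sleeping for `wait_time()` before each call and calling `record()` after sending.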
Concurrent Stream Limits
| Scope | Limit |
|---|---|
| Per IP | 5 concurrent streams |
| Global | 250 concurrent streams |
When rate limited, the response includes a Retry-After header indicating how many seconds to wait before retrying.
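A retry helper can honor that header directly. In the sketch below, the function name, the exponential fallback, and the decision to also back off on 503 are assumptions layered on top of the documented Retry-After behavior.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None if a retry is not suggested."""
    if status == 429:
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get("retry-after", "").strip()
        if value.isdigit():
            return float(value)  # server-provided wait, in seconds
    if status in (429, 503):
        # Fallback: capped exponential backoff (assumed policy).
        return min(cap, base * 2 ** attempt)
    return None
```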
Models & Auto-Routing
llm402.ai serves 400+ models across multiple providers. The full model list with pricing is available at the /v1/models endpoint:
curl -s https://llm402.ai/v1/models | jq '.data[].id'
Model Naming
You can use either short names or full provider-prefixed IDs:
| Short Name | Full ID |
|---|---|
| deepseek-v3.2 | deepseek/deepseek-v3.2 |
| claude-sonnet-4.6 | anthropic/claude-sonnet-4.6 |
| gpt-5.4 | openai/gpt-5.4 |
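In the three examples above, the short name is the full ID with the provider prefix removed. Assuming that pattern holds generally (the /v1/models list is the authority, and this helper is hypothetical), a client can compare the two forms like this:

```python
def short_name(model_id):
    """Drop a leading 'provider/' prefix if present (assumed naming pattern)."""
    return model_id.split("/", 1)[-1]


def same_model(a, b):
    """Compare two model names regardless of which form each one uses."""
    return short_name(a) == short_name(b)
```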
Auto-Routing
Send model: "auto" and an embedding-based classifier picks the best model across 8 task categories: code, reasoning, creative, summarization, multilingual, general_knowledge, agents, chat. Vision is supported via explicit "task": "vision" in the request body (the auto-classifier itself doesn't infer vision — provide images and set the task hint).
- code -- programming, debugging, code generation
- reasoning -- logic, math, step-by-step analysis
- general_knowledge -- factual questions, definitions, Q&A
- creative -- writing, storytelling, brainstorming
- summarization -- condensing content, TL;DR
- chat -- casual conversation, general chat
- multilingual -- translation, cross-language tasks
- agents -- function calling, tool integration, structured output
- vision -- image understanding (multimodal models)
To skip the classifier and route within a specific category, use the task body parameter with model: "auto":
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","task":"code","messages":[{"role":"user","content":"Sort a list in Python"}],"max_tokens":200}'
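The same request body can be built programmatically with a check on the task hint. This is a sketch: the function name is hypothetical, and rejecting unknown tasks on the client is a convenience rather than documented server behavior.

```python
import json

# The 8 classifier categories, plus vision, which must be set explicitly.
CLASSIFIER_TASKS = {
    "code", "reasoning", "creative", "summarization",
    "multilingual", "general_knowledge", "agents", "chat",
}
EXPLICIT_ONLY_TASKS = {"vision"}


def auto_request_body(messages, task=None, max_tokens=2048):
    """Build a model:"auto" body, optionally pinned to one task category."""
    body = {"model": "auto", "messages": messages, "max_tokens": max_tokens}
    if task is not None:
        if task not in CLASSIFIER_TASKS | EXPLICIT_ONLY_TASKS:
            raise ValueError(f"unknown task: {task!r}")
        body["task"] = task
    return body


body = auto_request_body(
    [{"role": "user", "content": "Sort a list in Python"}],
    task="code",
    max_tokens=200,
)
print(json.dumps(body))
```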
Important (x402 / Cashu): Echo the X-Route-Model from the 402 response into the body on the paid retry. Don't re-send "auto" — the router could pick a different model at a different price. L402 retries can keep "auto" because the macaroon's Model caveat is the binding authority.
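The rule above can be captured in one helper that prepares the body for the paid retry. This is a minimal sketch: the function name and the rail labels ("x402", "cashu", "l402") are hypothetical, and headers are assumed to be available as a plain dict.

```python
def paid_retry_body(body, challenge_headers, rail):
    """Return the body to send on the paid retry.

    For x402 and Cashu, replace "auto" with the model named in the 402
    response's X-Route-Model header. L402 retries are left unchanged.
    """
    retry = dict(body)  # do not mutate the caller's dict
    if retry.get("model") == "auto" and rail in ("x402", "cashu"):
        lowered = {k.lower(): v for k, v in challenge_headers.items()}
        routed = lowered.get("x-route-model")
        if not routed:
            raise ValueError("402 response carried no X-Route-Model header")
        retry["model"] = routed
    return retry
```

Failing loudly when the header is missing is deliberate: re-sending "auto" silently would risk paying a quote for one model and being routed to another.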