llm402.ai API

Pay-per-request LLM inference. No accounts. No API keys. Just pay and prompt.

llm402.ai provides OpenAI-compatible endpoints gated by HTTP 402 micropayments. Send a request, get an invoice, pay it, re-send with proof of payment. Your prompt is processed by one of 400+ models across multiple providers.

Four payment rails are supported. Every 402 response includes all available options -- pick whichever works for your client:

Protocol	Currency	Network	Header
L402	Bitcoin (sats)	Lightning Network	`WWW-Authenticate`
x402	USDC (stablecoin)	Base L2 (EIP-3009)	`Payment-Required`
Cashu	Bitcoin (sats)	Ecash tokens	`X-Cashu`
Balance	Bitcoin (sats) or USDC (stablecoin)	Prepaid balance token, funded via Lightning or x402	`Authorization: Bearer`

Model Naming

Short names work for all models -- no provider prefix needed:

deepseek-v3.2, claude-sonnet-4.6, gpt-5.4
Full IDs also work: deepseek/deepseek-v3.2, anthropic/claude-sonnet-4.6
For auto-routing, use "model": "auto" and read the X-Route-Model header from the 402 response.

Model in URL Path

All inference endpoints support specifying the model in the URL path instead of the request body:

/v1/chat/completions/deepseek-v3.2
/v1/images/generations/FLUX.1-schnell
/v1/videos/generations/kling-2.1-master

If both URL path and body contain a model, the body model takes priority. The /v1/models endpoint returns all available model IDs.

Quick Start

x402 (USDC on Base)

Pay with USDC stablecoins. No BTC needed. No gas for the payer.

1. POST /v1/chat/completions returns 402. Server responds with a Payment-Required header (base64 JSON: price, payTo, EIP-712 domain).

2. Decode the header and sign an EIP-3009 TransferWithAuthorization with your wallet (off-chain signature -- no gas, no approval tx).

3. POST /v1/chat/completions with the Payment-Signature header returns 200. Server settles USDC on-chain, then returns inference.

L402 (Bitcoin Lightning)

Pay with Bitcoin over the Lightning Network. Instant settlement, 21-sat minimum.

1. POST /v1/chat/completions returns 402. Server responds with a WWW-Authenticate header (macaroon + Lightning invoice).

2. Pay the Lightning invoice, receive the preimage.

3. POST /v1/chat/completions with the Authorization: L402 header returns 200. Format: Authorization: L402 {macaroon}:{preimage}.

Endpoints

All inference endpoints are OpenAI-compatible. Base URL: https://llm402.ai

OpenAI-Compatible

Method	Path	Description	Auth
POST	`/v1/chat/completions` `/v1/chat/completions/{model}`	Chat completions (streaming + buffered). Model can be in URL path or request body.	L402 / x402 / Balance / Cashu
POST	`/v1/embeddings`	Text embeddings (max 128 strings per batch, no streaming)	L402 / x402 / Balance / Cashu
POST	`/v1/images/generations` `/v1/images/generations/{model}`	Image generation (synchronous, one image per request). Model can be in URL path or request body.	L402 / x402 / Balance / Cashu
POST	`/v1/videos` `/v1/videos/generations/{model}`	Create a video generation job (async, returns job ID). Model can be in URL path or request body.	L402 / x402 / Balance / Cashu
GET	`/v1/videos/{job_id}`	Poll video job status (no auth required)	None
GET	`/v1/videos/{job_id}/content`	Stream finished video MP4 through llm402 proxy (provider URL never exposed). Returned as `video_url` on completed poll responses.	None
POST	`/v1/balance`	Prepaid balance: create, top up, check status	None / Balance
GET	`/v1/models`	List all available models (OpenAI-compatible)	None (free)

Ollama-Compatible

Method	Path	Description	Auth
POST	`/api/generate/{model}`	Text generation	L402 / x402 / Balance / Cashu
POST	`/api/chat/{model}`	Chat	L402 / x402 / Balance / Cashu
GET	`/api/tags`	Model catalog with pricing	None (free)

Ollama Examples

bash

# Chat via Ollama-compatible endpoint (model in path)
curl -s -X POST https://llm402.ai/api/chat/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}],"stream":false}'

bash

# Text generation via Ollama-compatible endpoint
curl -s -X POST https://llm402.ai/api/generate/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain Bitcoin in one sentence","stream":false}'

bash

# List all models with pricing and endpoints
curl -s https://llm402.ai/api/tags | jq '.models[] | {name, price_sats}'

Utility

Method	Path	Description	Auth
GET	`/health`	Service health and status	None (free)
POST	`/v1/estimate-cost`	Pre-authorization cost estimation	None (free)
POST	`/api/invoice/status`	Poll Lightning invoice payment status	None (free)
GET	`/.well-known/l402`	L402 service discovery (agent-readable)	None (free)
GET	`/.well-known/openapi.json`	OpenAPI 3.1.0 specification	None (free)
GET	`/.well-known/x402-discovery.json`	x402 v2 Bazaar discovery (resource catalog with route schemas + prices)	None (free)

Estimate Cost

Pre-authorize requests by checking the cost before paying. This endpoint is free and requires no authentication. Useful for MCP clients, agents, and budgeting.

bash

curl -s -X POST https://llm402.ai/v1/estimate-cost \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "max_tokens": 500,
    "pref": "balanced"
  }'

Response:

json

{
  "model": "deepseek/deepseek-v3.2",
  "shortName": "deepseek-v3.2",
  "category": "general_knowledge",
  "confidence": 0.82,
  "rc": 100,
  "estimatedInputTokens": 8,
  "estimatedOutputTokens": 500,
  "costSats": 21,
  "costUsd": 0.01428,
  "btcPrice": 68000
}

Parameters

Field	Required	Description
`messages`	Yes	Array of message objects (same format as chat completions)
`model`	No	Model name (short or full ID). If omitted or `"auto"`, the server auto-routes.
`max_tokens`	No	Upper bound on output tokens. When omitted, the server applies a per-model default configured server-side; the global fallback is 2048 if no per-model default is set. Use `POST /v1/estimate-cost` to see the exact `estimatedOutputTokens` the server will use for your model. You are billed on this cap, not on actual consumption — set it tight for short replies, bump it up (8192, 16384, 32768+) for long-form generation. See Pricing.
`pref`	No	Routing preference: `quality`, `balanced`, `cost`, `speed`
`max_cost`	No	Maximum cost in sats (routes only to models within budget)

Response fields

Field	Description
`model`	Resolved model ID (full upstream form, e.g. `deepseek/deepseek-v3.2`)
`shortName`	Short model alias (e.g. `deepseek-v3.2`) — same form accepted in URL paths
`category`	Auto-routing category the prompt classified into (e.g. `code`, `reasoning`, `general_knowledge`)
`confidence`	Classifier confidence (0–1) for the chosen category
`rc`	Routing complexity tier (10–100): higher = more capable model required for the prompt
`estimatedInputTokens`	Estimated input tokens (used for billing; capped from prompt length)
`estimatedOutputTokens`	Estimated output tokens (taken from `max_tokens` cap)
`costSats`	Estimated invoice price in sats (will match the 402 challenge if sent now)
`costUsd`	Same estimate in USD (informational)
`btcPrice`	Current BTC price used for the conversion (refreshes per model-sync cycle)
`webSearchEnabled`	Boolean — `true` if the request specified `web_search: true` and the surcharge is included
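
A client that wants a local spending guard can wrap this endpoint, as in the sketch below. It assumes Node.js 18+ (global fetch). `estimateWithinBudget`, `budgetSats`, and `fetchImpl` are illustrative names, not part of the API; the server-side `max_cost` parameter is the routing-level equivalent.

```javascript
// Sketch: ask /v1/estimate-cost for a price and flag estimates above a local budget.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function estimateWithinBudget(request, budgetSats, fetchImpl = fetch) {
  const res = await fetchImpl("https://llm402.ai/v1/estimate-cost", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`estimate-cost failed with status ${res.status}`);
  const estimate = await res.json();
  // costSats is the documented invoice estimate; withinBudget is added client-side.
  return { ...estimate, withinBudget: estimate.costSats <= budgetSats };
}
```

Call it with the same body you plan to send to /v1/chat/completions and only start the payment flow when `withinBudget` is true.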

Invoice Status

Poll the payment status of a Lightning invoice. Useful for wallet integrations that need to confirm payment before re-sending with the L402 header.

bash

curl -s -X POST https://llm402.ai/api/invoice/status \
  -H "Content-Type: application/json" \
  -d '{
    "payment_hash": "a1b2c3d4...64hex",
    "macaroon": "AgEJ..."
  }'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "preimage": "e5f6a7b8...64hex" }

Security: The macaroon field is required and must match the payment_hash. This prevents preimage theft by ensuring only the original invoice requester can poll for the preimage.

x402 Protocol (USDC)

x402 uses EIP-3009 TransferWithAuthorization for gasless USDC payments on Base. The payer signs an off-chain authorization; the server settles it on-chain.

Network and Asset

Field	Value
Network	`eip155:8453` (Base mainnet, chain ID 8453)
Asset	USDC `0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913`
Denomination	Atomic USDC (6 decimals: `1000000` = $1.00)
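
To make the 6-decimal denomination concrete, the helpers below convert between atomic units and a dollar string. The function names are hypothetical and not part of the llm402.ai API; BigInt is used so that amounts are never rounded by floating-point arithmetic.

```javascript
// Hypothetical helpers for the 6-decimal USDC denomination.
const ATOMIC_PER_USDC = 10n ** 6n; // 1000000 atomic units = $1.00

// Atomic units (string, number, or bigint) to a fixed 6-decimal dollar string.
function atomicToUsd(atomic) {
  const value = BigInt(atomic);
  const whole = value / ATOMIC_PER_USDC;
  const fraction = (value % ATOMIC_PER_USDC).toString().padStart(6, "0");
  return `${whole}.${fraction}`;
}

// Dollar string to atomic units; rejects more than 6 decimals instead of rounding silently.
function usdToAtomic(usd) {
  const match = /^(\d+)(?:\.(\d{1,6}))?$/.exec(String(usd));
  if (!match) throw new Error(`not a USDC amount with at most 6 decimals: ${usd}`);
  const whole = BigInt(match[1]);
  const fraction = BigInt((match[2] || "").padEnd(6, "0"));
  return (whole * ATOMIC_PER_USDC + fraction).toString();
}

console.log(atomicToUsd("3150"));  // 0.003150
console.log(usdToAtomic("0.5"));   // 500000
```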

Payment Flow

1. Get the 402 challenge

Send a normal inference request with no auth headers:

bash

curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

The server responds with HTTP 402. The response body contains all payment information:

json

{
  "error": "Payment Required",
  "description": "claude-sonnet-4.6 inference, pay-per-request over Lightning, USDC, or Cashu",
  "price": 42,
  "model": "claude-sonnet-4.6",
  "provider": "llm402.ai",
  "max_tokens": 50,
  "estimated_input_tokens": 12,
  "invoice": "lnbc420n...",
  "macaroon": "AgEJ...",
  "paymentHash": "a1b2c3d4e5f6...64hex",
  "x402": {
    "price_usd": "0.000305",
    "network": "eip155:8453",
    "address": "0xe05cf38...",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "scheme": "exact"
  },
  "cashu": {
    "price_sats": 42,
    "unit": "sat",
    "description": "Send sat-denominated Cashu tokens in X-Cashu header. Server-configured mint allowlist — see /llms.txt for the current list."
  }
}

The response headers carry the L402 and x402 challenges (the Cashu price appears only in the body):

http

HTTP/2 402
WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
Payment-Required: eyJzY2hlbWUiOiJleGFjdCIsIm5ldH...
Cache-Control: no-store

Payment-Required Header

Base64-encoded JSON in x402 v2 envelope format. Decode it and use accepts[0] for payment details:

json

{
  "x402Version": 2,
  "error": "Payment required",
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "3150",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0x...",
      "maxTimeoutSeconds": 120,
      "extra": {
        "name": "USD Coin",
        "version": "2"
      }
    }
  ],
  "resource": {
    "url": "https://llm402.ai/v1/chat/completions",
    "description": "LLM inference",
    "mimeType": "application/json"
  },
  "price": "$0.003150"
}

Field	Description
`x402Version`	Always `2`
`accepts`	Array of payment options. Always use `accepts[0]`
`accepts[0].scheme`	Always `"exact"`
`accepts[0].network`	Always `"eip155:8453"` (Base mainnet)
`accepts[0].amount`	Price in atomic USDC (6 decimals). `"3150"` = $0.003150
`accepts[0].asset`	USDC contract address on Base
`accepts[0].payTo`	Server's wallet address (recipient)
`accepts[0].maxTimeoutSeconds`	Maximum settlement time (120s)
`accepts[0].extra.name`	EIP-712 domain name. Always `"USD Coin"` (not `"USDC"` -- that is testnet)
`accepts[0].extra.version`	EIP-712 domain version. Always `"2"`
`price`	Human-readable USD price (informational only, use `accepts[0].amount` for signing)
`extensions.bazaar`	Optional. x402 Bazaar discovery metadata (route schema, input/output examples). Forward unmodified in your payment payload — spec-compliant clients should pass it through.
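
Decoding the header takes only a few lines in Node.js. The sketch below is one possible shape: `decodePaymentRequired` is a hypothetical helper name, and the sample envelope is built locally from the example values above rather than fetched from the server.

```javascript
// Minimal sketch: decode the base64 JSON carried by the Payment-Required header.
function decodePaymentRequired(headerValue) {
  const json = Buffer.from(headerValue, "base64").toString("utf8");
  const envelope = JSON.parse(json);
  if (envelope.x402Version !== 2) {
    throw new Error(`unexpected x402Version: ${envelope.x402Version}`);
  }
  if (!Array.isArray(envelope.accepts) || envelope.accepts.length === 0) {
    throw new Error("Payment-Required envelope has no accepts[] entries");
  }
  return { envelope, opt: envelope.accepts[0] }; // the field table says to use accepts[0]
}

// Usage with a locally encoded sample (not a real server response).
const sampleHeader = Buffer.from(JSON.stringify({
  x402Version: 2,
  accepts: [{ scheme: "exact", network: "eip155:8453", amount: "3150" }],
})).toString("base64");
const decoded = decodePaymentRequired(sampleHeader);
console.log(decoded.opt.amount); // 3150
```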

2. Sign the EIP-3009 authorization

Build a TransferWithAuthorization signature using EIP-712 typed data.

EIP-712 Domain

javascript

const domain = {
  name: "USD Coin",        // from extra.name
  version: "2",            // from extra.version
  chainId: 8453,           // Base mainnet
  verifyingContract: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"
};

EIP-712 Types

javascript

const types = {
  TransferWithAuthorization: [
    { name: "from",        type: "address" },
    { name: "to",          type: "address" },
    { name: "value",       type: "uint256" },
    { name: "validAfter",  type: "uint256" },
    { name: "validBefore", type: "uint256" },
    { name: "nonce",       type: "bytes32" },
  ]
};

Authorization Message

javascript

import { randomBytes } from "node:crypto";  // Node.js built-in; provides randomBytes

const now = Math.floor(Date.now() / 1000);
const nonce = "0x" + randomBytes(32).toString("hex");  // fresh 32-byte nonce per payment
const opt = paymentRequired.accepts[0];  // always use accepts[0]

const message = {
  from:        walletAddress,             // your address (payer)
  to:          opt.payTo,                 // from accepts[0]
  value:       BigInt(opt.amount),
  validAfter:  BigInt(now - 600),         // 10 min ago (clock skew buffer)
  validBefore: BigInt(now + 120),         // 2 min from now
  nonce:       nonce,
};

3. Build the payment payload

Construct the V2 payment payload and base64-encode it:

javascript

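// Reuses `opt`, `now`, `nonce`, and `walletAddress` from step 2. `signature` is the
// 0x-prefixed hex string produced by signing `message` with the domain and types
// from step 2 (for example via viem's signTypedData).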
const payload = {
  x402Version: 2,
  resource: paymentRequired.resource,
  accepted: {
    scheme: opt.scheme,
    network: opt.network,
    amount: opt.amount,
    asset: opt.asset,
    payTo: opt.payTo,
    maxTimeoutSeconds: opt.maxTimeoutSeconds,
    extra: opt.extra
  },
  payload: {
    signature: signature,
    authorization: {
      from: walletAddress,
      to: opt.payTo,
      value: opt.amount,
      validAfter: (now - 600).toString(),
      validBefore: (now + 120).toString(),
      nonce: nonce
    }
  }
};

const paymentSignature = Buffer.from(JSON.stringify(payload)).toString("base64");

4. Send the paid request

Re-send the same inference request with the Payment-Signature header:

bash

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

Response (HTTP 200):

json

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "claude-sonnet-4.6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 10,
    "total_tokens": 14
  }
}

Auto-routing gotcha (x402 / Cashu): If you used model: "auto" on the 402 challenge, the server routed to a specific model and returned it in the X-Route-Model response header. On the x402 / Cashu retry with payment, you must echo that specific model back in the body — not "auto" again — because x402 / Cashu have no server-side memory of the original routing decision. (L402 differs: the macaroon binds the routed model in its Model caveat, so the L402 retry can keep "auto" in the body.)
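
One way to handle this in client code is a small helper that rewrites the body only when needed. The sketch below is illustrative: `buildRetryBody` and its `protocol` argument are assumptions of this example, not part of the API.

```javascript
// Hypothetical helper: build the body for the paid retry after a 402 that used "auto".
// `routeModel` is the X-Route-Model header value from the 402 response;
// `protocol` is "l402", "x402", or "cashu".
function buildRetryBody(originalBody, routeModel, protocol) {
  if (originalBody.model !== "auto" || protocol === "l402") {
    return originalBody; // L402 binds the routed model in the macaroon, so "auto" may stay
  }
  if (!routeModel) {
    throw new Error("X-Route-Model header missing from the 402 response");
  }
  return { ...originalBody, model: routeModel }; // x402 / Cashu must echo the routed model
}
```

The original body object is left untouched, so it can still be reused for a fresh 402 discovery request.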

CORS

The server allows cross-origin x402 requests:

http

Access-Control-Allow-Headers: Content-Type, Authorization, Payment-Signature, X-Cashu, Mcp-Session-Id
Access-Control-Expose-Headers: X-Route-Model, X-Route-Category, Payment-Required, WWW-Authenticate, X-Cashu-Consumed, X-Cashu-Change

Nonce Replay Protection

Each signed authorization can only be used once. Replay protection is enforced both server-side and on-chain via EIP-3009 nonces.

L402 Protocol (Lightning)

L402 (formerly LSAT) combines HTTP 402 status codes with Lightning Network payments and macaroon-based authentication. It is the original payment protocol supported by llm402.ai.

Payment Flow

1. Client sends POST /v1/chat/completions with no auth.

2. Server returns 402 with WWW-Authenticate: L402 macaroon="...", invoice="lnbc...".

3. Client pays the Lightning invoice and obtains the preimage.

4. Client re-sends the request with Authorization: L402 {macaroon}:{preimage}.

5. Server verifies macaroon + preimage, proxies to inference, and returns 200.

Curl Example

bash

# Send a request with no auth -- get back a 402 with an invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response includes:
#   WWW-Authenticate: L402 macaroon="AgEJ...", invoice="lnbc210n1pn..."
#   Body: { "error": "Payment Required", "price": 21, "invoice": "lnbc...", "macaroon": "AgEJ...", ... }

bash

# Pay the Lightning invoice with your wallet and get the preimage.
# Then resend the exact same request with the L402 Authorization header:

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxtNDAyLmFp...:a1b2c3d4e5f67890..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response: HTTP 200 with chat completion

WWW-Authenticate Header

The 402 response includes a WWW-Authenticate header with two components:

http

WWW-Authenticate: L402 macaroon="AgEJbGxt...", invoice="lnbc210n1pn..."

Component	Description
`macaroon`	Base64-encoded V2 TLV macaroon with embedded caveats. Bound to a specific payment hash.
`invoice`	BOLT-11 Lightning invoice. Pay this to obtain the preimage.
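
A client can split the header into its two components with a pair of regular expressions. This is a sketch that assumes both values are double-quoted exactly as in the example above; `parseL402Challenge` is a hypothetical name.

```javascript
// Sketch: extract macaroon and invoice from an L402 WWW-Authenticate header value.
function parseL402Challenge(headerValue) {
  if (!/^L402\s/i.test(headerValue)) {
    throw new Error("not an L402 challenge");
  }
  const macaroon = /macaroon="([^"]+)"/.exec(headerValue);
  const invoice = /invoice="([^"]+)"/.exec(headerValue);
  if (!macaroon || !invoice) {
    throw new Error("L402 challenge is missing macaroon or invoice");
  }
  return { macaroon: macaroon[1], invoice: invoice[1] };
}
```

Pass `invoice` to your Lightning wallet and keep `macaroon` for the Authorization header on the paid request.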

Macaroon Caveats

Each macaroon is bound with first-party caveats that restrict its use. The server verifies all caveats on the paid request and rejects any that fail (fail-closed):

Caveat	Format	Description
`RequestPath`	`RequestPath = /v1/chat/completions`	Restricts the macaroon to a specific API endpoint
`ExpiresAt`	`ExpiresAt = 1712345678`	Unix timestamp expiry (5 minutes from issuance)
`MaxTokens`	`MaxTokens = 256`	Maximum output tokens the request may use
`MaxInputChars`	`MaxInputChars = 1500`	Prevents input inflation after invoice issuance
`MaxInputTokens`	`MaxInputTokens = 400`	Prevents token-count gaming (chars pass but tokens are higher)
`NotBefore`	`NotBefore = 1712340000`	Prevents preimage replay after server restart
`MaxInputItems`	`MaxInputItems = 5`	Binds the macaroon to the actual batch size from the request (example shows 5 items; max accepted is 128 per `/v1/embeddings` batch). No tolerance — retry must use the same item count.
`Model`	`Model = claude-sonnet-4.6`	Binds the macaroon to a specific model (prevents cross-model bypass)
`MediaType`	`MediaType = image`	Emitted for `/v1/images/generations` and `/v1/videos`. Restricts a media macaroon to a specific media class (`image` or `video`).
`MaxUnits`	`MaxUnits = 1`	Number of output units the macaroon covers (e.g. images or videos). Always `1` on media endpoints.
`MaxDuration`	`MaxDuration = 8`	Maximum video duration in seconds (typical range 1–10). Emitted ONLY when the request specifies `seconds` — binds the macaroon to that duration to prevent post-invoice upsell. Default-discovery 402 challenges (no `seconds`) omit this caveat; the per-model duration cap from `/v1/models` `capabilities.durations` applies instead. Video only.
`MaxDimension`	`MaxDimension = 1920`	Maximum video longest-side pixels. Emitted ONLY when the request specifies `width`/`height` — binds the macaroon to that resolution to prevent post-invoice upsell. Default-discovery 402 challenges omit this caveat. Video only.
`WebSearch`	`WebSearch = true`	Added to chat macaroons when the original request sent `web_search: true`. Binds the paid search surcharge to the flag.

Fail-closed design: Unrecognized caveats are rejected. This ensures future caveat additions don't accidentally pass on old server versions.

Authorization Header

After paying the Lightning invoice and receiving the preimage, send the authorization:

http

Authorization: L402 AgEJbGxt...:abc123def456...

Format: L402 {base64_macaroon}:{hex_preimage}

Tip: on retry you can send "model": "auto" in the body — the server extracts the routed model from the macaroon's Model caveat, so you can reuse the exact same body from 402 discovery without echoing the routed model back.

The server verifies:

The macaroon signature against the root key
All caveats pass (path, expiry, tokens, model, etc.)
The preimage hashes to the payment hash embedded in the macaroon identifier
The preimage has not been used before (atomic Redis SET NX, burned before inference begins)
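
Because a rejected retry still costs a round trip, it can help to run the hash check locally before sending. The sketch below relies on the standard Lightning rule that the payment hash is the SHA-256 of the preimage bytes; `buildL402Authorization` is a hypothetical helper, and `paymentHash` is the hex `paymentHash` field from the 402 response body.

```javascript
const { createHash } = require("node:crypto");

// Sketch: sanity-check a preimage locally, then format the L402 Authorization header.
function buildL402Authorization(macaroon, preimageHex, paymentHash) {
  if (!/^[0-9a-f]{64}$/i.test(preimageHex)) {
    throw new Error("preimage must be 32 bytes of hex (64 characters)");
  }
  const digest = createHash("sha256")
    .update(Buffer.from(preimageHex, "hex"))
    .digest("hex");
  if (digest !== paymentHash.toLowerCase()) {
    throw new Error("preimage does not hash to the invoice payment hash");
  }
  return `L402 ${macaroon}:${preimageHex}`; // L402 {base64_macaroon}:{hex_preimage}
}
```

This only mirrors the preimage check; the macaroon signature and caveats can be verified by the server alone.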

Preimages are single-use and non-refundable on L402. The server claims the preimage atomically before calling the upstream model. If inference then fails (502, timeout, etc.), the preimage is already spent and cannot be retried. This is intentional — burning the preimage after inference would open a replay window where a concurrent request could reuse it. If you need automatic refund-on-failure semantics, use balance tokens or pay with Cashu instead.

Balance Tokens (Prepaid)

Balance tokens let you prepay for multiple requests with a single Lightning payment or USDC transfer. Fund a balance once, then use Authorization: Bearer bal_... on any gated endpoint without per-request payment flows.

How It Works

1. POST /v1/balance with { "sats": 1000 } returns 402 and a Lightning invoice.

2. Pay the Lightning invoice with your wallet.

3. POST /v1/balance with { "payment_hash": "hex64" } returns 200 and { "paid": true, "token": "bal_...", "sats": 1000 }.

4. Use Authorization: Bearer bal_... on any endpoint.

Endpoints

1. Create balance (Lightning)

Request a Lightning invoice to fund a new balance:

bash

# Step 1: Request an invoice for 1000 sats
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'

# Response (402):
# { "payment_hash": "a1b2c3...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }

2. Poll for payment

After paying the invoice, poll with the payment hash to get your token. (An unknown payment_hash returns 404 Unknown payment_hash; pending invoices return 200 {"paid": false}.)

bash

# Step 2: Poll until paid
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "a1b2c3d4e5f6...64hex"}'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "2026-05-03T..." }
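
The polling step can be sketched as a loop that distinguishes the three documented outcomes (404, pending, paid). It assumes Node.js 18+ for global fetch; the interval and attempt limit are illustrative choices, not documented values, and `fetchImpl` exists only so the loop can be tested offline.

```javascript
// Sketch: poll POST /v1/balance with a payment hash until the invoice is paid.
async function pollBalanceToken(paymentHash, {
  fetchImpl = fetch,
  intervalMs = 2000,
  maxAttempts = 150,
} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl("https://llm402.ai/v1/balance", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ payment_hash: paymentHash }),
    });
    if (res.status === 404) throw new Error("Unknown payment_hash");
    if (!res.ok) throw new Error(`unexpected status ${res.status}`);
    const data = await res.json();
    if (data.paid) return data; // carries the bal_ token and credited sats
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // still pending
  }
  throw new Error("invoice not paid before polling gave up");
}
```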

3. Check balance

bash

curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'

# Response:
# { "sats": 850, "expires_at": "2026-05-03T...", "total_spent": 150, "requests": 7 }

4. Top up (Lightning)

Add sats to an existing balance:

bash

# Get a top-up invoice
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"sats": 500}'

# Returns 402 with a new invoice. Pay it, then poll with payment_hash as above.

Fund with USDC (x402)

Fund a balance using USDC on Base instead of Lightning. Send a single POST with a Payment-Signature header carrying an EIP-3009 TransferWithAuthorization envelope — no 402 challenge flow is required from this endpoint. The server settles the USDC on-chain via the CDP facilitator, derives the sats to credit from the signed amount at its current BTC price, and returns the balance token in one round trip.

1. Pick a USDC amount to fund

Decide how many sats you want to buy and convert to USDC atomic units (6 decimals) at a BTC price you are willing to pay. The server will re-derive sats from the signed USDC amount at its own BTC price — your body.sats hint is advisory and never authoritative.
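
The conversion is plain arithmetic: 1 BTC is 100,000,000 sats and 1 USDC is 1,000,000 atomic units. The helper below is a hypothetical client-side estimate only, since the server applies its own BTC price when crediting; it rounds up so the signed amount does not fall short of the target at the price you chose.

```javascript
// Hypothetical helper: USDC atomic units needed to buy `sats` at `btcPriceUsd`.
// atomic = sats * btcPriceUsd * 1e6 / 1e8 = sats * priceCents / 10,000
function satsToUsdcAtomic(sats, btcPriceUsd) {
  const priceCents = BigInt(Math.round(btcPriceUsd * 100));
  const numerator = BigInt(sats) * priceCents;
  const atomic = (numerator + 9999n) / 10000n; // integer division, rounded up
  return atomic.toString();
}

console.log(satsToUsdcAtomic(1000, 68000)); // 680000, i.e. $0.68
```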

2. Sign an EIP-3009 TransferWithAuthorization

Sign an off-chain authorization to transfer USDC on Base from your wallet to the server’s receiving address. The signing domain, asset, network, and payTo are identical to the values served in Payment-Required envelopes on other llm402 endpoints.

javascript

// Illustrative. Uses viem. Install: npm install viem
import { randomBytes } from 'node:crypto';
import { createWalletClient, http } from 'viem';
import { base } from 'viem/chains';
import { privateKeyToAccount } from 'viem/accounts';

const account = privateKeyToAccount(process.env.PRIV_KEY);  // 0x-prefixed private key
const client = createWalletClient({ account, chain: base, transport: http() });

// USDC on Base, 6 decimals. Example: $0.50 => 500000 atomic units.
const amountAtomic = '500000';
const payTo = '0x...';                     // llm402 receiving address (from any x402 envelope)
const usdc = '0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913';

// Random 32-byte nonce (hex). Must not be reused.
const nonce = '0x' + randomBytes(32).toString('hex');
const validAfter = 0n;
const validBefore = BigInt(Math.floor(Date.now() / 1000) + 120);

const signature = await client.signTypedData({
  domain: { name: 'USD Coin', version: '2', chainId: 8453, verifyingContract: usdc },
  types: {
    TransferWithAuthorization: [
      { name: 'from', type: 'address' },
      { name: 'to', type: 'address' },
      { name: 'value', type: 'uint256' },
      { name: 'validAfter', type: 'uint256' },
      { name: 'validBefore', type: 'uint256' },
      { name: 'nonce', type: 'bytes32' },
    ],
  },
  primaryType: 'TransferWithAuthorization',
  message: {
    from: account.address, to: payTo, value: BigInt(amountAtomic),
    validAfter, validBefore, nonce,
  },
});

// x402 v2 envelope — base64(JSON) for the Payment-Signature header
const envelope = {
  x402Version: 2,
  scheme: 'exact',
  network: 'eip155:8453',
  payload: {
    signature,
    authorization: {
      from: account.address, to: payTo, value: amountAtomic,
      validAfter: validAfter.toString(), validBefore: validBefore.toString(),
      nonce,
    },
  },
};
const paymentSignature = Buffer.from(JSON.stringify(envelope)).toString('base64');

3. POST to /v1/balance with the signed envelope

The body may be omitted entirely, or may contain {"sats": N} as a hint for your own UX. The server ignores body.sats for accounting and derives sats from the signed USDC amount.

bash

curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: $PAYMENT_SIG" \
  -d '{"sats": 500}'

# 200 OK on success:
# { "paid": true, "token": "bal_xxxx...", "sats": 500, "credited": 500 }

To top up an existing balance, add Authorization: Bearer bal_... to the same request. The total balance is capped at 50,000 sats; the server returns 400 if the signed amount would push the balance past that cap.

Server derives sats, not client: the credited sats are computed from the signed USDC atomic amount at the server’s current BTC price, not from body.sats. If BTC moves between the moment you decide an amount and the moment the server settles, your credited sats may not match your body.sats hint. Always trust the sats field in the 200 response, not the request.

Use the balance token

Once you have a bal_ token (from either the Lightning flow above or the USDC funding flow), send it as a Bearer token on any gated endpoint:

bash

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Explain Lightning Network"}],
    "max_tokens": 200
  }'

Token Lifecycle

Rule	Value
Inactivity TTL	30 days (resets on each use)
Max lifetime	90 days from creation
Max balance	50,000 sats
Min deposit	100 sats

Top-ups reset the inactivity timer but do not extend the 90-day max lifetime. Plan deposits accordingly.
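
Read together, the two limits suggest a token expires at whichever deadline comes first. The sketch below encodes that reading as an assumption; `tokenExpiry` is a hypothetical helper, and the server's `expires_at` field remains the authoritative value.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sketch of the lifecycle table: 30 days after last use or 90 days after creation,
// whichever is earlier. Both arguments are Date objects.
function tokenExpiry(createdAt, lastUsedAt) {
  const inactivityDeadline = lastUsedAt.getTime() + 30 * DAY_MS;
  const lifetimeDeadline = createdAt.getTime() + 90 * DAY_MS;
  return new Date(Math.min(inactivityDeadline, lifetimeDeadline));
}

const created = new Date("2026-01-01T00:00:00Z");
console.log(tokenExpiry(created, new Date("2026-01-10T00:00:00Z")).toISOString());
// 2026-02-09T00:00:00.000Z (inactivity limit applies)
console.log(tokenExpiry(created, new Date("2026-03-20T00:00:00Z")).toISOString());
// 2026-04-01T00:00:00.000Z (90-day lifetime applies)
```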

Cashu Tokens (Ecash)

Pay with Cashu ecash tokens -- instant, private Bitcoin micropayments with no Lightning channel required. Send tokens directly in the request header. If you overpay, the server returns change tokens.

How It Works

1. POST /v1/chat/completions (no auth) returns 402. The response body includes cashu.price_sats.

2. POST /v1/chat/completions with an X-Cashu header returns 200. The server swaps tokens at the mint, runs inference, and returns change if overpaid.

Request

Send a cashuB (v4) token in the X-Cashu header:

bash

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Cashu: cashuBo2F0gaJha..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 100
  }'

Response Headers

Cashu-paid calls return up to two response headers: X-Cashu-Consumed on every call, and X-Cashu-Change when change or a refund is due. The server lists both in Access-Control-Expose-Headers so browser clients can read them:

Header	Value	Meaning
`X-Cashu-Consumed`	`true` \| `refunded`	Status flag — `true` means the server swapped the proofs at the mint and consumed payment; `refunded` means the request failed after swap and the full amount was returned in `X-Cashu-Change`.
`X-Cashu-Change`	`cashuB...` token	Emitted when the presented token exceeded the price by at least 2 sats (`MIN_CHANGE_SATS`). Smaller overpayments are absorbed. The change token is capped at 8 KB; if the split would produce a larger header, change is absorbed.

These headers appear on every endpoint that accepts X-Cashu payment — /v1/chat/completions, /v1/embeddings, /v1/images/generations, and /v1/videos.

http

HTTP/2 200
Content-Type: application/json
X-Cashu-Consumed: true
X-Cashu-Change: cashuBo2F0gaJha...
Access-Control-Expose-Headers: X-Cashu-Consumed, X-Cashu-Change

The change token is a standard cashuB token. Import it with any Cashu wallet or present it on a subsequent request.

Import change or lose it: X-Cashu-Change carries real bearer money. Wallet clients MUST read the header and import the proofs on every successful response. Discarding the header is equivalent to burning the overpayment — the server does not retain a copy.
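
In client code this means reading both headers on every response before doing anything else. The sketch below works with a fetch() Response; `handleCashuHeaders` is a hypothetical name and `saveToWallet` stands in for whatever import routine your Cashu wallet library provides.

```javascript
// Sketch: pull the Cashu status and any change token out of a fetch() Response.
async function handleCashuHeaders(response, saveToWallet) {
  const status = response.headers.get("X-Cashu-Consumed"); // "true", "refunded", or null
  const change = response.headers.get("X-Cashu-Change");   // cashuB... token or null
  if (change) {
    await saveToWallet(change); // import first; the server keeps no copy
  }
  return { status, refunded: status === "refunded", change };
}
```

A `refunded` status means the change token carries the full amount back, so the request can be retried with it.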

Constraints

Rule	Value
Token format	`cashuB` (v4) only. Deprecated `cashuA` (v3) tokens are rejected.
Unit	Sat-denominated only (no USD or other units)
Max proofs	20 per token (DoS prevention)
Streaming	Not supported. Cashu requires buffered responses to calculate change. Use `"stream": false`.
Change threshold	2 sats minimum. Overpayment of 1 sat is absorbed (not worth the mint round-trip).
Change size limit	8 KB. If the change token exceeds 8 KB, it is absorbed by the server.
Mint	Server-configured allowlist. HTTPS-only, no private IPs. The 402 response body's `cashu.description` field indicates the current policy.
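
The change threshold can be expressed as a small calculation. This is an illustrative sketch: `expectedChangeSats` is a hypothetical helper, and it ignores both the 8 KB header cap and any fees a mint may charge, either of which can reduce the change actually returned.

```javascript
const MIN_CHANGE_SATS = 2; // change threshold from the constraints table

// Hypothetical helper: change (in sats) to expect for a given token value and price.
function expectedChangeSats(tokenSats, priceSats) {
  if (tokenSats < priceSats) {
    throw new Error("token does not cover the price");
  }
  const overpayment = tokenSats - priceSats;
  return overpayment >= MIN_CHANGE_SATS ? overpayment : 0; // 1-sat overpayments are absorbed
}

console.log(expectedChangeSats(50, 42)); // 8
console.log(expectedChangeSats(43, 42)); // 0
```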

No 402 dance needed: Unlike L402, you can skip the initial 402 request if you already know the price — just send the Cashu token directly and the server verifies the token value covers the model’s price. (x402 has a similar shortcut on POST /v1/balance for funding a balance token, but inference endpoints still expect either a prior 402 or a known price.)

MCP Server

llm402.ai provides a hosted Model Context Protocol (MCP) server. Connect from any MCP client — Claude Code, Claude Desktop, Cursor, or any tool that supports MCP. Six tools are available: text inference, image generation, video generation, model discovery, balance management, and funding.

Setup

1. Get a balance token. Visit llm402.ai/chat and fund a balance with Lightning or USDC. Copy your bal_ token from the balance display.

2. Add to your MCP client config.

Claude Code (~/.claude.json):

json

{
  "mcpServers": {
    "llm402": {
      "url": "https://llm402.ai/mcp",
      "headers": {
        "Authorization": "Bearer bal_YOUR_TOKEN_HERE"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json) — Claude Desktop only supports stdio MCP transport, so use the third-party mcp-remote bridge:

json

{
  "mcpServers": {
    "llm402": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://llm402.ai/mcp",
        "--header",
        "Authorization:Bearer bal_YOUR_TOKEN_HERE"
      ]
    }
  }
}

Replace bal_YOUR_TOKEN_HERE with your actual balance token. That’s it: the MCP client discovers all tools automatically. (mcp-remote is a community npm package; pin the version with mcp-remote@<version> in production.)

Available Tools

Tool	Auth	Description
`llm402_inference`	Required	Text inference. 400+ models, auto-routed by default. Supports system prompts, model selection, temperature, max_tokens, and routing preference (quality/balanced/cost/speed).
`llm402_image`	Required	Image generation. Requires a specific model ID (e.g. `black-forest-labs/FLUX.1-schnell`). Supports width, height, steps, seed, negative prompt.
`llm402_video`	Required	Video generation (async). Requires a specific model ID (e.g. `wan2.7-t2v`). Supports seconds, width, height, fps. Polls for completion up to 90s, then returns job URL for manual polling.
`llm402_models`	None	List available models. Optional substring filter (e.g. `"deepseek"`, `"flux"`). Free, no balance required.
`llm402_balance`	Required	Check your prepaid balance: remaining sats, total deposited, total spent, request count.
`llm402_fund`	Required	Generate a Lightning invoice to top up your balance. Default 5,000 sats. Polls for payment confirmation up to 45 seconds.

Example: Text Inference

bash

# Using curl against the MCP endpoint directly (JSON-RPC format)
curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_inference",
      "arguments": {
        "prompt": "Explain quantum computing in one sentence.",
        "max_tokens": 100
      }
    },
    "id": 1
  }'

Example: Image Generation

bash

curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_image",
      "arguments": {
        "prompt": "A cyberpunk cityscape at night with neon lights",
        "model": "black-forest-labs/FLUX.1-schnell"
      }
    },
    "id": 1
  }'

OpenAI-Compatible Alternative

If your tool doesn't support MCP but accepts OPENAI_BASE_URL, the same balance token works directly:

bash

export OPENAI_BASE_URL=https://llm402.ai/v1
export OPENAI_API_KEY=bal_YOUR_TOKEN

This works with Cursor, Aider, LangChain, the OpenAI Python SDK, and any tool that accepts a custom base URL.

Endpoint

MCP endpoint: https://llm402.ai/mcp

Protocol: Streamable HTTP (POST only). Each request is independently authenticated via the Authorization: Bearer bal_ header — no server-side session state is persisted between calls. (The Mcp-Session-Id CORS header is allowed for spec-compliant clients that send it, but the server doesn’t require or track it.) Responses are Server-Sent Events containing JSON-RPC results.
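
The request and response shapes described above can be sketched in Python. This is a minimal illustration, assuming each JSON-RPC message arrives on a single data: line of the SSE body; the helper names build_tool_call and parse_sse_jsonrpc are hypothetical and not part of any llm402 SDK.

```python
import json

def build_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 tools/call payload for the MCP endpoint."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }

def parse_sse_jsonrpc(body):
    """Return the JSON-RPC messages found on the data: lines of an SSE body."""
    messages = []
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                messages.append(json.loads(payload))
    return messages

# Illustrative only: a hand-written SSE body, not a captured server response.
sample = 'event: message\ndata: {"jsonrpc":"2.0","id":1,"result":{"ok":true}}\n\n'
payload = build_tool_call("llm402_inference", {"prompt": "hi", "max_tokens": 100})
print(payload["method"], parse_sse_jsonrpc(sample)[0]["id"])
```

POST the payload with the Content-Type, Accept, and Authorization headers shown in the curl examples, then pass the response text to parse_sse_jsonrpc.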

OpenClaw Plugin

Use llm402.ai from inside OpenClaw via the official @llm402/openclaw-provider plugin. All four payment rails supported: prepaid balance, Cashu ecash, USDC on Base, and Lightning. 400+ models.

1. Install

bash

npm install @llm402/openclaw-provider

Requires Node.js 22+. Includes a wallet CLI and the OpenClaw plugin.

2. Create a wallet (for Cashu / Lightning modes)

The package ships a CLI for wallet management. Skip this step if you only need balance mode (Bearer token).

bash

# Create a wallet (generates Nostr nsec + EVM keypair)
npx llm402-openclaw init

# Fund it with Lightning (prints a BOLT11 invoice — pay from any wallet)
npx llm402-openclaw fund 5000

# Check balance
npx llm402-openclaw balance

Your sats are stored locally as Cashu ecash proofs at ~/.llm402/wallet.json. From your perspective: you pay a Lightning invoice, your inference calls deduct from the Cashu balance. Run npx llm402-openclaw --help for all commands.

3. Configure in OpenClaw settings

Pick a payment mode:

Balance mode (simplest — Bearer token, zero latency, no wallet):

json

{
  "paymentMode": "balance",
  "balanceToken": "bal_YOUR_TOKEN_HERE"
}

Cashu mode (pay with ecash — requires wallet from step 2):

bash

# Reveal your nsec for the config below
LLM402_SHOW_SECRETS=1 npx llm402-openclaw init

json

{
  "paymentMode": "cashu",
  "cashuNsec": "nsec1..."
}

x402 mode (pay with USDC on Base — gasless, no ETH needed):

json

{
  "paymentMode": "x402",
  "evmPrivateKey": "0x..."
}

Lightning mode (pay L402 invoices by melting Cashu proofs):

json

{
  "paymentMode": "lightning",
  "cashuNsec": "nsec1..."
}

Wallet modes (cashu / x402 / lightning) start a local HTTP proxy on 127.0.0.1 that transparently handles the 402-and-pay cycle. OpenClaw only ever sees the final 200 response. See the plugin README for all modes.

CLI commands

Command	Description
`npx llm402-openclaw init`	Create or load a wallet
`npx llm402-openclaw fund <sats>`	Get a Lightning invoice, pay, mint Cashu proofs
`npx llm402-openclaw balance`	Show Cashu balance (+ optional USDC with `--check-usdc`)
`npx llm402-openclaw check-funding`	Resolve pending quotes from prior fund timeouts
`npx llm402-openclaw sync`	Pull wallet state from Nostr relays (opt-in)

Budget controls

Runaway cost protection. Both sats and USDC are tracked independently; either rail can reject a request before signing.

Field	Default	Max	Rail
`maxRequestBudgetSats`	500	50,000	sats
`sessionBudgetSats`	10,000	1,000,000	sats
`sessionBudgetUsdcCents`	5,000	500,000	USDC
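
To make the two sats limits concrete, here is a small Python sketch of how a per-request cap and a session cap could interact. It is not the plugin's code: the BudgetGuard class and its method names are hypothetical, and only the default values come from the table above.

```python
class BudgetGuard:
    """Reject a payment when it would exceed the per-request or session cap."""

    def __init__(self, max_request_sats=500, session_budget_sats=10_000):
        self.max_request_sats = max_request_sats
        self.session_budget_sats = session_budget_sats
        self.spent_sats = 0

    def authorize(self, price_sats):
        """Return True and record the spend, or False if a cap would be exceeded."""
        if price_sats > self.max_request_sats:
            return False
        if self.spent_sats + price_sats > self.session_budget_sats:
            return False
        self.spent_sats += price_sats
        return True

guard = BudgetGuard()
print(guard.authorize(21), guard.authorize(501))
```

A USDC guard would follow the same pattern with cents instead of sats, tracked in a separate counter.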

Security

This plugin runs locally and handles wallet keys. Do not install on shared systems, CI runners, or Codespaces. Wallet lives at ~/.llm402/wallet.json with 0600 permissions. Full threat model in the SECURITY.md.

Streaming

Add "stream": true to your request body to receive Server-Sent Events (SSE) as tokens are generated. The format follows the OpenAI streaming specification.

Request

bash

curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "max_tokens": 100,
    "stream": true
  }'

Response Format

The server sends a series of data: lines. Each line is a JSON chunk with a delta object containing the next token(s):

text/event-stream

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36}}

data: [DONE]

Field	Description
`choices[0].delta.role`	Sent on the first chunk (`"assistant"`). Upstream providers may bundle the first token alongside the role in the same chunk — code defensively for both shapes.
`choices[0].delta.content`	The next token(s) of the response
`choices[0].finish_reason`	`null` while generating, `"stop"` on the final chunk
`usage`	Optional. Some providers attach a token-count summary to the final chunk (the one with `finish_reason`). Treat as best-effort.
`data: [DONE]`	End-of-stream marker. Close the connection after this line.

Heartbeat: During long-running inferences, the server sends : heartbeat SSE comments every 15 seconds to keep the connection alive. These are not data lines and should be ignored by your parser.
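
The field rules above can be folded into one defensive parser. The sketch below assumes a single choice per chunk and takes already-decoded text lines; collect_stream is a hypothetical helper name.

```python
import json

def collect_stream(lines):
    """Fold SSE lines into (text, finish_reason, usage)."""
    text, finish_reason, usage = [], None, None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skips blank lines and ": heartbeat" comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        usage = chunk.get("usage") or usage  # optional, best-effort
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # The first chunk may carry role only, or role plus the first token.
            text.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text), finish_reason, usage
```

Feed it the decoded lines from any HTTP client, for example the output of requests' iter_lines(decode_unicode=True).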

Consuming the Stream

bash

# Stream tokens to stdout (use -N to disable buffering)
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hello"}],"max_tokens":100,"stream":true}' \
  | while IFS= read -r line; do
      echo "$line"
    done

javascript

const res = await fetch('https://llm402.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...' },
  body: JSON.stringify({
    model: 'deepseek-v3.2', messages: [{ role: 'user', content: 'hello' }],
    max_tokens: 100, stream: true
  })
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split('\n');
  buf = lines.pop() || ''; // keep partial trailing line for the next read
  for (const line of lines) {
    if (line.startsWith('data: ') && line !== 'data: [DONE]') {
      const chunk = JSON.parse(line.slice(6));
      const token = chunk.choices[0]?.delta?.content || '';
      process.stdout.write(token);
    }
  }
}

python

import requests, json

res = requests.post('https://llm402.ai/v1/chat/completions',
    headers={'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...'},
    json={'model': 'deepseek-v3.2', 'messages': [{'role': 'user', 'content': 'hello'}],
          'max_tokens': 100, 'stream': True},
    stream=True)

for line in res.iter_lines():
    line = line.decode('utf-8')
    if line.startswith('data: ') and line != 'data: [DONE]':
        chunk = json.loads(line[6:])
        token = chunk['choices'][0].get('delta', {}).get('content', '')
        print(token, end='', flush=True)

Payment first: Streaming requires the same payment flow as buffered requests. Pay via L402, x402, or Balance token before sending a stream request. You cannot begin streaming before payment is verified. Cashu does not support streaming -- use buffered mode ("stream": false) with Cashu tokens.

Request Deduplication

Non-streaming responses are cached for 30 seconds. If you retry an identical request (same model, messages, max_tokens, and IP), the server returns the cached response immediately without re-running inference or re-charging you.

Parameter	Value
TTL	30 seconds
Max entries	100
Max entry size	1 MB
Scope	Per-IP (different IPs get separate caches)

The response includes an X-Dedup header indicating whether the response was served from cache:

http

X-Dedup: hit    # served from cache (no charge)
X-Dedup: miss   # fresh inference

To bypass the cache, send the X-No-Cache: true request header. (Streaming responses and cache-bypassed requests omit the X-Dedup header entirely — the absence of the header should be treated the same as miss.)
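
A client can apply the "absent means miss" rule with a small check. This is a sketch; served_from_cache is a hypothetical helper and headers is any mapping of response header names to values.

```python
def served_from_cache(headers):
    """True only when X-Dedup is 'hit'; a missing header counts as a miss."""
    value = next(
        (v for k, v in headers.items() if k.lower() == "x-dedup"),
        "miss",
    )
    return value.strip().lower() == "hit"

print(served_from_cache({"X-Dedup": "hit"}), served_from_cache({}))
```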

Web Search

Ground model responses with real-time web data. When enabled, the model searches the web before responding and includes citations to sources. Available on most models via auto-routing.

Usage

Add web_search: true to any /v1/chat/completions request:

bash

curl -X POST https://llm402.ai/v1/chat/completions \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is the current price of Bitcoin?"}],
    "web_search": true,
    "stream": true
  }'

Parameters

Parameter	Type	Description
`web_search`	`boolean`	Set to `true` to enable web search. Default: `false`.

Response

The model embeds citation markers (e.g. [1]) in its response text with links to sources. When the upstream provider returns structured citations, llm402 forwards them unmodified as an annotations array attached to the assistant message.

Non-streaming schema

The annotations array is attached to choices[0].message:

json

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Some answer with citations [1].",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/article",
            "title": "Example article",
            "content": "optional snippet",
            "start_index": 14,
            "end_index": 17
          }
        }
      ]
    }
  }]
}
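
Because the schema is passed through from upstream, reading it defensively is worthwhile. The sketch below pulls (title, url) pairs out of a parsed non-streaming response; extract_citations is a hypothetical helper, and every field is treated as optional.

```python
def extract_citations(response):
    """Return (title, url) pairs from choices[0].message.annotations, if any."""
    choices = response.get("choices") or [{}]
    message = choices[0].get("message") or {}
    citations = []
    for ann in message.get("annotations") or []:
        if ann.get("type") != "url_citation":
            continue  # ignore annotation types this sketch does not know
        cite = ann.get("url_citation") or {}
        if cite.get("url"):
            citations.append((cite.get("title", ""), cite["url"]))
    return citations
```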

Streaming

During an SSE stream with web_search: true, annotations do not arrive progressively. They are attached to the final stop chunk (the same chunk that carries finish_reason: "stop"), as a top-level annotations array on that chunk.

text/event-stream

data: {"id":"...","choices":[{"index":0,"delta":{"content":"answer"}}]}

data: {"id":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"annotations":[{"type":"url_citation","url_citation":{"url":"https://example.com","title":"..."}}]}

data: [DONE]

Upstream pass-through: This schema follows the OpenAI / OpenRouter annotations.url_citation convention. llm402 forwards whatever the upstream provider returns without reshaping it. If the model or upstream changes the format, this shape may shift. Treat the schema as a best-effort surface and code defensively.
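
For streams, the same citations can be picked up from the final chunk. A minimal sketch, assuming decoded text lines and the top-level placement described in this section; annotations_from_stream is a hypothetical helper name.

```python
import json

def annotations_from_stream(lines):
    """Return the annotations array carried by the stop chunk, else []."""
    found = []
    for line in lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        found = chunk.get("annotations") or found
    return found
```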

Pricing

Web search adds a small surcharge per request on top of the base inference cost. The surcharge varies with model and current rates — query POST /v1/estimate-cost with web_search: true for the exact number, or read the price field on the 402 challenge. Default coverage is up to 5 web page lookups per request.

Compatibility

model: "auto" — always supported. The router selects a search-capable model.
Explicit model — supported on most models. Returns 400 if the model does not support web search.
The tools parameter is not supported. Use web_search: true instead.

Estimate Cost

Use /v1/estimate-cost with web_search: true to see the price before sending a paid request:

bash

curl -X POST https://llm402.ai/v1/estimate-cost \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Latest AI news"}],
    "web_search": true
  }'

The response includes webSearchEnabled: true and the total cost with the search surcharge included.

Image Generation

Generate images from text prompts. 40+ models across multiple providers, all behind a unified OpenAI-compatible endpoint. All four payment rails are supported (L402, x402, Balance, Cashu).

Endpoint

POST /v1/images/generations or /v1/images/generations/{model}

Request Body

Field	Type	Required	Description
`model`	string	Yes*	Image model ID (e.g. `FLUX.1-schnell`). *Not required if model is in URL path.
`prompt`	string	Yes	Text description of the image (2-4096 chars)
`size`	string	No	Dimensions as `"WxH"` string (e.g. `"1024x1792"`). Use `"auto"` for model default. Overrides width/height.
`width`	integer	No	Image width in pixels (64–2048). Must provide both width and height together.
`height`	integer	No	Image height in pixels (64–2048). Must provide both width and height together.
`steps`	integer	No	Diffusion steps (1-50, default model-dependent)
`response_format`	string	No	`url` (default) or `b64_json`
`seed`	integer	No	Deterministic seed for reproducibility

Response

json

{
  "created": 1234567890,
  "model": "black-forest-labs/FLUX.1-schnell",
  "data": [
    { "url": "/v1/media/img_abc123..." }
  ]
}

The url field can take three forms:

Relative proxy path (e.g. /v1/media/img_abc123...) — the most common form. The image is served from llm402.ai with a 24h TTL and provider-agnostic CSP. Prepend https://llm402.ai to fetch.
data: URI (e.g. data:image/png;base64,...) — inline base64 for providers that return raw bytes (the image is embedded; no extra fetch).
Direct HTTPS URL — only as a fallback when the media-proxy token can't be created. HTTPS URLs from upstream providers may expire (~7 days).

Each data[i] entry may also include provider-specific extras: revised_prompt (image-prompt rewriting), or timings/index (provider diagnostics). These are pass-through — treat them as best-effort and code defensively.
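
The three url forms can be handled with one small dispatcher. This is a sketch under the assumptions stated in the list above; resolve_image is a hypothetical helper, and it only decides what to do with the value rather than performing the download.

```python
import base64

BASE_URL = "https://llm402.ai"

def resolve_image(url):
    """Return ("bytes", data) for data: URIs, or ("fetch", absolute_url) otherwise."""
    if url.startswith("data:"):
        header, _, payload = url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64 data URI")
        return "bytes", base64.b64decode(payload)
    if url.startswith("/"):
        return "fetch", BASE_URL + url  # relative proxy path
    if url.startswith("https://"):
        return "fetch", url  # direct upstream URL (fallback case)
    raise ValueError(f"unrecognised url form: {url[:32]}")

print(resolve_image("/v1/media/img_abc123"))
```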

Key Differences from Chat Completions

No streaming — response is synchronous
model is required (no auto-routing)
One image per request (n is always 1)
Pricing is per-image, not per-token
All image payments are non-refundable on backend failure. This is intentional: it follows a penetration-test finding and removes a refund oracle. Chat differs here, since Balance refunds on upstream failure. Check /v1/estimate-cost and the /v1/models health signal before paying.
Request deduplication is disabled
Generation time varies: 1–10s for FLUX/diffusion models, 30–90s for GPT-5 Image models
Dimensions are automatically rounded to the nearest multiple of 16 for compatibility
Both size string and width/height integer formats are accepted
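
To predict the dimensions the server will use, a client can mirror the rounding locally. The sketch below is an approximation: the docs only say "nearest multiple of 16", so the tie-breaking direction (ties round up here) is an assumption, and both helper names are hypothetical.

```python
def parse_size(size):
    """Parse a "WxH" string into (width, height) integers."""
    width, sep, height = size.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WxH, got {size!r}")
    return int(width), int(height)

def round_to_16(value):
    """Round to the nearest multiple of 16 (ties round up, an assumption)."""
    return ((value + 8) // 16) * 16

width, height = parse_size("1030x1792")
print(round_to_16(width), round_to_16(height))
```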

Example

bash

curl -X POST https://llm402.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_your_token_here" \
  -d '{
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "A serene mountain landscape at sunset"
  }'

Available Models

40+ image models from multiple providers. Use /v1/models or the Models page for the full live list with current sat prices. Selected highlights:

Model	Price	Notes
`flux.1-schnell`	varies	Fast, cheapest FLUX
`flux.2-pro`	varies	Professional quality
`flux.2-max`	varies	Maximum quality
`flash-image-2.5`	varies	Nano Banana — Gemini image gen
`flash-image-3.1`	varies	Nano Banana 2 — latest Gemini
`imagen-4.0-fast`	varies	Google Imagen
`gpt-5-image-mini`	varies	GPT-5 image gen (compact)
`gpt-5-image`	varies	GPT-5 image gen (full, slow ~60s)
`fibo`	varies	JSON-native, enterprise-safe
`ideogram/ideogram-3.0`	126 sats	Strong text rendering

Video Generation

Generate videos from text prompts. Unlike image generation, video generation is asynchronous: you create a job, then poll for completion. All four payment rails are supported (L402, x402, Balance, Cashu). Payment is collected when the job is created.

Workflow

Create job — POST /v1/videos with your prompt and model. Returns 202 Accepted with a job ID and poll URL.
Poll for status — GET /v1/videos/{job_id} (no auth required). Returns queued, processing, completed, or failed.
Download — When status is completed, the response includes a video_url.

Create Job

POST /v1/videos or /v1/videos/generations/{model}

Request Body

Field	Type	Required	Description
`model`	string	Yes*	Video model ID (e.g. `kling-2.1-master`). *Not required if model is in URL path.
`prompt`	string	Yes	Text description of the video (2-4096 chars)
`seconds`	integer	No	Video duration in seconds. Valid values depend on the model (check `/v1/models` for each model’s `capabilities.durations`). If omitted, the model’s default is used.
`width`	integer	No	Video width in pixels. Must be paired with `height`. Valid values depend on the model (check `/v1/models` for each model’s `capabilities.sizes`). If omitted, the model’s default is used.
`height`	integer	No	Video height in pixels. Must be paired with `width`. Valid values depend on the model (check `/v1/models` for each model’s `capabilities.sizes`). If omitted, the model’s default is used.
`fps`	integer	No	Frames per second (1–60). Only some models support this. Check `capabilities` in the `/v1/models` response.
`steps`	integer	No	Diffusion steps (model-dependent)
`guidance_scale`	number	No	Classifier-free guidance scale
`seed`	integer	No	Deterministic seed for reproducibility
`negative_prompt`	string	No	What to avoid in the generated video

Pricing

Video pricing varies by provider and model:

Together.ai models (Kling, MiniMax, Seedance, etc.) — flat per-video pricing. The price is the same regardless of duration or resolution.
OpenRouter models (Veo 3.1) — per-second pricing that scales with duration and resolution. Longer videos and higher resolutions cost more.

The 402 challenge always shows the exact price for the specific parameters you requested. If no optional parameters are specified (duration, resolution), the minimum price for that model is shown.

Model Capabilities

Each video model supports specific durations, sizes, and fps values. Sending unsupported parameters returns 400 Bad Request with the list of supported values for that model. Use GET /v1/models to discover per-model capabilities.

Video models in the /v1/models response include these additional fields:

Field	Type	Description
`model_type`	string	`"video"` — identifies this as a video generation model
`capabilities.durations`	array \| null	Supported duration values in seconds (e.g. `[5, 10]`), or `null` if unconstrained
`capabilities.sizes`	array \| null	Supported WxH dimension strings (e.g. `["1920x1080", "1280x720"]`), or `null` if unconstrained

bash

# Discover video model capabilities
curl -s https://llm402.ai/v1/models | jq '.data[] | select(.model_type=="video") | {id, capabilities}'
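
Checking parameters locally against these capability fields avoids a round trip that would end in 400. A minimal sketch, assuming model is one entry from the /v1/models data array; check_video_params is a hypothetical helper, and null capabilities are treated as unconstrained.

```python
def check_video_params(model, seconds=None, width=None, height=None):
    """Return a list of problems; an empty list means the request looks valid."""
    caps = model.get("capabilities") or {}
    problems = []
    durations = caps.get("durations")
    if seconds is not None and durations is not None and seconds not in durations:
        problems.append(f"seconds must be one of {durations}")
    if (width is None) != (height is None):
        problems.append("width and height must be provided together")
    sizes = caps.get("sizes")
    if width is not None and height is not None and sizes is not None:
        if f"{width}x{height}" not in sizes:
            problems.append(f"size must be one of {sizes}")
    return problems
```

The server remains the source of truth; this check only catches obvious mismatches before a payment is attempted.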

Response (202 Accepted)

json

{
  "id": "vj_abc123...",
  "status": "queued",
  "model": "minimax/video-01-director",
  "poll_url": "/v1/videos/vj_abc123...",
  "poll_interval_ms": 5000,
  "created_at": 1234567890
}

Poll Job Status

GET /v1/videos/{job_id}

Response (completed)

json

{
  "id": "vj_abc123...",
  "status": "completed",
  "model": "minimax/video-01-director",
  "video_url": "/v1/videos/vj_abc123.../content",
  "done_at": 1234567890,
  "created_at": 1234567000,
  "poll_interval_ms": 5000
}

Response (failed)

json

{
  "id": "vj_abc123...",
  "status": "failed",
  "model": "minimax/video-01-director",
  "error": "upstream provider timeout",
  "poll_interval_ms": 5000
}
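
The three response shapes above suggest a simple polling loop. The sketch below keeps HTTP out of the function: get_status is any callable you supply that takes the poll URL and returns the parsed JSON. The function name and the 300-second default timeout are assumptions, not documented values.

```python
import time

def wait_for_video(get_status, poll_url, timeout_s=300, sleep=time.sleep):
    """Poll until the job completes or fails; return the final status dict."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_status(poll_url)
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "video job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.get('id')} still {job['status']}")
        # Respect the server-suggested interval; fall back to 5 seconds.
        sleep(job.get("poll_interval_ms", 5000) / 1000)
```

With the requests library, get_status could be lambda path: requests.get("https://llm402.ai" + path).json().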

Key Differences from Image Generation

Asynchronous — returns immediately with a job ID, not a finished result
Polling required — use poll_url to check status; respect poll_interval_ms
URL-only — no b64_json response format; videos are always returned as URLs
Longer generation times — expect 30s–5min depending on model and duration
Non-refundable — payment is collected at job creation, not on completion
model is required (no auto-routing)
One video per request
Request deduplication is disabled

Example

bash

# Create video job
curl -X POST https://llm402.ai/v1/videos \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax/video-01-director", "prompt": "A cat walking through a garden", "seconds": 5}'

# Response (202 Accepted):
# {"id":"vj_abc...","status":"queued","model":"minimax/video-01-director","poll_url":"/v1/videos/vj_abc...","poll_interval_ms":5000}

# Poll for completion
curl https://llm402.ai/v1/videos/vj_abc123...

# Response (completed):
# {"id":"vj_abc...","status":"completed","model":"minimax/video-01-director","video_url":"/v1/videos/vj_abc.../content","done_at":1234567890,"created_at":1234567000,"poll_interval_ms":5000}

# Generate a 16:9 HD video with specific duration
curl -X POST https://llm402.ai/v1/videos \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/veo-3.0", "prompt": "cat walking on beach", "seconds": 8, "width": 1920, "height": 1080}'

Available Models

Video models from multiple providers. Use /v1/models or the Models page for the full live list with pricing and per-model capabilities. Available models include Sora 2, Veo 3.0, Veo 3.1 (via OpenRouter, per-second pricing), Kling 2.1, Seedance, PixVerse, MiniMax, Vidu, and Wan.

Fetching Generated Video

GET /v1/videos/{job_id}/content

Completed video jobs expose their binary through an opaque proxy endpoint. When the poll response of GET /v1/videos/{job_id} returns status: "completed", the accompanying video_url field is a relative path of the form /v1/videos/vj_…/content pointing at this endpoint. The provider’s actual CDN URL is never exposed to the client.

No additional authentication is required — the job_id itself is a 128-bit capability token (vj_ + 32 hex characters).
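
Validating the ID shape client-side avoids sending requests that are bound to be rejected. A small sketch; the helper names are hypothetical, and accepting uppercase hex digits is an assumption since the docs do not state the case.

```python
import re

JOB_ID_RE = re.compile(r"vj_[0-9a-fA-F]{32}")

def is_valid_job_id(job_id):
    """True when job_id is vj_ followed by exactly 32 hex characters."""
    return JOB_ID_RE.fullmatch(job_id) is not None

def content_url(job_id, base_url="https://llm402.ai"):
    """Build the content URL for a job, rejecting malformed IDs early."""
    if not is_valid_job_id(job_id):
        raise ValueError("job ID must be vj_ plus 32 hex characters")
    return f"{base_url}/v1/videos/{job_id}/content"

print(content_url("vj_aaaaaaaabbbbbbbbccccccccdddddddd"))
```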

End-to-end flow

1. Create the job with a balance token, Lightning, or USDC. Receive a 202 with poll_url.

2. Poll GET /v1/videos/{job_id} at poll_interval_ms until status === "completed".

3. Issue a GET against the video_url path to stream the video bytes.

bash

curl -s -L -o out.mp4 "https://llm402.ai/v1/videos/vj_aaaaaaaabbbbbbbbccccccccdddddddd/content"

The placeholder job ID above will return 404 Video not available — that is expected, and documents the shape of the "unknown or expired job" error.

Response

On success the server streams the binary with the following headers:

Header	Value
`Content-Type`	One of `video/mp4`, `video/webm`, `video/quicktime`, `video/x-msvideo`. Anything outside this allowlist is coerced to `video/mp4`.
`Content-Length`	Forwarded from the upstream provider when present. Responses larger than the server’s body cap are rejected with `502`.
`Cache-Control`	`private, max-age=3600`
`Content-Security-Policy`	`default-src 'none'; sandbox` (prevents script execution in proxied content)

Status codes

Status	Meaning
`200`	Video bytes streaming.
`404`	Unknown job, not yet completed, or the server no longer has a `videoUrl` for it (expired).
`403`	Upstream video URL host is not on the proxy allowlist, or the supplied job ID is malformed (must match `vj_` + 32 hex chars).
`502`	Upstream download failed or body exceeds size cap.
`503`	Concurrent video-proxy capacity reached; retry with exponential backoff. A `Retry-After: 10` header is returned.
`504`	Upstream download timed out (60s).

Rate limit: this endpoint shares the video-polling class at 60 requests / minute per IP.

Provider URLs are never exposed: all completed video content is served via /v1/videos/{job_id}/content. Clients never see the upstream CDN or provider URL, and cannot reach the provider directly.

Handle 503 with backoff: in-flight content requests may be rejected with 503 Video proxy busy when the server reaches its concurrent-proxy cap. Clients MUST implement exponential backoff and retry; do not tight-loop on 503, or you will trip the 60/minute rate limit and receive 429.
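
One way to implement the backoff is sketched below. The function takes a fetch callable returning (status, headers, body) so it works with any HTTP client; the name fetch_with_backoff, the 2-second base delay, and the 60-second cap are assumptions rather than documented values.

```python
import time

def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=2.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url) -> (status, headers, body), retrying 503 with backoff."""
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 503:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the last attempt
        retry_after = float(headers.get("Retry-After", 0))
        # Wait at least as long as the server asked, doubling our own delay each time.
        delay = min(max_delay, max(retry_after, base_delay * 2 ** attempt))
        sleep(delay)
    raise RuntimeError(f"still 503 after {max_attempts} attempts")
```

With the defaults, five attempts stay well under the 60 requests per minute limit.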

Code Examples

Complete examples for each payment method and language.

Uses viem for EIP-712 signing. Install: npm install viem

javascript

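import { createWalletClient, http } from 'viem';
import { privateKeyToAccount } from 'viem/accounts';
import { base } from 'viem/chains';
import { randomBytes } from 'node:crypto';

const API_URL = 'https://llm402.ai/v1/chat/completions';
const messages = [{ role: 'user', content: 'Say hello.' }];
const body = JSON.stringify({ model: 'claude-sonnet-4.6', messages, max_tokens: 50 });

// Wallet setup. EVM_PRIVATE_KEY is an assumed env var name; load your key however you prefer.
const account = privateKeyToAccount(process.env.EVM_PRIVATE_KEY);
const walletClient = createWalletClient({ account, chain: base, transport: http() });
const address = account.address;
const now = Math.floor(Date.now() / 1000);            // Unix seconds
const nonce = `0x${randomBytes(32).toString('hex')}`; // random bytes32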

// 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(Buffer.from(res402.headers.get('Payment-Required'), 'base64').toString());
const req = envelope.accepts[0];  // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';

// 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
const signature = await walletClient.signTypedData({
  domain: { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  types: { TransferWithAuthorization: [
    { name: 'from', type: 'address' }, { name: 'to', type: 'address' },
    { name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
    { name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
  ]},
  primaryType: 'TransferWithAuthorization',
  message: { from: address, to: req.payTo, value: BigInt(req.amount),
             validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce },
});

// 3. Send with Payment-Signature header. paymentB64 is the base64-encoded JSON payment
//    payload built from the signature (see the x402 docs above for the full flow).
const res = await fetch(API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': paymentB64 },
  body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});

Uses eth-account for EIP-712 signing. Install: pip install requests eth-account

python

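import base64, json, os, secrets, time

import requests
from eth_account import Account
from eth_account.messages import encode_typed_data

API_URL = 'https://llm402.ai/v1/chat/completions'
body = {'model': 'claude-sonnet-4.6', 'messages': [{'role': 'user', 'content': 'Say hello.'}], 'max_tokens': 50}

# Wallet setup. EVM_PRIVATE_KEY is an assumed env var name.
account = Account.from_key(os.environ['EVM_PRIVATE_KEY'])
address = account.address
now = int(time.time())                # Unix seconds
nonce = '0x' + secrets.token_hex(32)  # random bytes32
types = {'TransferWithAuthorization': [
    {'name': 'from', 'type': 'address'}, {'name': 'to', 'type': 'address'},
    {'name': 'value', 'type': 'uint256'}, {'name': 'validAfter', 'type': 'uint256'},
    {'name': 'validBefore', 'type': 'uint256'}, {'name': 'nonce', 'type': 'bytes32'},
]}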

# 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
res402 = requests.post(API_URL, json=body)
envelope = json.loads(base64.b64decode(res402.headers["Payment-Required"]).decode())
req = envelope["accepts"][0]  # always use accepts[0]
routed_model = res402.headers.get("X-Route-Model", "claude-sonnet-4.6")

# 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
domain = {"name": req["extra"]["name"], "version": req["extra"]["version"],
          "chainId": 8453, "verifyingContract": req["asset"]}
message = {"from": address, "to": req["payTo"], "value": int(req["amount"]),
           "validAfter": now - 600, "validBefore": now + 120,
           "nonce": bytes.fromhex(nonce[2:])}
# eth-account takes domain, types, and message; the primary type is inferred from types
signable = encode_typed_data(domain_data=domain, message_types=types, message_data=message)
signed = account.sign_message(signable)

# 3. Send with Payment-Signature header (base64-encoded JSON payload)
res = requests.post(API_URL, json={**body, "model": routed_model},
                    headers={"Payment-Signature": payment_b64})

Requires an EIP-712 signing tool (e.g., Foundry's cast) for Step 2.

bash

# 1. Get 402 challenge
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 402 with Payment-Required header (base64 JSON) and WWW-Authenticate (L402)

# 2. Decode the Payment-Required header (x402 v2 envelope)
echo "$PAYMENT_REQ_HEADER" | base64 -d | jq .
# Returns: { x402Version: 2, accepts: [{ scheme, network, amount, asset, payTo, extra }], resource, price }
# Use accepts[0] for payment details: jq '.accepts[0]'

# 3. Sign EIP-3009 with cast, build payload, base64 encode (see x402 docs above for full flow)

# 4. Send with payment
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: $PAYMENT_B64" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

Uses ethers.js v6 with MetaMask or Coinbase Wallet. No gas for the payer -- just a signing prompt.

javascript

// 1. Connect wallet + switch to Base
const provider = new ethers.BrowserProvider(window.ethereum);
const signer = await provider.getSigner();
await window.ethereum.request({ method: 'wallet_switchEthereumChain', params: [{ chainId: '0x2105' }] });

// 2. Get 402, parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(atob(res402.headers.get('Payment-Required')));
const req = envelope.accepts[0];  // always use accepts[0]

// 3. Sign EIP-3009 (wallet popup -- no gas, no approval tx)
const signature = await signer.signTypedData(
  { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  { TransferWithAuthorization: [/* from, to, value, validAfter, validBefore, nonce */] },
  { from: address, to: req.payTo, value: BigInt(req.amount), validAfter: BigInt(now-600), validBefore: BigInt(now+120), nonce }
);

// 4. Send with Payment-Signature header
const res = await fetch(API_URL, {
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': btoa(JSON.stringify(payload)) },
  method: 'POST', body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});

Pay with Bitcoin Lightning. Three steps: get an invoice, pay it, resend with proof.

bash

# Step 1: Get 402 challenge with Lightning invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response body includes:
#   "invoice": "lnbc210n1pn..."   (pay this with your Lightning wallet)
#   "macaroon": "AgEJbGxt..."     (send this back with the preimage)
#   "price": 21                   (cost in sats)

# Step 2: Pay the Lightning invoice with your wallet.
# Your wallet will give you the preimage (64-char hex).

# Step 3: Resend the request with L402 authorization
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxt...:a1b2c3d4e5f67890..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response: HTTP 200 with the chat completion
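
The Authorization value in Step 3 is the macaroon and the preimage joined by a colon. A small Python sketch of building it, with a sanity check on the preimage length; l402_authorization is a hypothetical helper name.

```python
import re

def l402_authorization(macaroon, preimage_hex):
    """Build the Authorization header value: L402 <macaroon>:<preimage>."""
    if not re.fullmatch(r"[0-9a-fA-F]{64}", preimage_hex):
        raise ValueError("preimage must be 64 hex characters")
    return f"L402 {macaroon}:{preimage_hex}"

# Placeholder values for illustration only.
headers = {"Authorization": l402_authorization("AgEJbGxt", "ab" * 32)}
print(headers["Authorization"][:14])
```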

Prepay for a balance, then use it for multiple requests.

bash

# Step 1: Create a prepaid balance (get Lightning invoice)
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'
# Response: { "payment_hash": "abc...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }

# Step 2: Pay the invoice, then poll for the token
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "abc..."}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "..." }

# Step 3: Use the token for requests (no per-request payment needed)
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'

# Step 4: Check remaining balance
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'
# Response: { "sats": 850, "expires_at": "...", "total_spent": 150, "requests": 3 }
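
The create, poll, and status calls above all POST to the same endpoint and differ only in payload and headers. The sketch below builds the arguments for `fetch()` for each step; `balanceRequest` and its step names are hypothetical, while the field names are the ones shown in the curl examples.

```javascript
const BALANCE_URL = 'https://llm402.ai/v1/balance';

// Hypothetical helper: returns [url, init], ready to spread into fetch().
function balanceRequest(step, arg) {
  const headers = { 'Content-Type': 'application/json' };
  let body;
  if (step === 'create') {
    body = { sats: arg };                    // Step 1: arg is the amount in sats
  } else if (step === 'poll') {
    body = { payment_hash: arg };            // Step 2: arg is the payment hash
  } else if (step === 'status') {
    headers.Authorization = `Bearer ${arg}`; // Step 4: arg is the bal_... token
    body = { action: 'status' };
  } else {
    throw new Error(`unknown step: ${step}`);
  }
  return [BALANCE_URL, { method: 'POST', headers, body: JSON.stringify(body) }];
}
```

Usage: `const res = await fetch(...balanceRequest('create', 1000));`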

Pricing

Prices are computed dynamically per-request based on the model, your estimated input tokens, and your requested max_tokens. Cheaper models cost as little as 21 sats. The exact price is returned in the 402 response.

A uniform 10% markup is applied over upstream provider cost across every modality: chat, embeddings, images, video, and web search. The markup is the same regardless of which payment rail you use (L402, x402, Cashu, or prepaid balance). The 21-sat floor applies to all sats-denominated requests.

How `max_tokens` affects your bill

We charge upfront, once per request, based on input size plus your max_tokens cap — not on actual tokens returned. A 50-word answer at max_tokens: 16384 costs the same as a 5,000-word answer at the same cap. Sizing the cap tightly is the single biggest lever on your invoice.

Default: If you omit max_tokens, the server uses 2048 — enough for ~1,500 words or ~180 lines of code. Fits the vast majority of chat responses.
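
To see how the cap drives the price, here is a rough estimator. It is a sketch under assumptions: the per-token rates are made-up inputs, the formula (input cost plus cap cost, times the markup, rounded up to a whole sat) is inferred from the description above, and `estimateSats` is a hypothetical name. The price in the 402 response is the authoritative figure.

```javascript
const MARKUP = 1.10;             // uniform 10% markup (from the text)
const FLOOR_SATS = 21;           // minimum for sats-denominated requests (from the text)
const DEFAULT_MAX_TOKENS = 2048; // server default when max_tokens is omitted

// Hypothetical estimator: rounding up to a whole sat is an assumption.
function estimateSats({ inputTokens, maxTokens = DEFAULT_MAX_TOKENS,
                        inputSatsPerToken, outputSatsPerToken }) {
  const upstream = inputTokens * inputSatsPerToken + maxTokens * outputSatsPerToken;
  return Math.max(FLOOR_SATS, Math.ceil(upstream * MARKUP));
}
```

With illustrative rates of 0.01 sats per input token and 0.05 sats per output token, a 100-token prompt comes to 114 sats at the default cap and 903 sats at `maxTokens: 16384`, whatever length of answer actually comes back.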

Guidance for larger outputs:

Use case	Recommended `max_tokens`	Rough output size
Short chat / factual answer	256 – 512	1–2 paragraphs
Standard reply, default	2048 (omitted)	~1,500 words / ~180 LOC
Long-form explanation, multi-step reasoning	4096 – 8192	~3,000–6,000 words
Essays, full blog posts, long code files	16384	~12,000 words
Book chapters, very long generation	32768+	~24,000+ words

Per-model upper bounds apply — check max_tokens in /v1/models (which returns each model's context_length along with per-token pricing in USD and sats). Requests that exceed a model's context window are rejected before payment with a 400, so you never pay for an impossible request.
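
A client can avoid the 400 by trimming the cap before sending. The sketch below assumes you already have a token estimate for the input and the model's `context_length` from /v1/models; `clampMaxTokens` is a hypothetical helper, and a client-side token count is only an approximation of the server's.

```javascript
// Hypothetical pre-flight check: keeps input + max_tokens within the context window.
function clampMaxTokens(inputTokens, requestedMaxTokens, contextLength) {
  const room = contextLength - inputTokens;
  if (room <= 0) {
    throw new Error('input alone fills the context window');
  }
  return Math.min(requestedMaxTokens, room);
}
```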

Denomination by Protocol

Protocol	Unit	Minimum	Notes
x402 (USDC)	Atomic USDC (6 decimals)	~$0.001	`amount: "3150"` = $0.003150. Native USD -- no BTC conversion.
L402 (Lightning)	Satoshis	21 sats	BTC/USD converted at request time. 21-sat floor for all models.
Cashu (ecash)	Satoshis	21 sats	Same denomination as L402. Send sat-denominated Cashu tokens.
Balance (prepaid)	Satoshis	21 sats	Funded via Lightning or USDC. Deducted per-request in sats.

Price verification: The server recalculates the price on the paid retry and verifies the signed/paid amount covers the minimum. A rounding tolerance of 5 atomic USDC is allowed for x402.
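
The atomic-unit arithmetic is easy to get wrong with floating point, so the sketch below uses BigInt throughout. Both function names are hypothetical, and the direction of the tolerance (the signed amount may fall short of the price by up to 5 atomic units) is an assumption about how the server applies it.

```javascript
const USDC_DECIMALS = 6;
const TOLERANCE_ATOMIC = 5n; // rounding tolerance stated in the text

// Formats an atomic USDC amount (string, number, or bigint) as a dollar string.
function atomicToUsd(amount) {
  const value = BigInt(amount);
  if (value < 0n) throw new Error('amount must be non-negative');
  const digits = value.toString().padStart(USDC_DECIMALS + 1, '0');
  return `${digits.slice(0, -USDC_DECIMALS)}.${digits.slice(-USDC_DECIMALS)}`;
}

// Assumption: a signed amount within 5 atomic units below the price still passes.
function coversPrice(signedAtomic, requiredAtomic) {
  return BigInt(signedAtomic) + TOLERANCE_ATOMIC >= BigInt(requiredAtomic);
}
```

`atomicToUsd("3150")` gives `"0.003150"`, matching the example in the table.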

Errors

All errors on /v1/* endpoints follow the OpenAI error format:

json

{
  "error": {
    "message": "description",
    "type": "error_type",
    "code": "error_code"
  }
}
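
A small wrapper makes the envelope easier to handle in client code. This is a sketch, not part of the API: `toApiError` is a hypothetical helper, and it tolerates a missing or partial `error` object since the L402 and general error tables list reasons rather than codes.

```javascript
// Hypothetical helper: turns a parsed response body into an Error that keeps
// the type and code fields from the envelope shown above.
function toApiError(status, body) {
  const e = (body && body.error) || {};
  const err = new Error(e.message || `HTTP ${status}`);
  err.status = status;
  err.type = e.type || null;
  err.code = e.code || null;
  return err;
}
```

Usage: `if (!res.ok) throw toApiError(res.status, await res.json().catch(() => null));`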

x402-Specific Errors

Code	HTTP	Type	Description
`x402_bad_payload`	400	`invalid_request_error`	Payment-Signature header is not valid base64 or not valid JSON
`x402_underpayment`	402	`payment_error`	Signed amount is less than the model's current price
`x402_settlement_failed`	402	`payment_error`	Payment rejected (bad sig, insufficient balance, expired auth)
`ambiguous_payment`	400	`invalid_request_error`	Request has multiple payment headers (Payment-Signature, Authorization, X-Cashu). Use exactly one.

Cashu-Specific Errors

Code	HTTP	Type	Description
`cashu_no_stream`	400	`invalid_request_error`	Cashu tokens cannot be used with streaming (change requires buffered response)
`cashu_too_many_proofs`	400	`invalid_request_error`	Token contains more than 20 proofs (DoS prevention limit)
`cashu_wrong_unit`	400	`invalid_request_error`	Only sat-denominated Cashu tokens are accepted
`cashu_mint_not_allowed`	400	`invalid_request_error`	Token's mint is not in the server's allowlist
`cashu_underpayment`	402	`payment_error`	Token value is less than the model's price
`cashu_underpayment_after_fees`	402	`payment_error`	Token value is less than model's price after mint swap fees
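
Several of these rejections can be caught before sending. The sketch below works on an already-decoded token; the shape `{ unit, proofs: [{ amount }] }`, the function name, and the order of checks are assumptions made for illustration. It does not cover the mint allowlist or swap fees, which only the server knows.

```javascript
const MAX_PROOFS = 20; // limit from the table above

// Hypothetical pre-flight check. Returns the error code the server would be
// expected to send for this request, or null if none of these checks fail.
function cashuPreflight(token, { stream, priceSats }) {
  if (stream) return 'cashu_no_stream';
  if (token.proofs.length > MAX_PROOFS) return 'cashu_too_many_proofs';
  if (token.unit !== 'sat') return 'cashu_wrong_unit';
  const total = token.proofs.reduce((sum, p) => sum + p.amount, 0);
  if (total < priceSats) return 'cashu_underpayment';
  return null;
}
```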

L402-Specific Errors

Reason	HTTP	Description
Invalid macaroon signature	401	Macaroon was tampered with or signed with wrong key
Macaroon expired	401	`ExpiresAt` caveat exceeded (macaroons valid for 5 min)
Path mismatch	401	Macaroon's `RequestPath` does not match the endpoint called
max_tokens exceeds paid amount	401	Request `max_tokens` exceeds the `MaxTokens` caveat. Get a new invoice.
Input exceeds paid amount	401	Input size grew since invoice was issued (`MaxInputChars` / `MaxInputTokens`)
Invoice expired (server restarted)	401	`NotBefore` caveat fails after container restart. Request a new invoice.
Model mismatch	401	Request model does not match the macaroon's `Model` caveat
Preimage does not match	401	Preimage does not hash to the macaroon's payment hash

General Errors

Code / Reason	HTTP	Description
Rate limit	429	Per-IP rate limit exceeded. Check `Retry-After` header for seconds to wait.
Concurrent stream limit	429	Too many concurrent streams from your IP.
Context window exceeded	400	Input + max_tokens exceeds the model's context window
Invalid model	400	Model name not found in the model catalog
Service unavailable	503	Backend provider temporarily unreachable. Try a different model or retry later.

x402 + concurrent streams: The server checks stream capacity before settling USDC on-chain. If you hit the concurrent stream limit (429), your payment has NOT been settled and you can safely retry.
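
The tables above suggest a simple client-side recovery policy keyed on status and code. The sketch below is one possible reading, not part of the API: `nextAction` and its return labels are hypothetical, and treating every 401 as "start over with a fresh challenge" is a simplification (a path or model mismatch also means the request itself needs fixing).

```javascript
// Hypothetical client policy derived from the error tables above.
function nextAction(status, code) {
  if (status >= 200 && status < 300) return 'done';
  if (status === 429) return 'wait_then_retry';       // honor Retry-After
  if (status === 503) return 'retry_or_switch_model';
  if (status === 401) return 'new_l402_challenge';    // simplest recovery: fresh invoice
  if (status === 402) {
    const underpaid = ['x402_underpayment', 'cashu_underpayment',
                       'cashu_underpayment_after_fees'];
    if (underpaid.includes(code)) return 'requote';   // fetch a fresh 402 and pay that price
    if (code === 'x402_settlement_failed') return 'fix_payment';
    return 'pay';                                     // plain 402 challenge
  }
  if (status === 400) return 'fix_request';           // retrying unchanged will not help
  return 'fail';
}
```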

Rate Limits

Rate limits apply per IP address (taken from the cf-connecting-ip header). Limits differ by endpoint class.

Endpoint class	Example paths	Limit
Free endpoints	`/health`, `/v1/models`, `/v1/estimate-cost`, `/api/tags`, `/.well-known/*`	60 requests / minute
Invoice requests (402 challenge)	`POST /v1/balance` (create / top-up)	30 requests / minute
Polling endpoints	`POST /api/invoice/status`	60 requests / minute (free tier — supports ~5s polling cadence)
Authenticated inference	`POST /v1/chat/completions`, `POST /v1/embeddings`, `POST /api/chat/`, `POST /api/generate/`	60 requests / minute
Media generation	`POST /v1/images/generations`, `POST /v1/videos`	10 requests / minute
Video job polling	`GET /v1/videos/{id}`, `GET /v1/videos/{id}/content`	60 requests / minute

Concurrent Stream Limits

Scope	Limit
Per IP	5 concurrent streams
Global	250 concurrent streams

When rate limited, the response includes a Retry-After header indicating how many seconds to wait before retrying.
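
A sketch of the two calculations a client needs here: how long to back off after a 429, and how far apart to space requests to stay under a per-minute limit. Both function names and the `fallbackMs` default are hypothetical; the header value is treated as whole seconds, as described above.

```javascript
// Converts a Retry-After header value (seconds) to milliseconds.
// fallbackMs is used when the header is missing or malformed.
function retryDelayMs(retryAfter, fallbackMs = 1000) {
  const seconds = Number.parseInt(retryAfter, 10);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}

// Minimum spacing between requests that stays under a per-minute limit.
function minIntervalMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}
```

Usage: `await new Promise((r) => setTimeout(r, retryDelayMs(res.headers.get('Retry-After'))));`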

Models & Auto-Routing

llm402.ai serves 400+ models across multiple providers. The full model list with pricing is available at the /v1/models endpoint:

bash

curl -s https://llm402.ai/v1/models | jq '.data[].id'

Model Naming

You can use either short names or full provider-prefixed IDs:

Short Name	Full ID
`deepseek-v3.2`	`deepseek/deepseek-v3.2`
`claude-sonnet-4.6`	`anthropic/claude-sonnet-4.6`
`gpt-5.4`	`openai/gpt-5.4`
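
The server resolves short names itself, but a client may want to validate a name against the catalog before requesting an invoice. The sketch below assumes the ids returned by /v1/models use the provider-prefixed form shown in the table; `resolveModelId` is a hypothetical helper.

```javascript
// Hypothetical client-side lookup against a list of ids from /v1/models.
function resolveModelId(name, ids) {
  if (ids.includes(name)) return name; // already a full ID
  const matches = ids.filter((id) => id.endsWith(`/${name}`));
  if (matches.length === 1) return matches[0];
  throw new Error(matches.length === 0 ? `unknown model: ${name}` : `ambiguous model: ${name}`);
}
```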

Auto-Routing

Send model: "auto" and an embedding-based classifier picks the best model across 8 task categories: code, reasoning, creative, summarization, multilingual, general_knowledge, agents, chat. Vision is supported via explicit "task": "vision" in the request body (the auto-classifier itself doesn't infer vision — provide images and set the task hint).

code -- programming, debugging, code generation
reasoning -- logic, math, step-by-step analysis
general_knowledge -- factual questions, definitions, Q&A
creative -- writing, storytelling, brainstorming
summarization -- condensing content, TL;DR
chat -- casual conversation, general chat
multilingual -- translation, cross-language tasks
agents -- function calling, tool integration, structured output
vision -- image understanding (multimodal models)

To skip the classifier and route within a specific category, use the task body parameter with model: "auto":

bash

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","task":"code","messages":[{"role":"user","content":"Sort a list in Python"}],"max_tokens":200}'

Important (x402 / Cashu): Echo the X-Route-Model from the 402 response into the body on the paid retry. Don't re-send "auto" — the router could pick a different model at a different price. L402 retries can keep "auto" because the macaroon's Model caveat is the binding authority.
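
The rule above can be captured in one small function that prepares the body for the paid retry. This is a minimal sketch: `paidRetryBody` and the rail labels `'l402'`, `'x402'`, and `'cashu'` are names chosen for illustration, and `routeModel` is whatever the client read from the X-Route-Model header of the 402 response.

```javascript
// Hypothetical helper: builds the request body for the paid retry.
function paidRetryBody(body, routeModel, rail) {
  // Explicit models need no change; L402 can keep "auto" because the
  // macaroon's Model caveat binds the model.
  if (body.model !== 'auto' || rail === 'l402') return body;
  if (!routeModel) throw new Error('402 response had no X-Route-Model header');
  // x402 / Cashu: pin the routed model so the price cannot change on retry.
  return { ...body, model: routeModel };
}
```

Usage: `const retryBody = paidRetryBody(body, res402.headers.get('X-Route-Model'), 'x402');`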