llm402.ai API

Pay-per-request LLM inference. No accounts. No API keys. Just pay and prompt.

llm402.ai provides OpenAI-compatible endpoints gated by HTTP 402 micropayments. Send a request, get an invoice, pay it, re-send with proof of payment. Your prompt is processed by one of 400+ models across multiple providers.
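The request, pay, and re-send cycle can be sketched in a few lines of JavaScript. This is a minimal sketch, not an official client. payAndPrompt and its pay callback are hypothetical names. fetchImpl is any fetch-compatible function (globalThis.fetch in Node 18+). pay stands in for your wallet logic and returns the proof-of-payment headers (Authorization, Payment-Signature, or X-Cashu).

```javascript
// Minimal sketch of the 402 request -> pay -> retry loop.
// pay(challenge) is a hypothetical callback that settles the payment and
// returns the headers carrying proof of payment.
async function payAndPrompt({ fetchImpl, url, body, pay }) {
  const init = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };

  // First attempt: no payment headers.
  const first = await fetchImpl(url, init);
  if (first.status !== 402) return first;

  // Gather every option the 402 offers so pay() can choose a rail.
  const challenge = {
    wwwAuthenticate: first.headers.get("WWW-Authenticate"),
    paymentRequired: first.headers.get("Payment-Required"),
    routeModel: first.headers.get("X-Route-Model"),
    body: await first.json(),
  };
  const proofHeaders = await pay(challenge);

  // Re-send the same request with the proof-of-payment headers merged in.
  return fetchImpl(url, { ...init, headers: { ...init.headers, ...proofHeaders } });
}
```

This sketch re-sends the original body unchanged. On the x402 and Cashu rails, a request that used model "auto" must also name the routed model on the retry (see the auto-routing note in the x402 section).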

Four payment rails are supported. Every 402 response includes all per-request options (L402, x402, and Cashu), so pick whichever works for your client. Prepaid balance tokens skip the 402 step entirely:

| Protocol | Currency | Network | Header |
|---|---|---|---|
| L402 | Bitcoin (sats) | Lightning Network | WWW-Authenticate |
| x402 | USDC (stablecoin) | Base L2 (EIP-3009) | Payment-Required |
| Cashu | Bitcoin (sats) | Ecash tokens | X-Cashu |
| Balance | Bitcoin (sats) or USDC (stablecoin) | Prepaid account, funded via Lightning or x402 | Authorization: Bearer |

Model Naming

Short names work for all models -- no provider prefix needed:

  • deepseek-v3.2, claude-sonnet-4.6, gpt-5.4
  • Full IDs also work: deepseek/deepseek-v3.2, anthropic/claude-sonnet-4.6
  • For auto-routing: use "model": "auto" and read the X-Route-Model header from the 402 response

Model in URL Path

All inference endpoints support specifying the model in the URL path instead of the request body:

  • /v1/chat/completions/deepseek-v3.2
  • /v1/images/generations/FLUX.1-schnell
  • /v1/videos/generations/kling-2.1-master

If both URL path and body contain a model, the body model takes priority. The /v1/models endpoint returns all available model IDs.

Quick Start

x402 (USDC on Base)

Pay with USDC stablecoins. No BTC needed. No gas for the payer.

1. POST /v1/chat/completions returns 402. Server responds with a Payment-Required header (base64 JSON: price, payTo, EIP-712 domain).
2. Decode the header and sign an EIP-3009 TransferWithAuthorization with your wallet (off-chain signature -- no gas, no approval tx).
3. POST /v1/chat/completions with the Payment-Signature header returns 200. Server settles USDC on-chain, then returns inference.
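Step 1 can be sketched as a small decoder for the Payment-Required header. It assumes the x402 v2 envelope shape documented in the x402 Protocol section below and derives the EIP-712 domain from accepts[0]. decodePaymentRequired is a hypothetical helper name.

```javascript
// Sketch: decode the base64 Payment-Required header and pull out the values
// needed to sign an EIP-3009 TransferWithAuthorization.
function decodePaymentRequired(headerValue) {
  const envelope = JSON.parse(Buffer.from(headerValue, "base64").toString("utf8"));
  if (envelope.x402Version !== 2 || !Array.isArray(envelope.accepts) || envelope.accepts.length === 0) {
    throw new Error("unexpected Payment-Required envelope");
  }
  const opt = envelope.accepts[0]; // always use accepts[0]
  return {
    envelope,
    payTo: opt.payTo,
    amountAtomic: BigInt(opt.amount), // atomic USDC, 6 decimals
    domain: {
      name: opt.extra.name,       // "USD Coin"
      version: opt.extra.version, // "2"
      chainId: Number(opt.network.split(":")[1]), // "eip155:8453" -> 8453
      verifyingContract: opt.asset,
    },
  };
}
```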

L402 (Bitcoin Lightning)

Pay with Bitcoin over the Lightning Network. Instant settlement, 21-sat minimum.

1. POST /v1/chat/completions returns 402. Server responds with a WWW-Authenticate header (macaroon + Lightning invoice).
2. Pay the Lightning invoice, receive the preimage.
3. POST /v1/chat/completions with the Authorization: L402 header returns 200. Format: Authorization: L402 {macaroon}:{preimage}.
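The L402 steps reduce to two string operations: read the macaroon and invoice from WWW-Authenticate, then build the Authorization header once you hold the preimage. A sketch with hypothetical helper names. It assumes the key="value" layout shown in this document and a 32-byte hex preimage.

```javascript
// Sketch: parse an L402 challenge and build the matching Authorization header.
function parseL402Challenge(headerValue) {
  if (!/^L402\s/i.test(headerValue)) throw new Error("not an L402 challenge");
  const params = {};
  // key="value" pairs; macaroons (base64) and invoices (bech32) contain no quotes.
  for (const [, key, value] of headerValue.matchAll(/(\w+)="([^"]*)"/g)) {
    params[key] = value;
  }
  if (!params.macaroon || !params.invoice) throw new Error("missing macaroon or invoice");
  return { macaroon: params.macaroon, invoice: params.invoice };
}

function l402Authorization(macaroon, preimageHex) {
  if (!/^[0-9a-f]{64}$/i.test(preimageHex)) throw new Error("preimage must be 32 bytes of hex");
  return `L402 ${macaroon}:${preimageHex}`;
}
```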

Endpoints

All inference endpoints are OpenAI-compatible. Base URL: https://llm402.ai

OpenAI-Compatible

| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /v1/chat/completions, /v1/chat/completions/{model} | Chat completions (streaming and buffered). Model can be in the URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/embeddings | Text embeddings (max 128 strings per batch, no streaming) | L402 / x402 / Balance / Cashu |
| POST | /v1/images/generations, /v1/images/generations/{model} | Image generation (synchronous, one image per request). Model can be in the URL path or request body. | L402 / x402 / Balance / Cashu |
| POST | /v1/videos, /v1/videos/generations/{model} | Create a video generation job (async, returns a job ID). Model can be in the URL path or request body. | L402 / x402 / Balance / Cashu |
| GET | /v1/videos/{job_id} | Poll video job status (no auth required) | None |
| GET | /v1/videos/{job_id}/content | Stream the finished MP4 through the llm402 proxy (the provider URL is never exposed). Returned as video_url on completed poll responses. | None |
| POST | /v1/balance | Prepaid balance: create, top up, check status | None / Balance |
| GET | /v1/models | List all available models (OpenAI-compatible) | None (free) |
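Because video jobs are asynchronous, a client polls GET /v1/videos/{job_id} until the response carries video_url. A sketch under stated assumptions. The job JSON's other fields (status values, error states) are not documented here, so this keys only off video_url. The interval and timeout defaults and the name waitForVideo are hypothetical.

```javascript
// Sketch: poll a video job until video_url appears, then return an absolute URL.
async function waitForVideo({ fetchImpl, jobId, intervalMs = 5000, timeoutMs = 600000, base = "https://llm402.ai" }) {
  const deadline = Date.now() + timeoutMs;
  const url = `${base}/v1/videos/${encodeURIComponent(jobId)}`;
  while (Date.now() <= deadline) {
    const res = await fetchImpl(url); // polling needs no auth
    if (!res.ok) throw new Error(`poll failed: HTTP ${res.status}`);
    const job = await res.json();
    if (job.video_url) {
      // new URL() keeps an absolute URL as-is and resolves a relative path against base.
      return new URL(job.video_url, base).toString();
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for video");
}
```

A failed job would keep polling until the timeout in this sketch, so a real client should also check the job response for an error state.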

Ollama-Compatible

| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/generate/{model} | Text generation | L402 / x402 / Balance / Cashu |
| POST | /api/chat/{model} | Chat | L402 / x402 / Balance / Cashu |
| GET | /api/tags | Model catalog with pricing | None (free) |

Ollama Examples

bash
# Chat via Ollama-compatible endpoint (model in path)
curl -s -X POST https://llm402.ai/api/chat/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}],"stream":false}'
bash
# Text generation via Ollama-compatible endpoint
curl -s -X POST https://llm402.ai/api/generate/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain Bitcoin in one sentence","stream":false}'
bash
# List all models with pricing and endpoints
curl -s https://llm402.ai/api/tags | jq '.models[] | {name, price_sats}'

Utility

| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /health | Service health and status | None (free) |
| POST | /v1/estimate-cost | Pre-authorization cost estimation | None (free) |
| POST | /api/invoice/status | Poll Lightning invoice payment status | None (free) |
| GET | /.well-known/l402 | L402 service discovery (agent-readable) | None (free) |
| GET | /.well-known/openapi.json | OpenAPI 3.1.0 specification | None (free) |
| GET | /.well-known/x402-discovery.json | x402 v2 Bazaar discovery (resource catalog with route schemas and prices) | None (free) |

Estimate Cost

Pre-authorize requests by checking the cost before paying. This endpoint is free and requires no authentication. Useful for MCP clients, agents, and budgeting.

bash
curl -s -X POST https://llm402.ai/v1/estimate-cost \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "max_tokens": 500,
    "pref": "balanced"
  }'

Response:

json
{
  "model": "deepseek/deepseek-v3.2",
  "shortName": "deepseek-v3.2",
  "category": "general_knowledge",
  "confidence": 0.82,
  "rc": 100,
  "estimatedInputTokens": 8,
  "estimatedOutputTokens": 500,
  "costSats": 21,
  "costUsd": 0.01428,
  "btcPrice": 68000
}

Parameters

| Field | Required | Description |
|---|---|---|
| messages | Yes | Array of message objects (same format as chat completions) |
| model | No | Model name (short or full ID). If omitted or "auto", the server auto-routes. |
| max_tokens | No | Upper bound on output tokens. When omitted, the server applies a per-model default configured server-side; the global fallback is 2048 if no per-model default is set. Use POST /v1/estimate-cost to see the exact estimatedOutputTokens the server will use for your model. You are billed on this cap, not on actual consumption: set it tight for short replies and raise it (8192, 16384, 32768+) for long-form generation. See Pricing. |
| pref | No | Routing preference: quality, balanced, cost, or speed |
| max_cost | No | Maximum cost in sats (routes only to models within budget) |

Response fields

| Field | Description |
|---|---|
| model | Resolved model ID (full upstream form, e.g. deepseek/deepseek-v3.2) |
| shortName | Short model alias (e.g. deepseek-v3.2), the same form accepted in URL paths |
| category | Auto-routing category the prompt was classified into (e.g. code, reasoning, general_knowledge) |
| confidence | Classifier confidence (0–1) for the chosen category |
| rc | Routing complexity tier (10–100); higher means a more capable model is required for the prompt |
| estimatedInputTokens | Estimated input tokens (used for billing; capped from prompt length) |
| estimatedOutputTokens | Estimated output tokens (taken from the max_tokens cap) |
| costSats | Estimated invoice price in sats (matches the 402 challenge if the request is sent now) |
| costUsd | Same estimate in USD (informational) |
| btcPrice | Current BTC price used for the conversion (refreshed each model-sync cycle) |
| webSearchEnabled | Boolean; true if the request specified web_search: true and the surcharge is included |
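Because costSats and btcPrice come back together, a client can recompute the USD figure (1 BTC = 100,000,000 sats) and enforce its own spending limit before paying. A minimal sketch, assuming a client-side budget in sats. budgetSats and approveEstimate are hypothetical names, not part of the API.

```javascript
// Sketch: sanity-check a /v1/estimate-cost response against a local budget.
const SATS_PER_BTC = 100_000_000;

// USD value of a sats amount at a given BTC/USD price.
function satsToUsd(sats, btcPriceUsd) {
  return (sats * btcPriceUsd) / SATS_PER_BTC;
}

// budgetSats is your own per-request limit, not a server parameter.
function approveEstimate(estimate, budgetSats) {
  const usd = satsToUsd(estimate.costSats, estimate.btcPrice);
  return { ok: estimate.costSats <= budgetSats, costSats: estimate.costSats, usd };
}
```

For example, 21 sats at a BTC price of 68,000 USD comes to about $0.0143 (21 × 68,000 / 100,000,000).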

Invoice Status

Poll the payment status of a Lightning invoice. Useful for wallet integrations that need to confirm payment before re-sending with the L402 header.

bash
curl -s -X POST https://llm402.ai/api/invoice/status \
  -H "Content-Type: application/json" \
  -d '{
    "payment_hash": "a1b2c3d4...64hex",
    "macaroon": "AgEJ..."
  }'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "preimage": "e5f6a7b8...64hex" }

Security: The macaroon field is required and must match the payment_hash. This prevents preimage theft by ensuring only the original invoice requester can poll for the preimage.
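A polling loop for this endpoint might look like the following. It is a sketch: waitForPreimage is a hypothetical name, the interval and timeout defaults are assumptions, and fetchImpl is any fetch-compatible function.

```javascript
// Sketch: poll /api/invoice/status until the invoice is paid, then return the preimage.
async function waitForPreimage({ fetchImpl, paymentHash, macaroon, intervalMs = 1000, timeoutMs = 120000 }) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() <= deadline) {
    const res = await fetchImpl("https://llm402.ai/api/invoice/status", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // The macaroon must match payment_hash or the server will not reveal the preimage.
      body: JSON.stringify({ payment_hash: paymentHash, macaroon }),
    });
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
    const status = await res.json();
    if (status.paid) return status.preimage;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("invoice not paid before timeout");
}
```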

x402 Protocol (USDC)

x402 uses EIP-3009 TransferWithAuthorization for gasless USDC payments on Base. The payer signs an off-chain authorization; the server settles it on-chain.

Network and Asset

| Field | Value |
|---|---|
| Network | eip155:8453 (Base mainnet, chain ID 8453) |
| Asset | USDC, 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 |
| Denomination | Atomic USDC (6 decimals: 1000000 = $1.00) |
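Since amounts travel as atomic USDC strings, it is safer to convert with integer arithmetic than with floating point. A sketch with hypothetical helper names that converts between decimal USD strings and atomic units (6 decimals):

```javascript
// Sketch: exact conversion between decimal USD strings and atomic USDC (6 decimals).
const USDC_DECIMALS = 6;

function usdToAtomic(usd) {
  const m = /^(\d+)(?:\.(\d+))?$/.exec(usd);
  if (!m) throw new Error(`not a decimal amount: ${usd}`);
  const frac = m[2] || "";
  if (frac.length > USDC_DECIMALS) throw new Error("more than 6 decimal places");
  // Pad the fraction to 6 digits and treat the whole thing as an integer.
  return BigInt(m[1] + frac.padEnd(USDC_DECIMALS, "0"));
}

function atomicToUsd(atomic) {
  const s = BigInt(atomic).toString().padStart(USDC_DECIMALS + 1, "0");
  return `${s.slice(0, -USDC_DECIMALS)}.${s.slice(-USDC_DECIMALS)}`;
}
```

For instance, usdToAtomic("0.003150") yields 3150n and atomicToUsd("3150") yields "0.003150".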

Payment Flow

1. Get the 402 challenge

Send a normal inference request with no auth headers:

bash
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

The server responds with HTTP 402. The response body contains all payment information:

json
{
  "error": "Payment Required",
  "description": "claude-sonnet-4.6 inference, pay-per-request over Lightning, USDC, or Cashu",
  "price": 42,
  "model": "claude-sonnet-4.6",
  "provider": "llm402.ai",
  "max_tokens": 50,
  "estimated_input_tokens": 12,
  "invoice": "lnbc420n...",
  "macaroon": "AgEJ...",
  "paymentHash": "a1b2c3d4e5f6...64hex",
  "x402": {
    "price_usd": "0.000305",
    "network": "eip155:8453",
    "address": "0xe05cf38...",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "scheme": "exact"
  },
  "cashu": {
    "price_sats": 42,
    "unit": "sat",
    "description": "Send sat-denominated Cashu tokens in X-Cashu header. Server-configured mint allowlist — see /llms.txt for the current list."
  }
}

The response headers include all payment options:

http
HTTP/2 402
WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
Payment-Required: eyJzY2hlbWUiOiJleGFjdCIsIm5ldH...
Cache-Control: no-store

Payment-Required Header

Base64-encoded JSON in x402 v2 envelope format. Decode it and use accepts[0] for payment details:

json
{
  "x402Version": 2,
  "error": "Payment required",
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "3150",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0x...",
      "maxTimeoutSeconds": 120,
      "extra": {
        "name": "USD Coin",
        "version": "2"
      }
    }
  ],
  "resource": {
    "url": "https://llm402.ai/v1/chat/completions",
    "description": "LLM inference",
    "mimeType": "application/json"
  },
  "price": "$0.003150"
}

| Field | Description |
|---|---|
| x402Version | Always 2 |
| accepts | Array of payment options. Always use accepts[0]. |
| accepts[0].scheme | Always "exact" |
| accepts[0].network | Always "eip155:8453" (Base mainnet) |
| accepts[0].amount | Price in atomic USDC (6 decimals). "3150" = $0.003150 |
| accepts[0].asset | USDC contract address on Base |
| accepts[0].payTo | Server's wallet address (recipient) |
| accepts[0].maxTimeoutSeconds | Maximum settlement time (120s) |
| accepts[0].extra.name | EIP-712 domain name. Always "USD Coin" (not "USDC", which is the testnet name) |
| accepts[0].extra.version | EIP-712 domain version. Always "2" |
| price | Human-readable USD price (informational only; use accepts[0].amount for signing) |
| extensions.bazaar | Optional. x402 Bazaar discovery metadata (route schema, input/output examples). Forward it unmodified in your payment payload; spec-compliant clients should pass it through. |

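Since the table above fixes most of these values, a client can refuse to sign anything that deviates from them. A sketch: checkAccept is a hypothetical name, and maxAtomic is an assumed client-side spending cap in atomic USDC.

```javascript
// Sketch: validate accepts[0] against the fixed llm402.ai values before signing.
const EXPECTED = {
  scheme: "exact",
  network: "eip155:8453",
  asset: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
  name: "USD Coin",
  version: "2",
};

function checkAccept(opt, maxAtomic) {
  const problems = [];
  if (opt.scheme !== EXPECTED.scheme) problems.push("scheme");
  if (opt.network !== EXPECTED.network) problems.push("network");
  // Addresses may differ only in checksum casing.
  if (String(opt.asset).toLowerCase() !== EXPECTED.asset.toLowerCase()) problems.push("asset");
  if (!opt.extra || opt.extra.name !== EXPECTED.name) problems.push("extra.name");
  if (!opt.extra || opt.extra.version !== EXPECTED.version) problems.push("extra.version");
  if (BigInt(opt.amount) > BigInt(maxAtomic)) problems.push("amount");
  return problems; // an empty array means the option looks safe to sign
}
```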
2. Sign the EIP-3009 authorization

Build a TransferWithAuthorization signature using EIP-712 typed data.

EIP-712 Domain

javascript
const domain = {
  name: "USD Coin",        // from extra.name
  version: "2",            // from extra.version
  chainId: 8453,           // Base mainnet
  verifyingContract: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"
};

EIP-712 Types

javascript
const types = {
  TransferWithAuthorization: [
    { name: "from",        type: "address" },
    { name: "to",          type: "address" },
    { name: "value",       type: "uint256" },
    { name: "validAfter",  type: "uint256" },
    { name: "validBefore", type: "uint256" },
    { name: "nonce",       type: "bytes32" },
  ]
};

Authorization Message

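javascript
const crypto = require("crypto");

// paymentRequiredHeader: raw Payment-Required value from the 402 response (placeholder).
const paymentRequired = JSON.parse(Buffer.from(paymentRequiredHeader, "base64").toString("utf8"));
const opt = paymentRequired.accepts[0];  // always use accepts[0]

const now = Math.floor(Date.now() / 1000);
const nonce = "0x" + crypto.randomBytes(32).toString("hex");  // fresh 32-byte nonce, never reused

const message = {
  from:        walletAddress,             // your address (payer)
  to:          opt.payTo,                 // from accepts[0]
  value:       BigInt(opt.amount),
  validAfter:  BigInt(now - 600),         // 10 min ago (clock skew buffer)
  validBefore: BigInt(now + 120),         // 2 min from now
  nonce:       nonce,
};

// Sign { domain, types, primaryType: "TransferWithAuthorization", message } with your
// wallet's EIP-712 signTypedData method; the result is the signature used in step 3.
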
3. Build the payment payload

Construct the V2 payment payload and base64-encode it:

javascript
// opt, now, and nonce come from step 2; signature is the EIP-712 signature over the step 2 message.
const payload = {
  x402Version: 2,
  resource: paymentRequired.resource,
  accepted: {
    scheme: opt.scheme,
    network: opt.network,
    amount: opt.amount,
    asset: opt.asset,
    payTo: opt.payTo,
    maxTimeoutSeconds: opt.maxTimeoutSeconds,
    extra: opt.extra
  },
  payload: {
    signature: signature,
    authorization: {
      from: walletAddress,
      to: opt.payTo,
      value: opt.amount,
      validAfter: (now - 600).toString(),
      validBefore: (now + 120).toString(),
      nonce: nonce
    }
  }
};

const paymentSignature = Buffer.from(JSON.stringify(payload)).toString("base64");
4. Send the paid request

Re-send the same inference request with the Payment-Signature header:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

Response (HTTP 200):

json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "claude-sonnet-4.6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 10,
    "total_tokens": 14
  }
}

Auto-routing gotcha (x402 / Cashu): If you used model: "auto" on the 402 challenge, the server routed to a specific model and returned it in the X-Route-Model response header. On the x402 / Cashu retry with payment, you must echo that specific model back in the body — not "auto" again — because x402 / Cashu have no server-side memory of the original routing decision. (L402 differs: the macaroon binds the routed model in its Model caveat, so the L402 retry can keep "auto" in the body.)
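The rule above can be captured in a small helper that builds the retry body. A sketch: retryBody is a hypothetical name, and the rail labels "l402", "x402", and "cashu" are illustrative strings, not API values.

```javascript
// Sketch: x402 and Cashu retries must name the routed model from X-Route-Model;
// L402 retries may keep "auto" because the macaroon's Model caveat binds the model.
function retryBody(originalBody, routeModelHeader, rail) {
  if (originalBody.model !== "auto") return originalBody;
  if (rail === "l402") return originalBody;
  if (rail === "x402" || rail === "cashu") {
    if (!routeModelHeader) throw new Error("X-Route-Model header missing from 402 response");
    return { ...originalBody, model: routeModelHeader }; // copy; original body untouched
  }
  throw new Error(`unknown rail: ${rail}`);
}
```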

CORS

The server allows cross-origin x402 requests:

http
Access-Control-Allow-Headers: Content-Type, Authorization, Payment-Signature, X-Cashu, Mcp-Session-Id
Access-Control-Expose-Headers: X-Route-Model, X-Route-Category, Payment-Required, WWW-Authenticate, X-Cashu-Change

Nonce Replay Protection

Each signed authorization can only be used once. Replay protection is enforced both server-side and on-chain via EIP-3009 nonces.

L402 Protocol (Lightning)

L402 (formerly LSAT) combines HTTP 402 status codes with Lightning Network payments and macaroon-based authentication. It is the original payment protocol supported by llm402.ai.

Payment Flow

1. Client sends POST /v1/chat/completions with no auth.
2. Server returns 402 with WWW-Authenticate: L402 macaroon="...", invoice="lnbc...".
3. Client pays the Lightning invoice and obtains the preimage.
4. Client re-sends the request with Authorization: L402 {macaroon}:{preimage}.
5. Server verifies macaroon + preimage, proxies to inference, and returns 200.

Curl Example

bash
# Send a request with no auth -- get back a 402 with an invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response includes:
#   WWW-Authenticate: L402 macaroon="AgEJ...", invoice="lnbc210n1pn..."
#   Body: { "error": "Payment Required", "price": 21, "invoice": "lnbc...", "macaroon": "AgEJ...", ... }
bash
# Pay the Lightning invoice with your wallet and get the preimage.
# Then resend the exact same request with the L402 Authorization header:

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxtNDAyLmFp...:a1b2c3d4e5f67890..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response: HTTP 200 with chat completion

WWW-Authenticate Header

The 402 response includes a WWW-Authenticate header with two components:

http
WWW-Authenticate: L402 macaroon="AgELbGxt...", invoice="lnbc50n1pn..."

| Component | Description |
|---|---|
| macaroon | Base64-encoded V2 TLV macaroon with embedded caveats, bound to a specific payment hash |
| invoice | BOLT-11 Lightning invoice. Pay this to obtain the preimage. |

Macaroon Caveats

Each macaroon is bound with first-party caveats that restrict its use. The server verifies all caveats on the paid request and rejects any that fail (fail-closed):

| Caveat | Format | Description |
|---|---|---|
| RequestPath | RequestPath = /v1/chat/completions | Restricts the macaroon to a specific API endpoint |
| ExpiresAt | ExpiresAt = 1712345678 | Unix timestamp expiry (5 minutes from issuance) |
| MaxTokens | MaxTokens = 256 | Maximum output tokens the request may use |
| MaxInputChars | MaxInputChars = 1500 | Prevents input inflation after invoice issuance |
| MaxInputTokens | MaxInputTokens = 400 | Prevents token-count gaming (the character count passes but the token count is higher) |
| NotBefore | NotBefore = 1712340000 | Prevents preimage replay after a server restart |
| MaxInputItems | MaxInputItems = 5 | Binds the macaroon to the actual batch size of the request (the example shows 5 items; /v1/embeddings accepts at most 128 per batch). No tolerance: the retry must use the same item count. |
| Model | Model = claude-sonnet-4.6 | Binds the macaroon to a specific model (prevents cross-model bypass) |
| MediaType | MediaType = image | Emitted for /v1/images/generations and /v1/videos. Restricts a media macaroon to a specific media class (image or video). |
| MaxUnits | MaxUnits = 1 | Number of output units (images or videos) the macaroon covers. Always 1 on media endpoints. |
| MaxDuration | MaxDuration = 8 | Video only. Maximum video duration in seconds (typical range 1–10). Emitted only when the request specifies seconds, binding the macaroon to that duration to prevent post-invoice upsell. Default-discovery 402 challenges (no seconds) omit this caveat; the per-model duration cap from /v1/models capabilities.durations applies instead. |
| MaxDimension | MaxDimension = 1920 | Video only. Maximum longest-side pixel count. Emitted only when the request specifies width/height, binding the macaroon to that resolution to prevent post-invoice upsell. Default-discovery 402 challenges omit this caveat. |
| WebSearch | WebSearch = true | Added to chat macaroons when the original request sent web_search: true. Binds the paid search surcharge to the flag. |

Fail-closed design: Unrecognized caveats are rejected. This ensures future caveat additions don't accidentally pass on old server versions.

Authorization Header

After paying the Lightning invoice and receiving the preimage, send the authorization:

http
Authorization: L402 AgELbGxt...:abc123def456...

Format: L402 {base64_macaroon}:{hex_preimage}

Tip: on retry you can send "model": "auto" in the body — the server extracts the routed model from the macaroon's Model caveat, so you can reuse the exact same body from 402 discovery without echoing the routed model back.

The server verifies:

  • The macaroon signature against the root key
  • All caveats pass (path, expiry, tokens, model, etc.)
  • The preimage hashes to the payment hash embedded in the macaroon identifier
  • The preimage has not been used before (atomic Redis SET NX, burned before inference begins)
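Since a preimage is burned as soon as the server accepts it, it can be worth checking locally that it really matches the invoice before sending it. In Lightning, the payment hash is the SHA-256 of the 32-byte preimage. The payment hash is also returned as paymentHash in the 402 body. preimageMatches is a hypothetical helper name.

```javascript
const crypto = require("crypto");

// Sketch: true if SHA-256(preimage) equals the invoice's payment hash.
function preimageMatches(preimageHex, paymentHashHex) {
  const hex64 = /^[0-9a-f]{64}$/i;
  if (!hex64.test(preimageHex) || !hex64.test(paymentHashHex)) return false;
  const digest = crypto.createHash("sha256").update(Buffer.from(preimageHex, "hex")).digest();
  // Both buffers are 32 bytes, as timingSafeEqual requires.
  return crypto.timingSafeEqual(digest, Buffer.from(paymentHashHex, "hex"));
}
```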

Preimages are single-use and non-refundable on L402. The server claims the preimage atomically before calling the upstream model. If inference then fails (502, timeout, etc.), the preimage is already spent and cannot be retried. This is intentional — burning the preimage after inference would open a replay window where a concurrent request could reuse it. If you need automatic refund-on-failure semantics, use balance tokens or pay with Cashu instead.

Balance Tokens (Prepaid)

Balance tokens let you prepay for multiple requests with a single Lightning payment or USDC transfer. Fund a balance once, then use Authorization: Bearer bal_... on any gated endpoint without per-request payment flows.

How It Works

1. POST /v1/balance with { "sats": 1000 } returns 402 and a Lightning invoice.
2. Pay the Lightning invoice with your wallet.
3. POST /v1/balance with { "payment_hash": "hex64" } returns 200 and { "paid": true, "token": "bal_...", "sats": 1000 }.
4. Use Authorization: Bearer bal_... on any endpoint.

Endpoints

1. Create balance (Lightning)

Request a Lightning invoice to fund a new balance:

bash
# Step 1: Request an invoice for 1000 sats
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'

# Response (402):
# { "payment_hash": "a1b2c3...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }
2. Poll for payment

After paying the invoice, poll with the payment hash to get your token. Pending invoices return 200 {"paid": false}; an unknown payment_hash returns 404 (Unknown payment_hash).

bash
# Step 2: Poll until paid
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "a1b2c3d4e5f6...64hex"}'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "2026-05-03T..." }
3. Check balance
bash
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'

# Response:
# { "sats": 850, "expires_at": "2026-05-03T...", "total_spent": 150, "requests": 7 }
4. Top up (Lightning)

Add sats to an existing balance:

bash
# Get a top-up invoice
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"sats": 500}'

# Returns 402 with a new invoice. Pay it, then poll with payment_hash as above.

Fund with USDC (x402)

Fund a balance using USDC on Base instead of Lightning. Send a single POST with a Payment-Signature header carrying an EIP-3009 TransferWithAuthorization envelope — no 402 challenge flow is required from this endpoint. The server settles the USDC on-chain via the CDP facilitator, derives the sats to credit from the signed amount at its current BTC price, and returns the balance token in one round trip.

1. Pick a USDC amount to fund

Decide how many sats you want to buy and convert to USDC atomic units (6 decimals) at a BTC price you are willing to pay. The server will re-derive sats from the signed USDC amount at its own BTC price — your body.sats hint is advisory and never authoritative.

2. Sign an EIP-3009 TransferWithAuthorization

Sign an off-chain authorization to transfer USDC on Base from your wallet to the server’s receiving address. The signing domain, asset, network, and payTo are identical to the values served in Payment-Required envelopes on other llm402 endpoints.

javascript
// Illustrative. Uses viem (npm install viem). Run as an ES module (.mjs) for top-level await.
import { randomBytes } from 'node:crypto';
import { createWalletClient, http } from 'viem';
import { base } from 'viem/chains';
import { privateKeyToAccount } from 'viem/accounts';

const account = privateKeyToAccount(process.env.PRIV_KEY); // 0x-prefixed hex private key
const client = createWalletClient({ account, chain: base, transport: http() });

// USDC on Base, 6 decimals. Example: $0.50 => 500000 atomic units.
const amountAtomic = '500000';
const payTo = '0x...';                     // llm402 receiving address (from any x402 envelope)
const usdc = '0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913';

// Random 32-byte nonce (hex). Must not be reused.
const nonce = '0x' + randomBytes(32).toString('hex');
const validAfter = 0n;
const validBefore = BigInt(Math.floor(Date.now() / 1000) + 120);

const signature = await client.signTypedData({
  domain: { name: 'USD Coin', version: '2', chainId: 8453, verifyingContract: usdc },
  types: {
    TransferWithAuthorization: [
      { name: 'from', type: 'address' },
      { name: 'to', type: 'address' },
      { name: 'value', type: 'uint256' },
      { name: 'validAfter', type: 'uint256' },
      { name: 'validBefore', type: 'uint256' },
      { name: 'nonce', type: 'bytes32' },
    ],
  },
  primaryType: 'TransferWithAuthorization',
  message: {
    from: account.address, to: payTo, value: BigInt(amountAtomic),
    validAfter, validBefore, nonce,
  },
});

// x402 v2 envelope — base64(JSON) for the Payment-Signature header
const envelope = {
  x402Version: 2,
  scheme: 'exact',
  network: 'eip155:8453',
  payload: {
    signature,
    authorization: {
      from: account.address, to: payTo, value: amountAtomic,
      validAfter: validAfter.toString(), validBefore: validBefore.toString(),
      nonce,
    },
  },
};
const paymentSignature = Buffer.from(JSON.stringify(envelope)).toString('base64');
3. POST to /v1/balance with the signed envelope

The body may be omitted entirely, or may contain {"sats": N} as a hint for your own UX. The server ignores body.sats for accounting and derives sats from the signed USDC amount.

bash
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: $PAYMENT_SIG" \
  -d '{"sats": 500}'

# 200 OK on success:
# { "paid": true, "token": "bal_xxxx...", "sats": 500, "credited": 500 }

To top up an existing balance, add Authorization: Bearer bal_... to the same request. The server caps the top-up at 50000 sats total and returns 400 if the signed amount would exceed the cap.

Server derives sats, not client: the credited sats are computed from the signed USDC atomic amount at the server’s current BTC price, not from body.sats. If BTC moves between the moment you decide an amount and the moment the server settles, your credited sats may not match your body.sats hint. Always trust the sats field in the 200 response, not the request.
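To anticipate roughly what a USDC funding will credit, the conversion is sats = (atomic / 10^6) / btcPrice × 10^8, which simplifies to atomic × 100 / btcPrice. A sketch in integer arithmetic. It assumes floor rounding, since the server's exact rounding rule is not documented here and the server uses its own BTC price. estimateCreditedSats is a hypothetical name.

```javascript
// Sketch: approximate sats credited for a USDC amount at a given BTC price.
// atomic * 100 / btcPrice == atomic * 10000 / btcPriceInCents (integer division floors).
function estimateCreditedSats(amountAtomic, btcPriceUsd) {
  const atomic = BigInt(amountAtomic);
  const priceCents = BigInt(Math.round(btcPriceUsd * 100));
  return Number((atomic * 10000n) / priceCents);
}
```

For the $0.50 example above (500000 atomic units) at a BTC price of 68,000 USD, this gives 735 sats. Always treat the sats field in the 200 response as authoritative.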

Use the balance token

Once you have a bal_ token (from either the Lightning flow above or the USDC funding flow), include it as a Bearer auth on any gated endpoint:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Explain Lightning Network"}],
    "max_tokens": 200
  }'

Token Lifecycle

| Rule | Value |
|---|---|
| Inactivity TTL | 30 days (resets on each use) |
| Max lifetime | 90 days from creation |
| Max balance | 50,000 sats |
| Min deposit | 100 sats |

Top-ups reset the inactivity timer but do not extend the 90-day max lifetime. Plan deposits accordingly.
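The lifecycle rules combine into one expiry: the earlier of (last activity + 30 days) and (creation + 90 days). A sketch with hypothetical helper names. It assumes both uses and top-ups count as activity, and that the 100-sat minimum also applies to top-ups.

```javascript
// Sketch of the token lifecycle table: expiry and deposit checks.
const DAY_MS = 24 * 60 * 60 * 1000;

// lastActivityAt: last request or top-up (both reset the inactivity timer).
function tokenExpiresAt(createdAt, lastActivityAt) {
  const inactivity = lastActivityAt.getTime() + 30 * DAY_MS;
  const hardCap = createdAt.getTime() + 90 * DAY_MS; // top-ups never extend this
  return new Date(Math.min(inactivity, hardCap));
}

function depositAllowed(currentSats, depositSats) {
  if (depositSats < 100) return { ok: false, reason: "below 100-sat minimum deposit" };
  if (currentSats + depositSats > 50000) return { ok: false, reason: "would exceed 50,000-sat maximum balance" };
  return { ok: true };
}
```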

Cashu Tokens (Ecash)

Pay with Cashu ecash tokens -- instant, private Bitcoin micropayments with no Lightning channel required. Send tokens directly in the request header. If you overpay, the server returns change tokens.

How It Works

1. POST /v1/chat/completions (no auth) returns 402. The response body includes cashu.price_sats.
2. POST /v1/chat/completions with an X-Cashu header returns 200. The server swaps tokens at the mint, runs inference, and returns change if overpaid.

Request

Send a cashuB (v4) token in the X-Cashu header:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Cashu: cashuBo2F0gaJha..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 100
  }'

Response Headers

Cashu-paid calls emit up to two response headers, and the server always lists both in Access-Control-Expose-Headers so browser clients can read them:

| Header | Value | Meaning |
|---|---|---|
| X-Cashu-Consumed | true or refunded | Status flag. true means the server swapped the proofs at the mint and consumed the payment; refunded means the request failed after the swap and the full amount was returned in X-Cashu-Change. |
| X-Cashu-Change | cashuB... token | Emitted when the presented token exceeded the price by at least 2 sats (MIN_CHANGE_SATS); smaller overpayments are absorbed. The change token is capped at 8 KB; if the split would produce a larger header, the change is absorbed. |

These headers appear on every endpoint that accepts X-Cashu payment — /v1/chat/completions, /v1/embeddings, /v1/images/generations, and /v1/videos.

http
HTTP/2 200
Content-Type: application/json
X-Cashu-Consumed: true
X-Cashu-Change: cashuBo2F0gaJha...
Access-Control-Expose-Headers: X-Cashu-Consumed, X-Cashu-Change

The change token is a standard cashuB token. Import it with any Cashu wallet or present it on a subsequent request.

Import change or lose it: X-Cashu-Change carries real bearer money. Wallet clients MUST read the header and import the proofs on every successful response. Discarding the header is equivalent to burning the overpayment — the server does not retain a copy.
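A client can make change import unconditional by routing every Cashu-paid response through one handler. A sketch: handleCashuResponse is a hypothetical name, and storeToken stands in for whatever your wallet uses to receive a cashuB token. headers is a fetch Headers object or anything with a get() method.

```javascript
// Sketch: read the Cashu status headers and import any change token.
async function handleCashuResponse(headers, storeToken) {
  const consumed = headers.get("X-Cashu-Consumed"); // "true" or "refunded"
  const change = headers.get("X-Cashu-Change");
  if (change) {
    // Real bearer money: the server keeps no copy, so import it every time.
    await storeToken(change);
  }
  return { consumed, refunded: consumed === "refunded", changeImported: Boolean(change) };
}
```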

Constraints

| Rule | Value |
|---|---|
| Token format | cashuB (v4) only. Deprecated cashuA (v3) tokens are rejected. |
| Unit | Sat-denominated only (no USD or other units) |
| Max proofs | 20 per token (DoS prevention) |
| Streaming | Not supported. Cashu requires buffered responses to calculate change; use "stream": false. |
| Change threshold | 2 sats minimum. An overpayment of 1 sat is absorbed (not worth the mint round-trip). |
| Change size limit | 8 KB. If the change token exceeds 8 KB, the server absorbs it. |
| Mint | Server-configured allowlist. HTTPS only, no private IPs. The cashu.description field in the 402 response body indicates the current policy. |
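Two of these constraints can be checked before a request ever leaves the client: the token prefix and the streaming flag. A sketch with a hypothetical helper name. The proof count and mint allowlist require decoding the token and are left to your Cashu wallet library.

```javascript
// Sketch: fail fast locally on constraints the server would reject.
function checkCashuRequest(token, body) {
  const errors = [];
  if (typeof token !== "string" || !token.startsWith("cashuB")) {
    errors.push("token must be a cashuB (v4) token; cashuA (v3) is rejected");
  }
  if (body.stream === true) {
    errors.push('streaming is not supported with Cashu; set "stream": false');
  }
  return errors; // empty array: these checks pass
}
```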

No 402 dance needed: Unlike L402, you can skip the initial 402 request if you already know the price — just send the Cashu token directly and the server verifies the token value covers the model’s price. (x402 has a similar shortcut on POST /v1/balance for funding a balance token, but inference endpoints still expect either a prior 402 or a known price.)

MCP Server

llm402.ai provides a hosted Model Context Protocol (MCP) server. Connect from any MCP client — Claude Code, Claude Desktop, Cursor, or any tool that supports MCP. Six tools are available: text inference, image generation, video generation, model discovery, balance management, and funding.

Setup

1. Get a balance token. Visit llm402.ai/chat and fund a balance with Lightning or USDC. Copy your bal_ token from the balance display.
2. Add to your MCP client config.

Claude Code (~/.claude.json):

json
{
  "mcpServers": {
    "llm402": {
      "url": "https://llm402.ai/mcp",
      "headers": {
        "Authorization": "Bearer bal_YOUR_TOKEN_HERE"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json) — Claude Desktop only supports stdio MCP transport, so use the third-party mcp-remote bridge:

json
{
  "mcpServers": {
    "llm402": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://llm402.ai/mcp",
        "--header",
        "Authorization:Bearer bal_YOUR_TOKEN_HERE"
      ]
    }
  }
}

Replace bal_YOUR_TOKEN_HERE with your actual balance token. That’s it: the MCP client discovers all tools automatically. (mcp-remote is a community npm package; pin an exact version in production, e.g. mcp-remote@<version> in the args.)

Available Tools

| Tool | Auth | Description |
|---|---|---|
| llm402_inference | Required | Text inference. 400+ models, auto-routed by default. Supports system prompts, model selection, temperature, max_tokens, and routing preference (quality/balanced/cost/speed). |
| llm402_image | Required | Image generation. Requires a specific model ID (e.g. black-forest-labs/FLUX.1-schnell). Supports width, height, steps, seed, and negative prompt. |
| llm402_video | Required | Video generation (async). Requires a specific model ID (e.g. wan2.7-t2v). Supports seconds, width, height, and fps. Polls for completion for up to 90s, then returns the job URL for manual polling. |
| llm402_models | None | List available models. Optional substring filter (e.g. "deepseek", "flux"). Free; no balance required. |
| llm402_balance | Required | Check your prepaid balance: remaining sats, total deposited, total spent, request count. |
| llm402_fund | Required | Generate a Lightning invoice to top up your balance. Default 5,000 sats. Polls for payment confirmation for up to 45 seconds. |

Example: Text Inference

bash
# Using curl against the MCP endpoint directly (JSON-RPC format)
curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_inference",
      "arguments": {
        "prompt": "Explain quantum computing in one sentence.",
        "max_tokens": 100
      }
    },
    "id": 1
  }'
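For clients that are not MCP-aware, the same JSON-RPC call can be assembled with Python's standard library. This is a minimal sketch: the helper name `build_tool_call` and the `LLM402_BALANCE_TOKEN` environment variable are illustrative assumptions, while the endpoint, headers, and body shape mirror the curl example above. It builds the request without sending it.

```python
import json
import os
import urllib.request

MCP_URL = "https://llm402.ai/mcp"

def build_tool_call(name, arguments, token, request_id=1):
    """Build (but do not send) a JSON-RPC tools/call request for the llm402 MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
        },
    )

# Hypothetical env var holding your bal_ token
token = os.environ.get("LLM402_BALANCE_TOKEN", "bal_YOUR_TOKEN")
req = build_tool_call(
    "llm402_inference",
    {"prompt": "Explain quantum computing in one sentence.", "max_tokens": 100},
    token,
)
# To actually send it: urllib.request.urlopen(req).read()
```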

Example: Image Generation

bash
curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_image",
      "arguments": {
        "prompt": "A cyberpunk cityscape at night with neon lights",
        "model": "black-forest-labs/FLUX.1-schnell"
      }
    },
    "id": 1
  }'

OpenAI-Compatible Alternative

If your tool doesn't support MCP but accepts OPENAI_BASE_URL, the same balance token works directly:

bash
export OPENAI_BASE_URL=https://llm402.ai/v1
export OPENAI_API_KEY=bal_YOUR_TOKEN

This works with Cursor, Aider, LangChain, the OpenAI Python SDK, and any tool that accepts a custom base URL.

Endpoint

MCP endpoint: https://llm402.ai/mcp

Protocol: Streamable HTTP (POST only). Each request is independently authenticated via the Authorization: Bearer bal_ header — no server-side session state is persisted between calls. (The Mcp-Session-Id CORS header is allowed for spec-compliant clients that send it, but the server doesn’t require or track it.) Responses are Server-Sent Events containing JSON-RPC results.
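Because results arrive as Server-Sent Events, a client has to pull the JSON-RPC message out of the `data:` lines. This is a minimal sketch, assuming each event carries one complete JSON-RPC message (multi-line `data:` fields are joined with newlines, as in the SSE format). `parse_sse_jsonrpc` is an illustrative name, not part of any SDK, and the `result.content` shape in the example follows the MCP tool-result convention.

```python
import json

def parse_sse_jsonrpc(body: str, request_id=1):
    """Return the JSON-RPC message matching request_id from an SSE response body."""
    data_lines = []
    for raw in body.splitlines() + [""]:   # trailing "" flushes the final event
        line = raw.rstrip("\r")
        if line == "":                     # a blank line ends an event
            if data_lines:
                msg = json.loads("\n".join(data_lines))
                data_lines = []
                if msg.get("id") == request_id:
                    return msg
            continue
        if line.startswith(":"):           # SSE comment, e.g. a keep-alive
            continue
        field, _, value = line.partition(":")
        if field == "data":
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # other fields (event:, id:, retry:) are not needed here
    raise ValueError(f"no JSON-RPC message with id={request_id}")

sample = (
    "event: message\n"
    'data: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"hi"}]}}\n'
    "\n"
)
msg = parse_sse_jsonrpc(sample)
text = msg["result"]["content"][0]["text"]
```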

OpenClaw Plugin

Use llm402.ai from inside OpenClaw via the official @llm402/openclaw-provider plugin. All four payment rails supported: prepaid balance, Cashu ecash, USDC on Base, and Lightning. 400+ models.

1. Install
bash
npm install @llm402/openclaw-provider

Requires Node.js 22+. Includes a wallet CLI and the OpenClaw plugin.

2. Create a wallet (for Cashu / Lightning modes)

The package ships a CLI for wallet management. Skip this step if you only need balance mode (Bearer token).

bash
# Create a wallet (generates Nostr nsec + EVM keypair)
npx llm402-openclaw init

# Fund it with Lightning (prints a BOLT11 invoice — pay from any wallet)
npx llm402-openclaw fund 5000

# Check balance
npx llm402-openclaw balance

Your sats are stored locally as Cashu ecash proofs at ~/.llm402/wallet.json. In practice, you pay a Lightning invoice, and your inference calls then deduct from that Cashu balance. Run npx llm402-openclaw --help for all commands.

3. Configure in OpenClaw settings

Pick a payment mode:

Balance mode (simplest: Bearer token, no per-request payment round trip, no wallet):

json
{
  "paymentMode": "balance",
  "balanceToken": "bal_YOUR_TOKEN_HERE"
}

Cashu mode (pay with ecash — requires wallet from step 2):

bash
# Reveal your nsec for the config below
LLM402_SHOW_SECRETS=1 npx llm402-openclaw init
json
{
  "paymentMode": "cashu",
  "cashuNsec": "nsec1..."
}

x402 mode (pay with USDC on Base — gasless, no ETH needed):

json
{
  "paymentMode": "x402",
  "evmPrivateKey": "0x..."
}

Lightning mode (pay L402 invoices by melting Cashu proofs):

json
{
  "paymentMode": "lightning",
  "cashuNsec": "nsec1..."
}

Wallet modes (cashu / x402 / lightning) start a local HTTP proxy on 127.0.0.1 that transparently handles the 402-and-pay cycle. OpenClaw only ever sees the final 200 response. See the plugin README for all modes.
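The cycle the proxy automates is the one any client performs: send the request, hand the 402 challenge to a payer, then resend once with the proof it returns. This is a minimal sketch with the transport and payer injected as plain functions; `send`, `pay`, and `request_with_payment` are hypothetical stand-ins, not the plugin's internals, so the control flow can be read and tested without a network.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]   # (status, headers, body)

def request_with_payment(
    send: Callable[[Dict[str, str]], Response],
    pay: Callable[[Dict[str, str]], Dict[str, str]],
) -> Response:
    """Send once; on HTTP 402, pay the challenge and resend exactly once."""
    status, headers, body = send({})
    if status != 402:
        return status, headers, body
    # pay() reads the challenge (WWW-Authenticate, Payment-Required, ...) and
    # returns the proof header for ONE rail, e.g. {"Authorization": "L402 ..."}
    proof = pay(headers)
    if len(proof) != 1:
        raise ValueError("send exactly one payment header")  # else ambiguous_payment
    return send(proof)

# Fakes standing in for the network and the wallet
def fake_send(extra):
    if "Authorization" in extra:
        return 200, {}, b'{"ok": true}'
    return 402, {"WWW-Authenticate": 'L402 macaroon="m", invoice="lnbc..."'}, b""

def fake_pay(challenge):
    return {"Authorization": "L402 m:preimagehex"}

status, _, body = request_with_payment(fake_send, fake_pay)
```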

CLI commands

| Command | Description |
| --- | --- |
| npx llm402-openclaw init | Create or load a wallet |
| npx llm402-openclaw fund <sats> | Get a Lightning invoice, pay it, and mint Cashu proofs |
| npx llm402-openclaw balance | Show the Cashu balance (plus USDC with --check-usdc) |
| npx llm402-openclaw check-funding | Resolve pending quotes left by earlier fund timeouts |
| npx llm402-openclaw sync | Pull wallet state from Nostr relays (opt-in) |

Budget controls

Runaway cost protection. Both sats and USDC are tracked independently; either rail can reject a request before signing.

| Field | Default | Max | Rail |
| --- | --- | --- | --- |
| maxRequestBudgetSats | 500 | 50,000 | sats |
| sessionBudgetSats | 10,000 | 1,000,000 | sats |
| sessionBudgetUsdcCents | 5,000 | 500,000 | USDC (cents) |
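The budget fields can be pictured as two independent ledgers checked before anything is signed. This sketches the logic only, assuming a request is rejected when it would exceed the per-request cap or the remaining session budget on its own rail; `BudgetGuard` and its methods are hypothetical, and the plugin's actual implementation may differ.

```python
from dataclasses import dataclass

@dataclass
class BudgetGuard:
    # Defaults mirror the table above; spending is tracked per rail
    max_request_budget_sats: int = 500
    session_budget_sats: int = 10_000
    session_budget_usdc_cents: int = 5_000
    spent_sats: int = 0
    spent_usdc_atomic: int = 0          # USDC has 6 decimals: $1 = 1,000,000

    def approve_sats(self, price_sats: int) -> bool:
        """Check a sats price (L402 / Cashu) before paying; record it if allowed."""
        if price_sats > self.max_request_budget_sats:
            return False
        if self.spent_sats + price_sats > self.session_budget_sats:
            return False
        self.spent_sats += price_sats
        return True

    def approve_usdc(self, amount_atomic: int) -> bool:
        """Check an x402 amount (atomic USDC) before signing; record it if allowed."""
        budget_atomic = self.session_budget_usdc_cents * 10_000  # 1 cent = 10,000 atomic units
        if self.spent_usdc_atomic + amount_atomic > budget_atomic:
            return False
        self.spent_usdc_atomic += amount_atomic
        return True
```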

Security

This plugin runs locally and handles wallet keys. Do not install it on shared systems, CI runners, or Codespaces. The wallet lives at ~/.llm402/wallet.json with 0600 permissions. The full threat model is in the plugin's SECURITY.md.

Streaming

Add "stream": true to your request body to receive Server-Sent Events (SSE) as tokens are generated. The format follows the OpenAI streaming specification.

Request

bash
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "max_tokens": 100,
    "stream": true
  }'

Response Format

The server sends a series of data: lines. Each line is a JSON chunk with a delta object containing the next token(s):

text/event-stream
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36}}

data: [DONE]

| Field | Description |
| --- | --- |
| choices[0].delta.role | Sent on the first chunk ("assistant"). Upstream providers may bundle the first token with the role in the same chunk, so handle both shapes. |
| choices[0].delta.content | The next token(s) of the response |
| choices[0].finish_reason | null while generating; "stop" on the final chunk |
| usage | Optional. Some providers attach a token-count summary to the final chunk (the one with finish_reason). Treat it as best-effort. |
| data: [DONE] | End-of-stream marker. Close the connection after this line. |

Heartbeat: During long-running inferences, the server sends : heartbeat SSE comments every 15 seconds to keep the connection alive. These are not data lines and should be ignored by your parser.
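Putting the field table and the heartbeat rule together, a consumer only needs to act on `data:` lines. This is a minimal sketch that works on any iterable of decoded lines; it tolerates the role arriving with or without the first token, keeps the optional `usage` object when present, and skips `:` heartbeat comments. `collect_stream` is an illustrative name.

```python
import json

def collect_stream(lines):
    """Accumulate an OpenAI-style SSE stream into (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for line in lines:
        if not line.startswith("data: "):
            continue                      # heartbeats (": heartbeat") and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break                         # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:                   # role-only chunks carry no content
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]        # best-effort: not every provider sends it
    return "".join(parts), finish_reason, usage
```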

Consuming the Stream

bash
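# Print only the token text (-N disables curl's buffering; needs jq)
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hello"}],"max_tokens":100,"stream":true}' \
  | while IFS= read -r line; do
      line=${line%$'\r'}                      # tolerate CRLF line endings
      case "$line" in
        "data: [DONE]") break ;;              # end-of-stream marker
        "data: "*) printf '%s\n' "${line#data: }" | jq -j '.choices[0].delta.content // empty' ;;
        *) ;;                                 # blank lines and ": heartbeat" comments
      esac
    done
echo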
javascript
const res = await fetch('https://llm402.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...' },
  body: JSON.stringify({
    model: 'deepseek-v3.2', messages: [{ role: 'user', content: 'hello' }],
    max_tokens: 100, stream: true
  })
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split('\n');
  buf = lines.pop() || ''; // keep partial trailing line for the next read
  for (const line of lines) {
    if (line.startsWith('data: ') && line !== 'data: [DONE]') {
      const chunk = JSON.parse(line.slice(6));
      const token = chunk.choices[0]?.delta?.content || '';
      process.stdout.write(token);
    }
  }
}
python
import json
import requests

res = requests.post('https://llm402.ai/v1/chat/completions',
    headers={'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...'},
    json={'model': 'deepseek-v3.2', 'messages': [{'role': 'user', 'content': 'hello'}],
          'max_tokens': 100, 'stream': True},
    stream=True)
res.raise_for_status()  # a 402 or other error means nothing will stream

for raw in res.iter_lines():
    line = raw.decode('utf-8')
    if line == 'data: [DONE]':
        break                                  # end-of-stream marker
    if not line.startswith('data: '):
        continue                               # blank lines and ": heartbeat" comments
    chunk = json.loads(line[6:])
    choices = chunk.get('choices') or []
    delta = (choices[0].get('delta') or {}) if choices else {}
    print(delta.get('content') or '', end='', flush=True)  # content may be null

Payment first: Streaming requires the same payment flow as buffered requests. Pay via L402, x402, or Balance token before sending a stream request. You cannot begin streaming before payment is verified. Cashu does not support streaming -- use buffered mode ("stream": false) with Cashu tokens.

Request Deduplication

Non-streaming responses are cached for 30 seconds. If you retry an identical request (same model, messages, max_tokens, and IP), the server returns the cached response immediately without re-running inference or re-charging you.

| Parameter | Value |
| --- | --- |
| TTL | 30 seconds |
| Max entries | 100 |
| Max entry size | 1 MB |
| Scope | Per-IP (different IPs get separate caches) |

The response includes an X-Dedup header indicating whether the response was served from cache:

http
X-Dedup: hit    # served from cache (no charge)
X-Dedup: miss   # fresh inference

To bypass the cache, send the X-No-Cache: true request header. (Streaming responses and cache-bypassed requests omit the X-Dedup header entirely — the absence of the header should be treated the same as miss.)
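A client that logs cache hits can read the header as described, treating its absence as a miss. This is a small sketch; `cache_status` and `NO_CACHE_HEADERS` are hypothetical names, and the lookup is case-insensitive because HTTP header names are.

```python
def cache_status(headers) -> str:
    """Return "hit" or "miss" from response headers (a missing X-Dedup counts as miss)."""
    for name, value in headers.items():
        if name.lower() == "x-dedup":
            return "hit" if value.strip().lower() == "hit" else "miss"
    return "miss"

# Extra request headers that opt a single request out of the cache
NO_CACHE_HEADERS = {"X-No-Cache": "true"}
```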

Image Generation

Generate images from text prompts. 40+ models across multiple providers, all behind a unified OpenAI-compatible endpoint. All four payment rails are supported (L402, x402, Balance, Cashu).

Endpoint

POST /v1/images/generations or /v1/images/generations/{model}

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes* | Image model ID (e.g. FLUX.1-schnell). *Not required if the model is in the URL path. |
| prompt | string | Yes | Text description of the image (2–4096 chars) |
| size | string | No | Dimensions as a "WxH" string (e.g. "1024x1792"). Use "auto" for the model default. Overrides width/height. |
| width | integer | No | Image width in pixels (64–2048). Must be provided together with height. |
| height | integer | No | Image height in pixels (64–2048). Must be provided together with width. |
| steps | integer | No | Diffusion steps (1–50; default is model-dependent) |
| response_format | string | No | url (default) or b64_json |
| seed | integer | No | Deterministic seed for reproducibility |

Response

json
{
  "created": 1234567890,
  "model": "black-forest-labs/FLUX.1-schnell",
  "data": [
    { "url": "/v1/media/img_abc123..." }
  ]
}

The url field can take three forms:

  • Relative proxy path (e.g. /v1/media/img_abc123...) — the most common form. The image is served from llm402.ai with a 24h TTL and provider-agnostic CSP. Prepend https://llm402.ai to fetch.
  • data: URI (e.g. data:image/png;base64,...) — inline base64 for providers that return raw bytes (the image is embedded; no extra fetch).
  • Direct HTTPS URL — only as a fallback when the media-proxy token can't be created. HTTPS URLs from upstream providers may expire (~7 days).

Each data[i] entry may also include provider-specific extras: revised_prompt (image-prompt rewriting), or timings/index (provider diagnostics). These are pass-through — treat them as best-effort and code defensively.
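Because `url` can take any of three forms, a client should branch on the prefix before fetching. This is a minimal sketch using only the standard library; `resolve_image` is an illustrative name, and it returns either decoded bytes (for `data:` URIs) or an absolute URL to fetch.

```python
import base64
from urllib.parse import urljoin

BASE = "https://llm402.ai"

def resolve_image(url: str):
    """Return ("bytes", data) for data: URIs, otherwise ("url", absolute_url)."""
    if url.startswith("data:"):
        header, _, payload = url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64 data: URI")
        return "bytes", base64.b64decode(payload)
    if url.startswith("/"):
        return "url", urljoin(BASE, url)   # media proxy path (24h TTL)
    return "url", url                      # direct provider URL; may expire
```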

Key Differences from Chat Completions

  • No streaming — response is synchronous
  • model is required (no auto-routing)
  • One image per request (n is always 1)
  • Pricing is per-image, not per-token
  • All image payments are non-refundable on backend failure. This is intentional, following a penetration-test finding: no refunds means no refund oracle. It diverges from chat, where Balance payments are refunded on upstream failure. Check /v1/estimate-cost and the health signal in /v1/models before paying.
  • Request deduplication is disabled
  • Generation time varies: 1–10s for FLUX/diffusion models, 30–90s for GPT-5 Image models
  • Dimensions are automatically rounded to the nearest multiple of 16 for compatibility
  • Both size string and width/height integer formats are accepted
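The size rules above can be mirrored client-side to preview which dimensions will be used. This is a sketch under stated assumptions: `size` wins over `width`/`height` as documented, the 64–2048 range is checked before rounding, and halfway values (e.g. 1032, exactly between 1024 and 1040) round up; the server's exact tie-breaking is not documented. `normalize_size` is hypothetical.

```python
def normalize_size(size=None, width=None, height=None):
    """Return (width, height) rounded to multiples of 16, or None for the model default."""
    if size is not None:
        if size == "auto":
            return None                          # let the model pick its default
        w_str, sep, h_str = size.lower().partition("x")
        if not sep:
            raise ValueError('size must look like "WxH"')
        width, height = int(w_str), int(h_str)   # size overrides width/height
    elif (width is None) != (height is None):
        raise ValueError("provide both width and height, or neither")
    elif width is None:
        return None
    for v in (width, height):
        if not 64 <= v <= 2048:
            raise ValueError(f"{v}px is outside 64-2048")
    # Assumption: round half up to the nearest multiple of 16
    return tuple(((v + 8) // 16) * 16 for v in (width, height))
```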

Example

bash
curl -X POST https://llm402.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_your_token_here" \
  -d '{
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "A serene mountain landscape at sunset"
  }'

Available Models

40+ image models from multiple providers. Use /v1/models or the Models page for the full live list with current sat prices. Selected highlights:

| Model | Price | Notes |
| --- | --- | --- |
| flux.1-schnell | varies | Fast, cheapest FLUX |
| flux.2-pro | varies | Professional quality |
| flux.2-max | varies | Maximum quality |
| flash-image-2.5 | varies | Nano Banana (Gemini image generation) |
| flash-image-3.1 | varies | Nano Banana 2 (latest Gemini) |
| imagen-4.0-fast | varies | Google Imagen |
| gpt-5-image-mini | varies | GPT-5 image generation (compact) |
| gpt-5-image | varies | GPT-5 image generation (full; slow, ~60s) |
| fibo | varies | JSON-native, enterprise-safe |
| ideogram/ideogram-3.0 | 126 sats | Strong text rendering |

Video Generation

Generate videos from text prompts. Unlike image generation, video generation is asynchronous: you create a job, then poll for completion. All four payment rails are supported (L402, x402, Balance, Cashu). Payment is collected when the job is created.

Workflow

  1. Create job: POST /v1/videos with your prompt and model. Returns 202 Accepted with a job ID and poll URL.
  2. Poll for status: GET /v1/videos/{job_id} (no auth required). Returns queued, processing, completed, or failed.
  3. Download: when status is completed, the response includes a video_url.

Create Job

POST /v1/videos or /v1/videos/generations/{model}

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes* | Video model ID (e.g. kling-2.1-master). *Not required if the model is in the URL path. |
| prompt | string | Yes | Text description of the video (2–4096 chars) |
| seconds | integer | No | Video duration in seconds. Valid values depend on the model (see each model’s capabilities.durations in /v1/models). If omitted, the model’s default is used. |
| width | integer | No | Video width in pixels. Must be paired with height. Valid values depend on the model (see each model’s capabilities.sizes in /v1/models). If omitted, the model’s default is used. |
| height | integer | No | Video height in pixels. Must be paired with width. Valid values depend on the model (see each model’s capabilities.sizes in /v1/models). If omitted, the model’s default is used. |
| fps | integer | No | Frames per second (1–60). Only some models support this; check capabilities in the /v1/models response. |
| steps | integer | No | Diffusion steps (model-dependent) |
| guidance_scale | number | No | Classifier-free guidance scale |
| seed | integer | No | Deterministic seed for reproducibility |
| negative_prompt | string | No | What to avoid in the generated video |

Pricing

Video pricing varies by provider and model:

  • Together.ai models (Kling, MiniMax, Seedance, etc.) — flat per-video pricing. The price is the same regardless of duration or resolution.
  • OpenRouter models (Veo 3.1) — per-second pricing that scales with duration and resolution. Longer videos and higher resolutions cost more.

The 402 challenge always shows the exact price for the specific parameters you requested. If no optional parameters are specified (duration, resolution), the minimum price for that model is shown.

Model Capabilities

Each video model supports specific durations, sizes, and fps values. Sending unsupported parameters returns 400 Bad Request with the list of supported values for that model. Use GET /v1/models to discover per-model capabilities.

Video models in the /v1/models response include these additional fields:

| Field | Type | Description |
| --- | --- | --- |
| model_type | string | "video": identifies this as a video generation model |
| capabilities.durations | array or null | Supported duration values in seconds (e.g. [5, 10]), or null if unconstrained |
| capabilities.sizes | array or null | Supported WxH dimension strings (e.g. ["1920x1080", "1280x720"]), or null if unconstrained |

bash
# Discover video model capabilities
curl -s https://llm402.ai/v1/models | jq '.data[] | select(.model_type=="video") | {id, capabilities}'
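Because unsupported values are rejected with a 400, it can be worth checking a request against the model's `capabilities` before paying. This is a minimal sketch, assuming the `capabilities` object has the `durations` and `sizes` fields described above (with `null` meaning unconstrained); `check_video_params` is a hypothetical helper, and the server remains the authority.

```python
def check_video_params(capabilities, seconds=None, width=None, height=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    durations = capabilities.get("durations")     # None means unconstrained
    if seconds is not None and durations is not None and seconds not in durations:
        problems.append(f"seconds={seconds} not in {durations}")
    if (width is None) != (height is None):
        problems.append("width and height must be sent together")
    elif width is not None:
        sizes = capabilities.get("sizes")         # None means unconstrained
        if sizes is not None and f"{width}x{height}" not in sizes:
            problems.append(f"{width}x{height} not in {sizes}")
    return problems
```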

Response (202 Accepted)

json
{
  "id": "vj_abc123...",
  "status": "queued",
  "model": "minimax/video-01-director",
  "poll_url": "/v1/videos/vj_abc123...",
  "poll_interval_ms": 5000,
  "created_at": 1234567890
}

Poll Job Status

GET /v1/videos/{job_id}

Response (completed)

json
{
  "id": "vj_abc123...",
  "status": "completed",
  "model": "minimax/video-01-director",
  "video_url": "/v1/videos/vj_abc123.../content",
  "done_at": 1234567890,
  "created_at": 1234567000,
  "poll_interval_ms": 5000
}

Response (failed)

json
{
  "id": "vj_abc123...",
  "status": "failed",
  "model": "minimax/video-01-director",
  "error": "upstream provider timeout",
  "poll_interval_ms": 5000
}

Key Differences from Image Generation

  • Asynchronous — returns immediately with a job ID, not a finished result
  • Polling required — use poll_url to check status; respect poll_interval_ms
  • URL-only — no b64_json response format; videos are always returned as URLs
  • Longer generation times — expect 30s–5min depending on model and duration
  • Non-refundable — payment is collected at job creation, not on completion
  • model is required (no auto-routing)
  • One video per request
  • Request deduplication is disabled
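The create-then-poll flow fits in a small loop that honours `poll_interval_ms`. This sketch injects the HTTP GET as a function (`get_json` is a hypothetical stand-in for your HTTP client) so it can be exercised without a network; the 10-minute overall timeout is an arbitrary choice set above the 30s–5min range quoted above.

```python
import time

def wait_for_video(job, get_json, timeout_s=600, sleep=time.sleep):
    """Poll job["poll_url"] until the job completes; return the video_url path."""
    deadline = time.monotonic() + timeout_s
    status = job
    while True:
        if status["status"] == "completed":
            return status["video_url"]      # relative path; GET it to download the bytes
        if status["status"] == "failed":
            raise RuntimeError(status.get("error", "video job failed"))
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job['id']} still {status['status']}")
        sleep(status.get("poll_interval_ms", 5000) / 1000)
        status = get_json(job["poll_url"])  # GET /v1/videos/{job_id}, no auth needed
```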

Example

bash
# Create video job
curl -X POST https://llm402.ai/v1/videos \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax/video-01-director", "prompt": "A cat walking through a garden", "seconds": 5}'

# Response (202 Accepted):
# {"id":"vj_abc...","status":"queued","model":"minimax/video-01-director","poll_url":"/v1/videos/vj_abc...","poll_interval_ms":5000}

# Poll for completion
curl https://llm402.ai/v1/videos/vj_abc123...

# Response (completed):
# {"id":"vj_abc...","status":"completed","model":"minimax/video-01-director","video_url":"/v1/videos/vj_abc.../content","done_at":1234567890,"created_at":1234567000,"poll_interval_ms":5000}

# Generate a 16:9 HD video with specific duration
curl -X POST https://llm402.ai/v1/videos \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/veo-3.0", "prompt": "cat walking on beach", "seconds": 8, "width": 1920, "height": 1080}'

Available Models

Video models from multiple providers. Use /v1/models or the Models page for the full live list with pricing and per-model capabilities. Available models include Sora 2, Veo 3.0, Veo 3.1 (via OpenRouter, per-second pricing), Kling 2.1, Seedance, PixVerse, MiniMax, Vidu, and Wan.

Fetching Generated Video

GET /v1/videos/{job_id}/content

Completed video jobs expose their binary through an opaque proxy endpoint. When the poll response of GET /v1/videos/{job_id} returns status: "completed", the accompanying video_url field is a relative path of the form /v1/videos/vj_…/content pointing at this endpoint. The provider’s actual CDN URL is never exposed to the client.

No additional authentication is required — the job_id itself is a 128-bit capability token (vj_ + 32 hex characters).
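Because the job ID doubles as the credential, treat it like a secret and validate its shape before building the content URL (malformed IDs get a 403). This is a minimal sketch; `content_url` is a hypothetical helper, and the pattern accepts either hex case, which is an assumption since the examples here only show lowercase.

```python
import re
from urllib.parse import urljoin

JOB_ID_RE = re.compile(r"vj_[0-9a-fA-F]{32}")   # "vj_" + 32 hex chars = 128 bits

def content_url(job_id: str, base: str = "https://llm402.ai") -> str:
    """Return the absolute /content URL for a well-formed video job ID."""
    if not JOB_ID_RE.fullmatch(job_id):
        raise ValueError(f"malformed job id: {job_id!r}")
    return urljoin(base, f"/v1/videos/{job_id}/content")
```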

End-to-end flow

1. Create the job using any payment rail (balance token, Lightning, USDC, or Cashu). Receive a 202 with poll_url.
2. Poll GET /v1/videos/{job_id} at poll_interval_ms until status === "completed".
3. Issue a GET against the video_url path to stream the video bytes.
bash
curl -s -L -o out.mp4 "https://llm402.ai/v1/videos/vj_aaaaaaaabbbbbbbbccccccccdddddddd/content"

The placeholder job ID above will return 404 Video not available — that is expected, and documents the shape of the "unknown or expired job" error.

Response

On success the server streams the binary with the following headers:

| Header | Value |
| --- | --- |
| Content-Type | One of video/mp4, video/webm, video/quicktime, video/x-msvideo. Anything outside this allowlist is coerced to video/mp4. |
| Content-Length | Forwarded from the upstream provider when present. Responses larger than the server’s body cap are rejected with 502. |
| Cache-Control | private, max-age=3600 |
| Content-Security-Policy | default-src 'none'; sandbox (prevents script execution in proxied content) |

Status codes

| Status | Meaning |
| --- | --- |
| 200 | Video bytes streaming. |
| 403 | Upstream video URL host is not on the proxy allowlist, or the supplied job ID is malformed (must match vj_ + 32 hex chars). |
| 404 | Unknown job, not yet completed, or the server no longer has a videoUrl for it (expired). |
| 502 | Upstream download failed or the body exceeds the size cap. |
| 503 | Concurrent video-proxy capacity reached; retry with exponential backoff. A Retry-After: 10 header is returned. |
| 504 | Upstream download timed out (60s). |

Rate limit: this endpoint shares the video-polling class at 60 requests / minute per IP.

Provider URLs are never exposed: all completed video content is served via /v1/videos/{job_id}/content. Clients never see the upstream CDN or provider URL, and cannot reach the provider directly.

Handle 503 with backoff: in-flight content requests may be rejected with 503 Video proxy busy when the server reaches its concurrent-proxy cap. Clients MUST implement exponential backoff and retry; do not tight-loop on 503, or you will trip the 60/minute rate limit and receive 429.
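A retry schedule that meets both constraints honours `Retry-After` when the server sends it and otherwise doubles the delay, with jitter so parallel clients do not retry in lockstep. This is a sketch only: the base delay, cap, and jitter range are illustrative choices, `next_delay` is a hypothetical helper, and only the numeric-seconds form of `Retry-After` is parsed.

```python
import random

def next_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based) after a 503 or 429."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))   # server hint wins, e.g. "10"
        except ValueError:
            pass                                   # HTTP-date form: fall back to backoff
    delay = min(cap, base * 2 ** attempt)          # 1, 2, 4, 8, ... capped at `cap`
    return delay * (0.5 + rng() / 2)               # jitter: 50-100% of the delay
```

In a retry loop, call `time.sleep(next_delay(i, resp.headers.get("Retry-After")))` after each 503 and give up after a fixed number of attempts.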

Code Examples

Complete examples for each payment method and language.

Uses viem for EIP-712 signing. Install: npm install viem

javascript
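import { createWalletClient, http } from 'viem';
import { privateKeyToAccount } from 'viem/accounts';
import { base } from 'viem/chains';
import { randomBytes } from 'node:crypto';

const API_URL = 'https://llm402.ai/v1/chat/completions';
const messages = [{ role: 'user', content: 'Say hello.' }];
const body = JSON.stringify({ model: 'claude-sonnet-4.6', messages, max_tokens: 50 });

// Wallet on Base (EVM_PRIVATE_KEY is an example env var name holding a 0x-prefixed key)
const account = privateKeyToAccount(process.env.EVM_PRIVATE_KEY);
const walletClient = createWalletClient({ account, chain: base, transport: http() });
const address = account.address;

// 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
if (res402.status !== 402) throw new Error(`expected 402, got ${res402.status}`);
const envelope = JSON.parse(Buffer.from(res402.headers.get('Payment-Required'), 'base64').toString());
const req = envelope.accepts[0];  // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';

// 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
const now = Math.floor(Date.now() / 1000);
const nonce = `0x${randomBytes(32).toString('hex')}`;  // fresh random bytes32 per payment
const signature = await walletClient.signTypedData({
  domain: { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  types: { TransferWithAuthorization: [
    { name: 'from', type: 'address' }, { name: 'to', type: 'address' },
    { name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
    { name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
  ]},
  primaryType: 'TransferWithAuthorization',
  message: { from: address, to: req.payTo, value: BigInt(req.amount),
             validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce },
});

// 3. Send with Payment-Signature header. paymentB64 is the base64-encoded x402 payment
//    payload wrapping `signature` and the authorization above (see the x402 section).
const res = await fetch(API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': paymentB64 },
  body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});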

Uses eth-account for EIP-712 signing. Install: pip install requests eth-account

python
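import base64, json, os, secrets, time
import requests
from eth_account import Account
from eth_account.messages import encode_typed_data

API_URL = 'https://llm402.ai/v1/chat/completions'
body = {'model': 'claude-sonnet-4.6', 'messages': [{'role': 'user', 'content': 'Say hello.'}], 'max_tokens': 50}
account = Account.from_key(os.environ['EVM_PRIVATE_KEY'])  # example env var name
address = account.address

# 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
res402 = requests.post(API_URL, json=body)
if res402.status_code != 402:
    raise RuntimeError(f"expected 402, got {res402.status_code}")
envelope = json.loads(base64.b64decode(res402.headers["Payment-Required"]).decode())
req = envelope["accepts"][0]  # always use accepts[0]
routed_model = res402.headers.get("X-Route-Model", "claude-sonnet-4.6")

# 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
now = int(time.time())
nonce = "0x" + secrets.token_hex(32)  # fresh random bytes32 per payment
domain = {"name": req["extra"]["name"], "version": req["extra"]["version"],
          "chainId": 8453, "verifyingContract": req["asset"]}
types = {"TransferWithAuthorization": [
    {"name": "from", "type": "address"}, {"name": "to", "type": "address"},
    {"name": "value", "type": "uint256"}, {"name": "validAfter", "type": "uint256"},
    {"name": "validBefore", "type": "uint256"}, {"name": "nonce", "type": "bytes32"}]}
message = {"from": address, "to": req["payTo"], "value": int(req["amount"]),
           "validAfter": now - 600, "validBefore": now + 120,
           "nonce": bytes.fromhex(nonce[2:])}
signable = encode_typed_data(domain_data=domain, message_types=types, message_data=message)
signed = account.sign_message(signable)  # signed.signature holds the EIP-712 signature

# 3. Send with Payment-Signature header. payment_b64 is the base64-encoded x402 payment
#    payload wrapping signed.signature and the authorization above (see the x402 section).
res = requests.post(API_URL, json={**body, "model": routed_model},
                    headers={"Payment-Signature": payment_b64})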

Requires an EIP-712 signing tool (e.g., Foundry's cast) for Step 2.

bash
# 1. Get the 402 challenge and capture the Payment-Required header
#    (the response also carries WWW-Authenticate for L402)
PAYMENT_REQ_HEADER=$(curl -s -D - -o /dev/null -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}' \
  | awk 'tolower($1) == "payment-required:" { print $2 }' | tr -d '\r')

# 2. Decode the Payment-Required header (x402 v2 envelope)
echo "$PAYMENT_REQ_HEADER" | base64 -d | jq .
# Returns: { x402Version: 2, accepts: [{ scheme, network, amount, asset, payTo, extra }], resource, price }
# Use accepts[0] for payment details: jq '.accepts[0]'

# 3. Sign EIP-3009 with cast, build payload, base64 encode (see x402 docs above for full flow)

# 4. Send with payment
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: $PAYMENT_B64" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

Uses ethers.js v6 with MetaMask or Coinbase Wallet. No gas for the payer -- just a signing prompt.

javascript
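// API_URL, body and messages as in the first example above
// 1. Switch the wallet to Base, then connect (switching first avoids ethers' "network changed" error)
await window.ethereum.request({ method: 'wallet_switchEthereumChain', params: [{ chainId: '0x2105' }] });  // 0x2105 = 8453
const provider = new ethers.BrowserProvider(window.ethereum);
const signer = await provider.getSigner();
const address = await signer.getAddress();

// 2. Get 402, parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(atob(res402.headers.get('Payment-Required')));
const req = envelope.accepts[0];  // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';

// 3. Sign EIP-3009 (wallet popup -- no gas, no approval tx)
const now = Math.floor(Date.now() / 1000);
const nonce = ethers.hexlify(ethers.randomBytes(32));  // fresh random bytes32 per payment
const signature = await signer.signTypedData(
  { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  { TransferWithAuthorization: [
    { name: 'from', type: 'address' }, { name: 'to', type: 'address' },
    { name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
    { name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
  ]},
  { from: address, to: req.payTo, value: BigInt(req.amount), validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce }
);

// 4. Send with Payment-Signature header
// payload: the x402 payment payload wrapping `signature` and the authorization (see the x402 section)
const res = await fetch(API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': btoa(JSON.stringify(payload)) },
  body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});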

Pay with Bitcoin Lightning in three steps: get an invoice, pay it, and resend the request with proof of payment.

bash
# Step 1: Get 402 challenge with Lightning invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response body includes:
#   "invoice": "lnbc210n1pn..."   (pay this with your Lightning wallet)
#   "macaroon": "AgEJbGxt..."     (send this back with the preimage)
#   "price": 21                   (cost in sats)

# Step 2: Pay the Lightning invoice with your wallet.
# Your wallet will give you the preimage (64-char hex).

# Step 3: Resend the request with L402 authorization
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxt...:a1b2c3d4e5f67890..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response: HTTP 200 with the chat completion
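Before resending, a client can confirm that the preimage its wallet returned actually matches the invoice, which avoids a 401 "Preimage does not match". This is a minimal sketch: it assumes you know the invoice's payment hash (Lightning wallets and BOLT11 decoders expose it), and `l402_header` is a hypothetical helper. The check itself is the standard Lightning rule that the payment hash is the SHA-256 of the preimage.

```python
import hashlib

def l402_header(macaroon: str, preimage_hex: str, payment_hash_hex: str) -> dict:
    """Build the L402 Authorization header after checking the preimage."""
    preimage = bytes.fromhex(preimage_hex)
    if len(preimage) != 32:
        raise ValueError("preimage must be 32 bytes (64 hex chars)")
    if hashlib.sha256(preimage).hexdigest() != payment_hash_hex.lower():
        raise ValueError("preimage does not hash to the payment hash")
    # Format used above: "L402 <macaroon>:<preimage hex>"
    return {"Authorization": f"L402 {macaroon}:{preimage_hex}"}
```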

Prepay for a balance, then use it for multiple requests.

bash
# Step 1: Create a prepaid balance (get Lightning invoice)
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'
# Response: { "payment_hash": "abc...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }

# Step 2: Pay the invoice, then poll for the token
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "abc..."}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "..." }

# Step 3: Use the token for requests (no per-request payment needed)
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'

# Step 4: Check remaining balance
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'
# Response: { "sats": 850, "expires_at": "...", "total_spent": 150, "requests": 3 }

Pricing

Prices are computed dynamically per-request based on the model, your estimated input tokens, and your requested max_tokens. Cheaper models cost as little as 21 sats. The exact price is returned in the 402 response.

A uniform 10% markup is applied over upstream provider cost across every modality — chat, embeddings, images, video, and web search. Same markup regardless of which payment rail you use (L402, x402, Cashu, or prepaid balance). The 21-sat floor applies to all sats-denominated requests.

How max_tokens affects your bill

We charge upfront, once per request, based on input size plus your max_tokens cap — not on actual tokens returned. A 50-word answer at max_tokens: 16384 costs the same as a 5,000-word answer at the same cap. Sizing the cap tightly is the single biggest lever on your invoice.

Default: If you omit max_tokens, the server uses 2048, enough for ~1,500 words or ~180 lines of code. That fits the vast majority of chat responses.

Guidance for larger outputs:

| Use case | Recommended max_tokens | Rough output size |
| --- | --- | --- |
| Short chat / factual answer | 256–512 | 1–2 paragraphs |
| Standard reply (default) | 2048 (or omit) | ~1,500 words / ~180 LOC |
| Long-form explanation, multi-step reasoning | 4096–8192 | ~3,000–6,000 words |
| Essays, full blog posts, long code files | 16384 | ~12,000 words |
| Book chapters, very long generation | 32768+ | ~24,000+ words |

Per-model upper bounds apply — check max_tokens in /v1/models (which returns each model's context_length along with per-token pricing in USD and sats). Requests that exceed a model's context window are rejected before payment with a 400, so you never pay for an impossible request.
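As a rough mental model of the upfront charge, the quote scales with input tokens plus the `max_tokens` cap, priced at the per-token rates from `/v1/models`, with the 10% markup and the 21-sat floor applied. This is a sketch under assumptions: the example per-token USD rates, the BTC/USD rate, and rounding up to whole sats are illustrative, `estimate_sats` is hypothetical, and the 402 response remains the authoritative price.

```python
from decimal import Decimal, ROUND_CEILING

MARKUP = Decimal("1.10")   # uniform 10% over upstream cost
FLOOR_SATS = 21            # minimum for sats-denominated requests

def estimate_sats(input_tokens, max_tokens, prompt_usd_per_token,
                  completion_usd_per_token, btc_usd):
    """Rough upfront quote: charged on max_tokens, not on tokens actually returned.

    Pass per-token rates as strings (e.g. "0.000001") to avoid float rounding.
    """
    upstream_usd = (input_tokens * Decimal(prompt_usd_per_token)
                    + max_tokens * Decimal(completion_usd_per_token))
    sats = upstream_usd * MARKUP / Decimal(btc_usd) * 100_000_000
    return max(FLOOR_SATS, int(sats.to_integral_value(rounding=ROUND_CEILING)))
```

With rates of $1 and $4 per million input and output tokens and BTC at $100,000, a 1,000-token prompt sits at the 21-sat floor with the default cap of 2048, but rises to 74 sats at `max_tokens: 16384`, however short the actual answer.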

Denomination by Protocol

| Protocol | Unit | Minimum | Notes |
| --- | --- | --- | --- |
| x402 (USDC) | Atomic USDC (6 decimals) | ~$0.001 | amount: "3150" = $0.003150. Native USD; no BTC conversion. |
| L402 (Lightning) | Satoshis | 21 sats | BTC/USD converted at request time. 21-sat floor for all models. |
| Cashu (ecash) | Satoshis | 21 sats | Same denomination as L402. Send sat-denominated Cashu tokens. |
| Balance (prepaid) | Satoshis | 21 sats | Funded via Lightning or USDC. Deducted per request in sats. |

Price verification: The server recalculates the price on the paid retry and verifies the signed/paid amount covers the minimum. A rounding tolerance of 5 atomic USDC is allowed for x402.
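Since x402 amounts are atomic USDC (6 decimals), converting them for display is a fixed decimal shift, and the tolerance rule reduces to one comparison. This is a sketch, assuming "covers the minimum" means the signed amount may fall short of the recalculated price by at most 5 atomic units; the helper names are hypothetical.

```python
from decimal import Decimal

USDC_DECIMALS = 6
TOLERANCE_ATOMIC = 5       # rounding tolerance allowed for x402

def atomic_to_usd(amount: str) -> Decimal:
    """Convert an atomic USDC amount string to dollars: '3150' -> 0.003150."""
    return Decimal(int(amount)).scaleb(-USDC_DECIMALS)

def covers_price(signed_atomic: int, price_atomic: int) -> bool:
    """True if the signed amount covers the recalculated price within tolerance."""
    return signed_atomic + TOLERANCE_ATOMIC >= price_atomic
```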

Errors

All errors on /v1/* endpoints follow the OpenAI error format:

json
{
  "error": {
    "message": "description",
    "type": "error_type",
    "code": "error_code"
  }
}

x402-Specific Errors

| Code | HTTP | Type | Description |
| --- | --- | --- | --- |
| x402_bad_payload | 400 | invalid_request_error | Payment-Signature header is not valid base64 or not valid JSON |
| x402_underpayment | 402 | payment_error | Signed amount is less than the model's current price |
| x402_settlement_failed | 402 | payment_error | Payment rejected (bad signature, insufficient balance, expired authorization) |
| ambiguous_payment | 400 | invalid_request_error | Request has more than one payment header (Payment-Signature, Authorization, X-Cashu). Send only one. |

Cashu-Specific Errors

| Code | HTTP | Type | Description |
|---|---|---|---|
| cashu_no_stream | 400 | invalid_request_error | Cashu tokens cannot be used with streaming (change requires a buffered response) |
| cashu_too_many_proofs | 400 | invalid_request_error | Token contains more than 20 proofs (DoS prevention limit) |
| cashu_wrong_unit | 400 | invalid_request_error | Only sat-denominated Cashu tokens are accepted |
| cashu_mint_not_allowed | 400 | invalid_request_error | Token's mint is not in the server's allowlist |
| cashu_underpayment | 402 | payment_error | Token value is less than the model's price |
| cashu_underpayment_after_fees | 402 | payment_error | Token value is less than the model's price after mint swap fees |
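Most of these conditions are visible in the token itself. The sketch below assumes the token has already been decoded into a hypothetical dict shape: {"mint": ..., "unit": ..., "proofs": [{"amount": ...}, ...]}. The server's mint allowlist is passed in as a parameter, because it is not published here. Mint swap fees are unknown client-side, so cashu_underpayment_after_fees is not checked.

```python
from typing import Any, Iterable, List, Mapping

MAX_PROOFS = 20  # documented DoS-prevention limit


def cashu_preflight(token: Mapping[str, Any], price_sats: int, stream: bool,
                    allowed_mints: Iterable[str]) -> List[str]:
    """Return the error codes a decoded token would likely trigger (sketch)."""
    errors = []
    if stream:
        errors.append("cashu_no_stream")
    if token.get("unit", "sat") != "sat":  # assumption: a missing unit means sat
        errors.append("cashu_wrong_unit")
    proofs = token.get("proofs", [])
    if len(proofs) > MAX_PROOFS:
        errors.append("cashu_too_many_proofs")
    if token.get("mint") not in set(allowed_mints):
        errors.append("cashu_mint_not_allowed")
    if sum(p["amount"] for p in proofs) < price_sats:
        errors.append("cashu_underpayment")
    return errors
```

An empty list means the token passes the checks that can be made locally. It does not guarantee acceptance.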

L402-Specific Errors

| Reason | HTTP | Description |
|---|---|---|
| Invalid macaroon signature | 401 | Macaroon was tampered with or signed with the wrong key |
| Macaroon expired | 401 | ExpiresAt caveat exceeded (macaroons are valid for 5 minutes) |
| Path mismatch | 401 | Macaroon's RequestPath does not match the endpoint called |
| max_tokens exceeds paid amount | 401 | Request max_tokens exceeds the MaxTokens caveat. Get a new invoice. |
| Input exceeds paid amount | 401 | Input size grew since the invoice was issued (MaxInputChars / MaxInputTokens) |
| Invoice expired (server restarted) | 401 | NotBefore caveat fails after a container restart. Request a new invoice. |
| Model mismatch | 401 | Request model does not match the macaroon's Model caveat |
| Preimage does not match | 401 | Preimage does not hash to the macaroon's payment hash |
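The last row is a plain SHA-256 check: a Lightning payment hash is the SHA-256 of the payment preimage. A client can therefore catch this error before retrying. The sketch below assumes you already hold the preimage (returned by your wallet after paying) and the payment hash (from the invoice); extracting them is out of scope. The header helper assumes the standard L402 "Authorization: L402 <macaroon>:<preimage>" form, which this section does not spell out.

```python
import hashlib


def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """A Lightning payment hash is the SHA-256 of the payment preimage."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return digest == payment_hash_hex.lower()


def l402_authorization(macaroon: str, preimage_hex: str) -> str:
    """Authorization header value in the standard L402 "macaroon:preimage" form
    (assumed here; confirm against the server's 402 challenge)."""
    return f"L402 {macaroon}:{preimage_hex}"
```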

General Errors

| Code / Reason | HTTP | Description |
|---|---|---|
| Rate limit | 429 | Per-IP rate limit exceeded. Check the Retry-After header for seconds to wait. |
| Concurrent stream limit | 429 | Too many concurrent streams from your IP. |
| Context window exceeded | 400 | Input + max_tokens exceeds the model's context window |
| Invalid model | 400 | Model name not found in the model catalog |
| Service unavailable | 503 | Backend provider temporarily unreachable. Try a different model or retry later. |

x402 + concurrent streams: The server checks stream capacity before settling USDC on-chain. If you hit the concurrent stream limit (429), your payment has NOT been settled and you can safely retry.
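The error tables above collapse into a small decision table on the client. The sketch below is a hypothetical retry policy built from those tables; the action names are illustrative and are not returned by the API. Note that a 401 is only mapped to a new invoice because the table above lists L402 failures.

```python
def next_action(status: int, code: str = "") -> str:
    """Map an error response to a client action (hypothetical policy)."""
    if status == 429:
        # Rate or stream limit. For x402 the USDC was not settled, so retrying is safe.
        return "wait_retry_after"
    if status == 503:
        return "retry_later_or_switch_model"
    if status == 402:
        if code == "x402_settlement_failed":
            return "check_wallet_and_resign"  # bad sig, low balance, or expired auth
        return "pay_at_least_current_price"   # underpayment variants
    if status == 401:
        return "request_new_invoice"          # L402 macaroon / invoice failures
    if status == 400:
        return "fix_request"                  # payload, context window, model name, ...
    return "unknown"
```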

Rate Limits

Rate limits apply per client IP address (taken from the cf-connecting-ip header). Limits differ by endpoint class.

| Endpoint class | Example paths | Limit |
|---|---|---|
| Free endpoints | /health, /v1/models, /v1/estimate-cost, /api/tags, /.well-known/* | 60 requests / minute |
| Invoice requests (402 challenge) | POST /v1/balance (create / top-up) | 30 requests / minute |
| Polling endpoints | POST /api/invoice/status | 60 requests / minute (free tier; supports ~5 s polling cadence) |
| Authenticated inference | POST /v1/chat/completions, POST /v1/embeddings, POST /api/chat/*, POST /api/generate/* | 60 requests / minute |
| Media generation | POST /v1/images/generations, POST /v1/videos | 10 requests / minute |
| Video job polling | GET /v1/videos/{id}, GET /v1/videos/{id}/content | 60 requests / minute |
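To stay under these limits without relying on 429s, a client can throttle itself per endpoint class. The sketch below is a minimal sliding-window limiter. The class name is hypothetical, and the clock is injectable so the logic can be tested without sleeping. The server's own limiter may use a different algorithm, for example a fixed window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (client-side sketch)."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

For example, use one SlidingWindowLimiter(10) for media generation and one SlidingWindowLimiter(60) for inference, and call acquire() before each request.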

Concurrent Stream Limits

| Scope | Limit |
|---|---|
| Per IP | 5 concurrent streams |
| Global | 250 concurrent streams |
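The per-IP limit maps directly onto a semaphore. The asyncio sketch below caps in-flight streams at 5. stream_one is a hypothetical coroutine standing in for whatever client code performs one streaming request. The global limit of 250 is shared with other users, so it cannot be enforced client-side.

```python
import asyncio

PER_IP_STREAMS = 5  # documented per-IP concurrent stream limit


async def run_streams(prompts, stream_one, limit: int = PER_IP_STREAMS):
    """Await stream_one(prompt) for every prompt, with at most `limit` in flight.

    stream_one is a hypothetical coroutine that performs one streaming request
    and returns its result; results come back in the same order as prompts.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with sem:
            return await stream_one(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))
```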

When rate limited, the response includes a Retry-After header indicating how many seconds to wait before retrying.
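A retry loop should prefer the server's Retry-After value. The sketch below treats the header as a number of seconds, as described above. If the header is missing or unparseable, it falls back to capped exponential backoff with full jitter. The fallback policy and its defaults are assumptions, not server behavior.

```python
import random
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a 429.

    Uses the Retry-After header (seconds) when present; otherwise falls back to
    capped exponential backoff with full jitter (the fallback is an assumption).
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds; fall through to backoff
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```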

Models & Auto-Routing

llm402.ai serves 400+ models across multiple providers. The full model list with pricing is available at the /v1/models endpoint:

```bash
curl -s https://llm402.ai/v1/models | jq '.data[].id'
```
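The same catalog can be loaded programmatically, for example to drive the context-window check client-side. The sketch below assumes the response has the shape the jq filter implies ({"data": [{"id": ...}, ...]}). It also assumes each entry may carry a context_length field. Pricing field names are not documented in this section, so they are not read here.

```python
import json
import urllib.request

MODELS_URL = "https://llm402.ai/v1/models"


def fetch_models(url: str = MODELS_URL) -> dict:
    """GET the free /v1/models catalog and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def context_lengths(catalog: dict) -> dict:
    """Map each model id to its context_length (None if a model omits it)."""
    return {m["id"]: m.get("context_length") for m in catalog.get("data", [])}
```

Usage: lengths = context_lengths(fetch_models()). /v1/models is a free endpoint, but it is still limited to 60 requests per minute, so cache the result.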

Model Naming

You can use either short names or full provider-prefixed IDs:

| Short name | Full ID |
|---|---|
| deepseek-v3.2 | deepseek/deepseek-v3.2 |
| claude-sonnet-4.6 | anthropic/claude-sonnet-4.6 |
| gpt-5.4 | openai/gpt-5.4 |
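The server already accepts both forms, so resolving names is only useful client-side, for example to keep one cache key per model. The sketch below is a hypothetical helper. It assumes the catalog lists full provider-prefixed IDs and that the short name is the part after the "/". It refuses to guess when a short name matches zero models or more than one.

```python
def resolve_model(name: str, catalog_ids: list) -> str:
    """Return the full provider-prefixed ID for a short or full model name."""
    if name in catalog_ids:
        return name
    matches = [i for i in catalog_ids if i.split("/", 1)[-1] == name]
    if len(matches) != 1:
        raise LookupError(f"{name!r}: {len(matches)} matching models")
    return matches[0]
```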

Auto-Routing

Send model: "auto" and an embedding-based classifier routes the request to the best model for one of 8 task categories: code, reasoning, creative, summarization, multilingual, general_knowledge, agents, chat. Vision is available only through an explicit "task": "vision" in the request body. The classifier never infers vision on its own, so include images in the messages and set the task hint.

  • code -- programming, debugging, code generation
  • reasoning -- logic, math, step-by-step analysis
  • general_knowledge -- factual questions, definitions, Q&A
  • creative -- writing, storytelling, brainstorming
  • summarization -- condensing content, TL;DR
  • chat -- casual conversation, general chat
  • multilingual -- translation, cross-language tasks
  • agents -- function calling, tool integration, structured output
  • vision -- image understanding (multimodal models)
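The category list and the vision rule can be enforced when building the request body. The sketch below is a hypothetical helper. It rejects unknown task values. As a client-side convenience, it also sets task to "vision" when the messages contain an OpenAI-style image_url part, since the classifier will not do that itself. The content-part format and the default max_tokens of 200 (taken from the curl example) are assumptions.

```python
TASKS = {"code", "reasoning", "general_knowledge", "creative", "summarization",
         "chat", "multilingual", "agents", "vision"}


def has_image(messages: list) -> bool:
    """True if any message uses OpenAI-style list content with an image_url part."""
    return any(
        isinstance(m.get("content"), list)
        and any(part.get("type") == "image_url" for part in m["content"])
        for m in messages
    )


def auto_body(messages: list, task=None, max_tokens: int = 200) -> dict:
    """Build a model="auto" request body with an optional task hint."""
    if task is None and has_image(messages):
        task = "vision"  # the classifier never infers vision on its own
    if task is not None and task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    body = {"model": "auto", "messages": messages, "max_tokens": max_tokens}
    if task is not None:
        body["task"] = task
    return body
```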

To skip the classifier and route within a specific category, use the task body parameter with model: "auto":

```bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","task":"code","messages":[{"role":"user","content":"Sort a list in Python"}],"max_tokens":200}'
```

Important (x402 / Cashu): On the paid retry, copy the X-Route-Model header value from the 402 response into the body's model field. Don't re-send "auto": the router could pick a different model at a different price. L402 retries can keep "auto", because the macaroon's Model caveat is the binding authority.
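The rule above amounts to a small transformation of the request body between the 402 and the paid retry. The sketch below is a hypothetical helper: the protocol labels "x402", "cashu", and "l402" are illustrative strings, and header lookup is case-insensitive. It pins the routed model for x402 and Cashu, and leaves "auto" in place for L402.

```python
from typing import Mapping


def paid_retry_body(body: dict, headers_402: Mapping[str, str], protocol: str) -> dict:
    """Return the request body for the paid retry after a 402."""
    retry = dict(body)  # shallow copy; the original body is left unchanged
    if body.get("model") == "auto" and protocol in ("x402", "cashu"):
        routed = next((v for k, v in headers_402.items()
                       if k.lower() == "x-route-model"), None)
        if not routed:
            raise ValueError("402 response has no X-Route-Model header")
        retry["model"] = routed  # pin the model (and price) the 402 quoted
    return retry
```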