llm402.ai API

Pay-per-request LLM inference. No accounts. No API keys. Just pay and prompt.

llm402.ai provides OpenAI-compatible endpoints gated by HTTP 402 micropayments. Send a request, get an invoice, pay it, re-send with proof of payment. Your prompt is processed by one of 400+ models across multiple providers.
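The pay-and-retry loop is the same for every rail. A minimal sketch in Python, with the HTTP transport and the payment step injected as callables so the control flow stands alone (the function names and tuple shapes here are illustrative, not part of the API):

```python
# Sketch of the generic 402 flow. send(headers) -> (status, headers, body);
# pay(challenge_headers) -> auth headers proving payment. Header names
# (Payment-Required, WWW-Authenticate, etc.) come from the rail you choose.
def pay_and_retry(send, pay):
    status, headers, body = send({})          # 1. bare request, no auth
    if status != 402:
        return body                           # free endpoint or already paid
    auth = pay(headers)                       # 2. settle invoice, build proof
    status, headers, body = send(auth)        # 3. retry with proof of payment
    if status != 200:
        raise RuntimeError(f"payment not accepted: {status}")
    return body
```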

Four payment rails are supported. Every 402 response includes all available options -- pick whichever works for your client:

Protocol | Currency          | Network            | Header
L402     | Bitcoin (sats)    | Lightning Network  | WWW-Authenticate
x402     | USDC (stablecoin) | Base L2 (EIP-3009) | Payment-Required
Cashu    | Bitcoin (sats)    | Ecash tokens       | X-Cashu
Balance  | Bitcoin (sats)    | Prepaid account    | Authorization: Bearer

Model Naming

Short names work for all models -- no provider prefix needed:

  • deepseek-v3.2, claude-sonnet-4.6, gpt-5.4
  • Full IDs also work: deepseek/deepseek-v3.2, anthropic/claude-sonnet-4.6
  • For auto-routing: use "model": "auto" and read X-Route-Model header from the 402 response

Model in URL Path

All inference endpoints support specifying the model in the URL path instead of the request body:

  • /v1/chat/completions/deepseek-v3.2
  • /v1/images/generations/FLUX.1-schnell
  • /v1/videos/generations/kling-2.1-pro

If both URL path and body contain a model, the body model takes priority. The /v1/models endpoint returns all available model IDs.
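The precedence rule can be stated as a one-line helper (illustrative, not server code):

```python
# Body model wins when both the URL path and the request body specify one,
# per the rule above.
def effective_model(path_model, body_model):
    return body_model if body_model else path_model
```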

Quick Start

x402 (USDC on Base)

Pay with USDC stablecoins. No BTC needed. No gas for the payer.

1. POST /v1/chat/completions --> 402
   Server returns Payment-Required header (base64 JSON: price, payTo, EIP-712 domain)
2. Decode header, sign EIP-3009 TransferWithAuthorization with your wallet
   (off-chain signature -- no gas, no approval tx)
3. POST /v1/chat/completions + Payment-Signature header --> 200
   Server settles USDC on-chain, then returns inference

L402 (Bitcoin Lightning)

Pay with Bitcoin over the Lightning Network. Instant settlement, 21-sat minimum.

1. POST /v1/chat/completions --> 402
   Server returns WWW-Authenticate header (macaroon + Lightning invoice)
2. Pay the Lightning invoice, receive the preimage
3. POST /v1/chat/completions + Authorization: L402 header --> 200
   Authorization: L402 {macaroon}:{preimage}

Endpoints

All inference endpoints are OpenAI-compatible. Base URL: https://llm402.ai

OpenAI-Compatible

Method | Path | Description | Auth
POST | /v1/chat/completions, /v1/chat/completions/{model} | Chat completions (streaming + buffered). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu
POST | /v1/embeddings | Text embeddings (max 128 strings per batch, no streaming) | L402 / x402 / Balance / Cashu
POST | /v1/images/generations, /v1/images/generations/{model} | Image generation (synchronous, one image per request). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu
POST | /v1/videos, /v1/videos/generations/{model} | Create a video generation job (async, returns job ID). Model can be in URL path or request body. | L402 / x402 / Balance / Cashu
GET | /v1/videos/{job_id} | Poll video job status (no auth required) | None
POST | /v1/balance | Prepaid balance: create, top up, check status | None / Balance
GET | /v1/models | List all available models (OpenAI-compatible) | None (free)

Ollama-Compatible

Method | Path | Description | Auth
POST | /api/generate/{model} | Text generation | L402 / x402 / Balance / Cashu
POST | /api/chat/{model} | Chat | L402 / x402 / Balance / Cashu
GET | /api/tags | Model catalog with pricing | None (free)

Ollama Examples

bash -- /api/chat/{model}
# Chat via Ollama-compatible endpoint (model in path)
curl -s -X POST https://llm402.ai/api/chat/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}],"stream":false}'
bash -- /api/generate/{model}
# Text generation via Ollama-compatible endpoint
curl -s -X POST https://llm402.ai/api/generate/deepseek-v3.2 \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain Bitcoin in one sentence","stream":false}'
bash -- /api/tags
# List all models with pricing and endpoints
curl -s https://llm402.ai/api/tags | jq '.models[] | {name, price_sats}'

Utility

Method | Path | Description | Auth
GET | /health | Service health and status | None (free)
POST | /v1/estimate-cost | Pre-authorization cost estimation | None (free)
POST | /api/invoice/status | Poll Lightning invoice payment status | None (free)
GET | /.well-known/l402 | L402 service discovery (agent-readable) | None (free)
GET | /.well-known/openapi.json | OpenAPI 3.1.0 specification | None (free)

Estimate Cost

Pre-authorize requests by checking the cost before paying. This endpoint is free and requires no authentication. Useful for MCP clients, agents, and budgeting.

bash
curl -s -X POST https://llm402.ai/v1/estimate-cost \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "max_tokens": 500,
    "pref": "balanced"
  }'
Response
{
  "model": "deepseek-v3.2",
  "shortName": "deepseek-v3.2",
  "category": "general_knowledge",
  "confidence": 0.82,
  "rc": 100,
  "estimatedInputTokens": 8,
  "estimatedOutputTokens": 500,
  "costSats": 21,
  "costUsd": 0.000152,
  "btcPrice": 68000
}

Parameters

Field | Required | Description
messages | Yes | Array of message objects (same format as chat completions)
model | No | Model name (short or full ID). If omitted or "auto", the server auto-routes.
max_tokens | No | Requested output tokens (defaults to the model's default output token count)
pref | No | Routing preference: quality, balanced, cost, speed
max_cost | No | Maximum cost in sats (routes only to models within budget)

Invoice Status

Poll the payment status of a Lightning invoice. Useful for wallet integrations that need to confirm payment before re-sending with the L402 header.

bash
curl -s -X POST https://llm402.ai/api/invoice/status \
  -H "Content-Type: application/json" \
  -d '{
    "payment_hash": "a1b2c3d4...64hex",
    "macaroon": "AgEJ..."
  }'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "preimage": "e5f6a7b8...64hex" }

Security: The macaroon field is required and must match the payment_hash. This prevents preimage theft by ensuring only the original invoice requester can poll for the preimage.

x402 Protocol (USDC)

x402 uses EIP-3009 TransferWithAuthorization for gasless USDC payments on Base. The payer signs an off-chain authorization; the server settles it on-chain.

Network and Asset

Field | Value
Network | eip155:8453 (Base mainnet, chain ID 8453)
Asset | USDC 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
Denomination | Atomic USDC (6 decimals: 1000000 = $1.00)
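Because USDC uses 6 decimals, amounts in the x402 envelope are atomic-unit strings ("1000000" = $1.00). A small conversion sketch using Decimal to avoid float rounding (helper names are illustrative):

```python
from decimal import Decimal

# USDC atomic units: 6 decimals, so $1.00 == "1000000".
def usd_to_atomic(usd):
    return str(int(Decimal(usd) * 1_000_000))

def atomic_to_usd(atomic):
    return str(Decimal(atomic) / 1_000_000)
```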

Payment Flow

Step 1: Get the 402 Challenge

Send a normal inference request with no auth headers:

bash
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

The server responds with HTTP 402. The response body contains all payment information:

402 Response Body
{
  "error": "Payment Required",
  "description": "claude-sonnet-4.6 inference, pay-per-request over Lightning, USDC, or Cashu",
  "price": 42,
  "model": "claude-sonnet-4.6",
  "provider": "llm402.ai",
  "max_tokens": 50,
  "estimated_input_tokens": 12,
  "invoice": "lnbc420n...",
  "macaroon": "AgEJ...",
  "paymentHash": "a1b2c3d4e5f6...64hex",
  "x402": {
    "price_usd": "0.000305",
    "network": "eip155:8453",
    "address": "0xe05cf38...",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "scheme": "exact"
  },
  "cashu": {
    "price_sats": 42,
    "unit": "sat",
    "description": "Send sat-denominated Cashu tokens in X-Cashu header. Any public mint accepted."
  }
}

The response headers include all payment options:

Response Headers
HTTP/2 402
WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
Payment-Required: eyJzY2hlbWUiOiJleGFjdCIsIm5ldH...
Cache-Control: no-store

Payment-Required Header

Base64-encoded JSON in x402 v2 envelope format. Decode it and use accepts[0] for payment details:

Decoded Payment-Required (x402 v2)
{
  "x402Version": 2,
  "error": "Payment required",
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "3150",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0x...",
      "maxTimeoutSeconds": 120,
      "extra": {
        "name": "USD Coin",
        "version": "2"
      }
    }
  ],
  "resource": {
    "url": "/v1/chat/completions",
    "description": "LLM inference",
    "mimeType": "application/json"
  },
  "price": "$0.003150"
}
Field | Description
x402Version | Always 2
accepts | Array of payment options. Always use accepts[0]
accepts[0].scheme | Always "exact"
accepts[0].network | Always "eip155:8453" (Base mainnet)
accepts[0].amount | Price in atomic USDC (6 decimals). "3150" = $0.003150
accepts[0].asset | USDC contract address on Base
accepts[0].payTo | Server's wallet address (recipient)
accepts[0].maxTimeoutSeconds | Maximum settlement time (120s)
accepts[0].extra.name | EIP-712 domain name. Always "USD Coin" (not "USDC" -- that is testnet)
accepts[0].extra.version | EIP-712 domain version. Always "2"
price | Human-readable USD price (informational only; use accepts[0].amount for signing)
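Decoding the header is just base64 plus JSON. A minimal parser that extracts accepts[0], assuming the v2 envelope shape documented above:

```python
import base64
import json

# Decode the base64 Payment-Required header into the x402 v2 envelope and
# return accepts[0], the payment option to sign against.
def parse_payment_required(header):
    envelope = json.loads(base64.b64decode(header))
    assert envelope["x402Version"] == 2
    return envelope["accepts"][0]
```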

Step 2: Sign the EIP-3009 Authorization

Build a TransferWithAuthorization signature using EIP-712 typed data.

EIP-712 Domain

javascript
const domain = {
  name: "USD Coin",        // from extra.name
  version: "2",            // from extra.version
  chainId: 8453,           // Base mainnet
  verifyingContract: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"
};

EIP-712 Types

javascript
const types = {
  TransferWithAuthorization: [
    { name: "from",        type: "address" },
    { name: "to",          type: "address" },
    { name: "value",       type: "uint256" },
    { name: "validAfter",  type: "uint256" },
    { name: "validBefore", type: "uint256" },
    { name: "nonce",       type: "bytes32" },
  ]
};

Authorization Message

javascript
const crypto = require("crypto");  // Node.js built-in, used for the random nonce

const now = Math.floor(Date.now() / 1000);
const nonce = "0x" + crypto.randomBytes(32).toString("hex");
const opt = paymentRequired.accepts[0];  // always use accepts[0]

const message = {
  from:        walletAddress,             // your address (payer)
  to:          opt.payTo,                 // from accepts[0]
  value:       BigInt(opt.amount),
  validAfter:  BigInt(now - 600),         // 10 min ago (clock skew buffer)
  validBefore: BigInt(now + 120),         // 2 min from now
  nonce:       nonce,
};

Step 3: Build the Payment Payload

Construct the V2 payment payload and base64-encode it:

javascript
const opt = paymentRequired.accepts[0];  // always use accepts[0]
// signature, now, nonce, and walletAddress carry over from Step 2
const payload = {
  x402Version: 2,
  resource: paymentRequired.resource,
  accepted: {
    scheme: opt.scheme,
    network: opt.network,
    amount: opt.amount,
    asset: opt.asset,
    payTo: opt.payTo,
    maxTimeoutSeconds: opt.maxTimeoutSeconds,
    extra: opt.extra
  },
  payload: {
    signature: signature,
    authorization: {
      from: walletAddress,
      to: opt.payTo,
      value: opt.amount,
      validAfter: (now - 600).toString(),
      validBefore: (now + 120).toString(),
      nonce: nonce
    }
  }
};

const paymentSignature = Buffer.from(JSON.stringify(payload)).toString("base64");

Step 4: Send the Paid Request

Re-send the same inference request with the Payment-Signature header:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50
  }'

Response (HTTP 200):

json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet-4.6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 10,
    "total_tokens": 14
  }
}

Auto-routing gotcha: If you use model: "auto", the server routes to a specific model and returns it in the X-Route-Model header. On the retry with payment, you must use the specific model from the 402 response -- not "auto" again. Otherwise the router may pick a different model with a different price.
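The retry rule can be pinned down in a few lines (illustrative helper, not client-library code):

```python
# With "model": "auto", the 402 response's X-Route-Model header names the
# concrete model; the paid retry must pin that model so the price matches.
def model_for_retry(requested, route_model):
    if requested == "auto":
        if not route_model:
            raise ValueError("402 response lacked X-Route-Model")
        return route_model
    return requested
```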

CORS

The server allows cross-origin x402 requests:

HTTP Headers
Access-Control-Allow-Headers: Content-Type, Authorization, Payment-Signature, X-Cashu
Access-Control-Expose-Headers: X-Route-Model, X-Route-Category, Payment-Required, WWW-Authenticate, X-Cashu-Change

Nonce Replay Protection

Each signed authorization can only be used once. Replay protection is enforced both server-side and on-chain via EIP-3009 nonces.

L402 Protocol (Lightning)

L402 (formerly LSAT) combines HTTP 402 status codes with Lightning Network payments and macaroon-based authentication. It is the original payment protocol supported by llm402.ai.

Payment Flow

1. Client sends POST /v1/chat/completions (no auth)
2. Server returns 402 with: WWW-Authenticate: L402 macaroon="...", invoice="lnbc..."
3. Client pays the Lightning invoice, obtains the preimage
4. Client re-sends request with: Authorization: L402 {macaroon}:{preimage}
5. Server verifies macaroon + preimage --> proxies to inference --> 200

Curl Example

bash -- Step 1: Get 402 with invoice
# Send a request with no auth -- get back a 402 with an invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response includes:
#   WWW-Authenticate: L402 macaroon="AgEJ...", invoice="lnbc210n1pn..."
#   Body: { "error": "Payment Required", "price": 21, "invoice": "lnbc...", "macaroon": "AgEJ...", ... }
bash -- Step 2: Pay invoice, then resend with L402 auth
# Pay the Lightning invoice with your wallet and get the preimage.
# Then resend the exact same request with the L402 Authorization header:

curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxtNDAyLmFp...:a1b2c3d4e5f67890..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 50
  }'

# Response: HTTP 200 with chat completion

WWW-Authenticate Header

The 402 response includes a WWW-Authenticate header with two components:

HTTP Header
WWW-Authenticate: L402 macaroon="AgELbGxt...", invoice="lnbc50n1pn..."
Component | Description
macaroon | Base64-encoded V2 TLV macaroon with embedded caveats. Bound to a specific payment hash.
invoice | BOLT-11 Lightning invoice. Pay this to obtain the preimage.

Macaroon Caveats

Each macaroon is bound with first-party caveats that restrict its use. The server verifies all caveats on the paid request and rejects any that fail (fail-closed):

Caveat | Format | Description
RequestPath | RequestPath = /v1/chat/completions | Restricts the macaroon to a specific API endpoint
ExpiresAt | ExpiresAt = 1712345678 | Unix timestamp expiry (5 minutes from issuance)
MaxTokens | MaxTokens = 256 | Maximum output tokens the request may use
MaxInputChars | MaxInputChars = 1500 | Prevents input inflation after invoice issuance
MaxInputTokens | MaxInputTokens = 400 | Prevents token-count gaming (chars pass but tokens are higher)
NotBefore | NotBefore = 1712340000 | Prevents preimage replay after server restart
MaxInputItems | MaxInputItems = 5 | Limits batch size for /v1/embeddings requests (no tolerance)
Model | Model = claude-sonnet-4.6 | Binds the macaroon to a specific model (prevents cross-model bypass)

Fail-closed design: Unrecognized caveats are rejected. This ensures future caveat additions don't accidentally pass on old server versions.
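A fail-closed check reduces to: every caveat must be recognized and must pass. A Python sketch of the idea (the checker table and request shape are illustrative; the server's actual verification is not published):

```python
import time

# Fail-closed caveat verification: unknown caveats reject, failing caveats
# reject, only a fully recognized and passing list is accepted.
def verify_caveats(caveats, request):
    checks = {
        "RequestPath": lambda v: request["path"] == v,
        "ExpiresAt":   lambda v: time.time() < int(v),
        "NotBefore":   lambda v: time.time() >= int(v),
        "MaxTokens":   lambda v: request["max_tokens"] <= int(v),
        "Model":       lambda v: request["model"] == v,
    }
    for caveat in caveats:
        key, _, value = caveat.partition(" = ")
        check = checks.get(key)
        if check is None:
            return False          # fail closed: unrecognized caveat
        if not check(value):
            return False          # caveat present but violated
    return True
```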

Authorization Header

After paying the Lightning invoice and receiving the preimage, send the authorization:

HTTP Header
Authorization: L402 AgELbGxt...:abc123def456...

Format: L402 {base64_macaroon}:{hex_preimage}

The server verifies:

  • The macaroon signature against the root key
  • All caveats pass (path, expiry, tokens, model, etc.)
  • The preimage hashes to the payment hash embedded in the macaroon identifier
  • The preimage has not been used before (burn-on-success: preimage is only burned after successful inference)

Preimage preservation: If inference fails after payment verification, the preimage is NOT burned. You can retry with the same macaroon:preimage pair until the macaroon expires (5 minutes).

Balance Tokens (Prepaid)

Balance tokens let you prepay for multiple requests with a single Lightning payment or USDC transfer. Fund a balance once, then use Authorization: Bearer bal_... on any gated endpoint without per-request payment flows.

How It Works

1. POST /v1/balance + { "sats": 1000 } --> 402 (Lightning invoice)
2. Pay the Lightning invoice with your wallet
3. POST /v1/balance + { "payment_hash": "hex64" } --> 200
   Returns: { "paid": true, "token": "bal_...", "sats": 1000 }
4. Use Authorization: Bearer bal_... on any endpoint

Endpoints

Create Balance (Lightning)

Request a Lightning invoice to fund a new balance:

bash
# Step 1: Request an invoice for 1000 sats
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'

# Response (402):
# { "payment_hash": "a1b2c3...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }

Poll for Payment

After paying the invoice, poll with the payment hash to get your token:

bash
# Step 2: Poll until paid
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "a1b2c3d4e5f6...64hex"}'

# Before payment: { "paid": false }
# After payment:  { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "2026-05-03T..." }

Check Balance

bash
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'

# Response:
# { "sats": 850, "expires_at": "2026-05-03T...", "total_spent": 150, "requests": 7 }

Top Up (Lightning)

Add sats to an existing balance:

bash
# Get a top-up invoice
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"sats": 500}'

# Returns 402 with a new invoice. Pay it, then poll with payment_hash as above.

Fund with USDC (x402)

Fund a balance using USDC instead of Lightning. Include a Payment-Signature header with the signed x402 payload:

bash
# Fund with USDC (sign an x402 payment for the equivalent USD amount)
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: eyJ4NDAyVmVyc2lvbiI6Mix..." \
  -d '{"sats": 500}'

# Response: { "paid": true, "token": "bal_xxxx...", "sats": 500 }

Use Balance Token

Include the token as a Bearer auth on any gated endpoint:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Explain Lightning Network"}],
    "max_tokens": 200
  }'

Token Lifecycle

Rule | Value
Inactivity TTL | 30 days (resets on each use)
Max lifetime | 90 days from creation
Max balance | 50,000 sats
Min deposit | 100 sats

Top-ups reset the inactivity timer but do not extend the 90-day max lifetime. Plan deposits accordingly.

Cashu Tokens (Ecash)

Pay with Cashu ecash tokens -- instant, private Bitcoin micropayments with no Lightning channel required. Send tokens directly in the request header. If you overpay, the server returns change tokens.

How It Works

1. POST /v1/chat/completions (no auth) --> 402
   Response body includes cashu.price_sats
2. POST /v1/chat/completions + X-Cashu header --> 200
   Server swaps tokens at the mint, runs inference, returns change if overpaid

Request

Send a cashuB (v4) token in the X-Cashu header:

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Cashu: cashuBo2F0gaJha..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 100
  }'

Change Tokens

If the token value exceeds the model's price by 2 or more sats, the server returns change in the X-Cashu-Change response header:

Response Headers
HTTP/2 200
X-Cashu-Consumed: true
X-Cashu-Change: cashuBo2F0gaJha...  # remaining sats as ecash token

The change token is a cashuB token you can use for future requests.

Constraints

Rule | Value
Token format | cashuB (v4) only. Deprecated cashuA (v3) tokens are rejected.
Unit | Sat-denominated only (no USD or other units)
Max proofs | 20 per token (DoS prevention)
Streaming | Not supported. Cashu requires buffered responses to calculate change. Use "stream": false.
Change threshold | 2 sats minimum. Overpayment of 1 sat is absorbed (not worth the mint round-trip).
Change size limit | 8 KB. If the change token exceeds 8 KB, it is absorbed by the server.
Mint | Server-configured allowlist. HTTPS-only, no private IPs. The 402 response body's cashu.description field indicates the current policy.

No 402 dance needed: Unlike L402 and x402, you can skip the initial 402 request if you already know the price. Just send the Cashu token directly -- the server verifies the token value covers the model's price.

MCP Server

llm402.ai provides a hosted Model Context Protocol (MCP) server. Connect from any MCP client — Claude Code, Claude Desktop, Cursor, or any tool that supports MCP. Six tools are available: text inference, image generation, video generation, model discovery, balance management, and funding.

Setup

1. Get a balance token -- Visit llm402.ai/chat and fund a balance with Lightning or USDC. Copy your bal_ token from the balance display.
2. Add to your MCP client config

Claude Code (~/.claude.json):

{
  "mcpServers": {
    "llm402": {
      "url": "https://llm402.ai/mcp",
      "headers": {
        "Authorization": "Bearer bal_YOUR_TOKEN_HERE"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "llm402": {
      "url": "https://llm402.ai/mcp",
      "headers": {
        "Authorization": "Bearer bal_YOUR_TOKEN_HERE"
      }
    }
  }
}

Replace bal_YOUR_TOKEN_HERE with your actual balance token. That's it — the MCP client discovers all tools automatically.

Available Tools

Tool | Auth | Description
llm402_inference | Required | Text inference. 335+ models, auto-routed by default. Supports system prompts, model selection, temperature, max_tokens, and routing preference (quality/balanced/cost/speed).
llm402_image | Required | Image generation. Requires a specific model ID (e.g. black-forest-labs/FLUX.1-schnell). Supports width, height, steps, seed, negative prompt.
llm402_video | Required | Video generation (async). Requires a specific model ID (e.g. Wan-AI/wan2.7-t2v). Supports seconds, width, height, fps. Polls for completion up to 90s, then returns job URL for manual polling.
llm402_models | None | List available models. Optional substring filter (e.g. "deepseek", "flux"). Free, no balance required.
llm402_balance | Required | Check your prepaid balance: remaining sats, total deposited, total spent, request count.
llm402_fund | Required | Generate a Lightning invoice to top up your balance. Default 5,000 sats. Polls for payment confirmation up to 45 seconds.

Example: Text Inference

# Using curl against the MCP endpoint directly (JSON-RPC format)
curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_inference",
      "arguments": {
        "prompt": "Explain quantum computing in one sentence.",
        "max_tokens": 100
      }
    },
    "id": 1
  }'

Example: Image Generation

curl -X POST https://llm402.ai/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "llm402_image",
      "arguments": {
        "prompt": "A cyberpunk cityscape at night with neon lights",
        "model": "black-forest-labs/FLUX.1-schnell"
      }
    },
    "id": 1
  }'

OpenAI-Compatible Alternative

If your tool doesn't support MCP but accepts OPENAI_BASE_URL, the same balance token works directly:

export OPENAI_BASE_URL=https://llm402.ai/v1
export OPENAI_API_KEY=bal_YOUR_TOKEN

This works with Cursor, Aider, LangChain, the OpenAI Python SDK, and any tool that accepts a custom base URL.

Endpoint

MCP endpoint: https://llm402.ai/mcp

Protocol: Streamable HTTP (POST only, stateless). Responses are Server-Sent Events containing JSON-RPC results.

Streaming

Add "stream": true to your request body to receive Server-Sent Events (SSE) as tokens are generated. The format follows the OpenAI streaming specification.

Request

bash
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "max_tokens": 100,
    "stream": true
  }'

Response Format

The server sends a series of data: lines. Each line is a JSON chunk with a delta object containing the next token(s):

SSE stream
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{"content":", "},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v3.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
Field | Description
choices[0].delta.role | Sent on the first chunk only ("assistant")
choices[0].delta.content | The next token(s) of the response
choices[0].finish_reason | null while generating, "stop" on the final chunk
data: [DONE] | End-of-stream marker. Close the connection after this line.

Heartbeat: During long-running inferences, the server sends : heartbeat SSE comments every 15 seconds to keep the connection alive. These are not data lines and should be ignored by your parser.

Consuming the Stream

bash
# Stream tokens to stdout (use -N to disable buffering)
curl -s -N -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hello"}],"max_tokens":100,"stream":true}' \
  | while IFS= read -r line; do
      echo "$line"
    done
javascript
const res = await fetch('https://llm402.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...' },
  body: JSON.stringify({
    model: 'deepseek-v3.2', messages: [{ role: 'user', content: 'hello' }],
    max_tokens: 100, stream: true
  })
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  for (const line of text.split('\n')) {
    if (line.startsWith('data: ') && line !== 'data: [DONE]') {
      const chunk = JSON.parse(line.slice(6));
      const token = chunk.choices[0]?.delta?.content || '';
      process.stdout.write(token);
    }
  }
}
python
import requests, json

res = requests.post('https://llm402.ai/v1/chat/completions',
    headers={'Content-Type': 'application/json', 'Authorization': 'Bearer bal_xxxx...'},
    json={'model': 'deepseek-v3.2', 'messages': [{'role': 'user', 'content': 'hello'}],
          'max_tokens': 100, 'stream': True},
    stream=True)

for line in res.iter_lines():
    line = line.decode('utf-8')
    if line.startswith('data: ') and line != 'data: [DONE]':
        chunk = json.loads(line[6:])
        token = chunk['choices'][0].get('delta', {}).get('content', '')
        print(token, end='', flush=True)

Payment first: Streaming requires the same payment flow as buffered requests. Pay via L402, x402, or Balance token before sending a stream request. You cannot begin streaming before payment is verified. Cashu does not support streaming -- use buffered mode ("stream": false) with Cashu tokens.

Request Deduplication

Non-streaming responses are cached for 30 seconds. If you retry an identical request (same model, messages, max_tokens, and IP), the server returns the cached response immediately without re-running inference or re-charging you.

Parameter | Value
TTL | 30 seconds
Max entries | 100
Max entry size | 1 MB
Scope | Per-IP (different IPs get separate caches)

The response includes an X-Dedup header indicating whether the response was served from cache:

Response Headers
X-Dedup: hit    # served from cache (no charge)
X-Dedup: miss   # fresh inference

To bypass the cache, send the X-No-Cache: true request header.
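The documented cache key covers model, messages, max_tokens, and client IP. One plausible shape for such a key, as an illustration only (the server's actual keying is not published):

```python
import hashlib
import json

# Hypothetical dedup key over the documented inputs: identical
# (ip, model, messages, max_tokens) tuples map to the same cache entry.
def dedup_key(ip, body):
    material = json.dumps(
        [ip, body.get("model"), body.get("messages"), body.get("max_tokens")],
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(material.encode()).hexdigest()
```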

Image Generation

Generate images from text prompts. 40+ models across three providers, all behind a unified OpenAI-compatible endpoint. All four payment rails are supported (L402, x402, Balance, Cashu).

Endpoint

POST /v1/images/generations or /v1/images/generations/{model}

Request Body

Field | Type | Required | Description
model | string | Yes* | Image model ID (e.g. FLUX.1-schnell). *Not required if model is in URL path.
prompt | string | Yes | Text description of the image (2-4096 chars)
size | string | No | Dimensions as "WxH" string (e.g. "1024x1792"). Use "auto" for model default. Overrides width/height.
width | integer | No | Image width in pixels (64-2048). Must provide both width and height together.
height | integer | No | Image height in pixels (64-2048). Must provide both width and height together.
steps | integer | No | Diffusion steps (1-50, default model-dependent)
response_format | string | No | url (default) or b64_json
seed | integer | No | Deterministic seed for reproducibility

Response

200 OK
{
  "created": 1234567890,
  "model": "black-forest-labs/FLUX.1-schnell",
  "data": [
    { "url": "https://..." }
  ]
}

The url field contains either an HTTPS URL or a data: URI depending on the model. Both are valid image sources for <img> tags. HTTPS URLs expire after ~7 days.
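A client that wants raw bytes therefore has to branch on the scheme; the data: case decodes inline while the HTTPS case still needs a download. A small sketch (helper name is illustrative):

```python
import base64

# The `url` field is either an HTTPS URL or a data: URI. Return raw bytes
# for the inline case, or the URL for the caller to fetch.
def image_bytes_or_url(url):
    if url.startswith("data:"):
        _, _, b64 = url.partition(";base64,")
        return ("bytes", base64.b64decode(b64))
    return ("url", url)
```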

Key Differences from Chat Completions

  • No streaming — response is synchronous
  • model is required (no auto-routing)
  • One image per request (n is always 1)
  • Pricing is per-image, not per-token
  • All payments are non-refundable
  • Request deduplication is disabled
  • Generation time varies: 1–10s for FLUX/diffusion models, 30–90s for GPT-5 Image models
  • Dimensions are automatically rounded to the nearest multiple of 16 for compatibility
  • Both size string and width/height integer formats are accepted

Example

bash
curl -X POST https://llm402.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_your_token_here" \
  -d '{
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "A serene mountain landscape at sunset"
  }'

Available Models

40+ image models from three providers. Use /v1/models or the Models page for the full live list. Selected highlights:

Model | Price | Notes
black-forest-labs/FLUX.1-schnell | 21 sats | Fast, cheapest FLUX
black-forest-labs/FLUX.2-pro | 63 sats | Professional quality
black-forest-labs/FLUX.2-max | 147 sats | Maximum quality
google/flash-image-2.5 | 82 sats | Nano Banana -- Gemini image gen
google/flash-image-3.1 | 105 sats | Nano Banana 2 -- latest Gemini
google/imagen-4.0-fast | 42 sats | Google Imagen
openai/gpt-5-image-mini | 21 sats | GPT-5 image gen (compact)
openai/gpt-5-image | 105 sats | GPT-5 image gen (full, slow ~60s)
Bria/fibo | 84 sats | JSON-native, enterprise-safe
ideogram/ideogram-3.0 | 126 sats | Strong text rendering

Video Generation

Generate videos from text prompts. Unlike image generation, video generation is asynchronous: you create a job, then poll for completion. All four payment rails are supported (L402, x402, Balance, Cashu). Payment is collected when the job is created.

Workflow

  1. Create job — POST /v1/videos with your prompt and model. Returns 202 Accepted with a job ID and poll URL.
  2. Poll for status — GET /v1/videos/{job_id} (no auth required). Returns queued, processing, completed, or failed.
  3. Download — When status is completed, the response includes a video_url.
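
The loop above can be sketched in Python. The status values and poll_interval_ms field follow the documented responses; BASE_URL and the timeout are illustrative:

```python
import json
import time
import urllib.request

BASE_URL = "https://llm402.ai"  # illustrative

def next_action(job: dict):
    """Map a poll response to the client's next move."""
    if job["status"] == "completed":
        return ("done", job["video_url"])
    if job["status"] == "failed":
        return ("error", job.get("error", "unknown failure"))
    # queued / processing: sleep for the server-suggested interval
    return ("wait", job.get("poll_interval_ms", 5000) / 1000)

def wait_for_video(job_id: str, timeout_s: float = 600) -> str:
    """Poll GET /v1/videos/{job_id} until the job finishes, then return video_url."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/v1/videos/{job_id}") as resp:
            action, value = next_action(json.load(resp))
        if action == "done":
            return value
        if action == "error":
            raise RuntimeError(f"video job failed: {value}")
        time.sleep(value)
    raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
```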

Create Job

POST /v1/videos or /v1/videos/generations/{model}

Request Body

Field Type Required Description
model string Yes* Video model ID (e.g. kling-2.1-pro). *Not required if the model is in the URL path.
prompt string Yes Text description of the video (2–4096 chars)
seconds integer No Video duration in seconds (default varies by model). Supported values are model-specific. Use /v1/models to check.
width integer No Video width in pixels. Supported sizes are model-specific. Invalid sizes are rejected with 400.
height integer No Video height in pixels. Supported sizes are model-specific. Invalid sizes are rejected with 400.
fps integer No Frames per second. Model-specific. Not all models support custom fps.
steps integer No Diffusion steps (model-dependent)
guidance_scale number No Classifier-free guidance scale
seed integer No Deterministic seed for reproducibility
negative_prompt string No What to avoid in the generated video

Pricing

Video pricing varies by provider and model:

  • Together.ai models (Kling, MiniMax, Seedance, etc.) — flat per-video pricing. The price is the same regardless of duration or resolution.
  • OpenRouter models (Veo 3.1) — per-second pricing that scales with duration and resolution. Longer videos and higher resolutions cost more.

The 402 challenge always shows the exact price for the specific parameters you requested. If no optional parameters are specified (duration, resolution), the minimum price for that model is shown.

Model Capabilities

Each video model supports specific durations, sizes, and fps values. Sending unsupported parameters returns 400 Bad Request with the list of supported values for that model. Use GET /v1/models to discover per-model capabilities including supported durations, resolutions, and fps options.

Response (202 Accepted)

202 Accepted
{
  "id": "vj_abc123...",
  "status": "queued",
  "model": "minimax/video-01-director",
  "poll_url": "/v1/videos/vj_abc123...",
  "poll_interval_ms": 5000,
  "created_at": 1234567890
}

Poll Job Status

GET /v1/videos/{job_id}

Response (completed)

200 OK — completed
{
  "id": "vj_abc123...",
  "status": "completed",
  "model": "minimax/video-01-director",
  "video_url": "https://...",
  "done_at": 1234567890
}

Response (failed)

200 OK — failed
{
  "id": "vj_abc123...",
  "status": "failed",
  "model": "minimax/video-01-director",
  "error": "upstream provider timeout",
  "poll_interval_ms": 5000
}

Key Differences from Image Generation

  • Asynchronous — returns immediately with a job ID, not a finished result
  • Polling required — use poll_url to check status; respect poll_interval_ms
  • URL-only — no b64_json response format; videos are always returned as URLs
  • Longer generation times — expect 30s–5min depending on model and duration
  • Non-refundable — payment is collected at job creation, not on completion
  • model is required (no auto-routing)
  • One video per request
  • Request deduplication is disabled

Example

bash
# Create video job
curl -X POST https://llm402.ai/v1/videos \
  -H "Authorization: Bearer bal_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax/video-01-director", "prompt": "A cat walking through a garden", "seconds": 5}'

# Response (202 Accepted):
# {"id":"vj_abc...","status":"queued","model":"minimax/video-01-director","poll_url":"/v1/videos/vj_abc...","poll_interval_ms":5000}

# Poll for completion
curl https://llm402.ai/v1/videos/vj_abc123...

# Response (completed):
# {"id":"vj_abc...","status":"completed","model":"minimax/video-01-director","video_url":"https://...","done_at":1234567890}

Available Models

Video models from multiple providers. Use /v1/models or the Models page for the full live list with pricing and per-model capabilities. Available models include Sora 2, Veo 3.0, Veo 3.1 (via OpenRouter, per-second pricing), Kling 2.1, Seedance, PixVerse, MiniMax, Vidu, and Wan.

Code Examples

Complete examples for each payment method and language.

x402: Node.js (viem)

Uses viem for EIP-712 signing. Install: npm install viem

Key steps
const API_URL = 'https://llm402.ai/v1/chat/completions';
const body = JSON.stringify({
  model: 'claude-sonnet-4.6',
  messages: [{ role: 'user', content: 'Say hello.' }],
  max_tokens: 50
});

// 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(Buffer.from(res402.headers.get('Payment-Required'), 'base64').toString());
const req = envelope.accepts[0];  // always use accepts[0]
const routedModel = res402.headers.get('X-Route-Model') || 'claude-sonnet-4.6';

// 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
// (here: now = Math.floor(Date.now() / 1000), nonce = a random 32-byte hex value)
const signature = await walletClient.signTypedData({
  domain: { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  types: { TransferWithAuthorization: [
    { name: 'from', type: 'address' }, { name: 'to', type: 'address' },
    { name: 'value', type: 'uint256' }, { name: 'validAfter', type: 'uint256' },
    { name: 'validBefore', type: 'uint256' }, { name: 'nonce', type: 'bytes32' },
  ]},
  primaryType: 'TransferWithAuthorization',
  message: { from: address, to: req.payTo, value: BigInt(req.amount),
             validAfter: BigInt(now - 600), validBefore: BigInt(now + 120), nonce },
});

// 3. Send with Payment-Signature header (base64-encoded JSON payload)
const res = await fetch(API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': paymentB64 },
  body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});

x402: Python (eth-account)

Uses eth-account for EIP-712 signing. Install: pip install requests eth-account

Key steps
API_URL = 'https://llm402.ai/v1/chat/completions'
body = {'model': 'claude-sonnet-4.6', 'messages': [{'role': 'user', 'content': 'Say hello.'}], 'max_tokens': 50}

# 1. Get 402 -- parse Payment-Required header (x402 v2 envelope)
res402 = requests.post(API_URL, json=body)
envelope = json.loads(base64.b64decode(res402.headers["Payment-Required"]).decode())
req = envelope["accepts"][0]  # always use accepts[0]
routed_model = res402.headers.get("X-Route-Model", "claude-sonnet-4.6")

# 2. Sign EIP-3009 TransferWithAuthorization (off-chain, no gas)
# (here: now = int(time.time()), nonce = '0x' + 32 random bytes as hex)
domain = {"name": req["extra"]["name"], "version": req["extra"]["version"],
          "chainId": 8453, "verifyingContract": req["asset"]}
message = {"from": address, "to": req["payTo"], "value": int(req["amount"]),
           "validAfter": now - 600, "validBefore": now + 120,
           "nonce": bytes.fromhex(nonce[2:])}
signable = encode_typed_data(domain, types, message)  # types = {"TransferWithAuthorization": [...]}
signed = account.sign_message(signable)

# 3. Send with Payment-Signature header (base64-encoded JSON payload)
res = requests.post(API_URL, json={**body, "model": routed_model},
                    headers={"Payment-Signature": payment_b64})

x402: cURL

Requires an EIP-712 signing tool (e.g., Foundry's cast) for Step 3.

bash
# 1. Get 402 challenge
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'
# Response: HTTP 402 with Payment-Required header (base64 JSON) and WWW-Authenticate (L402)

# 2. Decode the Payment-Required header (x402 v2 envelope)
echo "$PAYMENT_REQ_HEADER" | base64 -d | jq .
# Returns: { x402Version: 2, accepts: [{ scheme, network, amount, asset, payTo, extra }], resource, price }
# Use accepts[0] for payment details: jq '.accepts[0]'

# 3. Sign EIP-3009 with cast, build payload, base64 encode (see x402 docs above for full flow)

# 4. Send with payment
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Payment-Signature: $PAYMENT_B64" \
  -d '{"model":"claude-sonnet-4.6","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

x402: Browser (ethers.js v6)

Uses ethers.js v6 with MetaMask or Coinbase Wallet. No gas for the payer -- just a signing prompt.

Key steps (ethers.js v6)
// 1. Connect wallet + switch to Base
const provider = new ethers.BrowserProvider(window.ethereum);
const signer = await provider.getSigner();
await window.ethereum.request({ method: 'wallet_switchEthereumChain', params: [{ chainId: '0x2105' }] });

// 2. Get 402, parse Payment-Required header (x402 v2 envelope)
const res402 = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body });
const envelope = JSON.parse(atob(res402.headers.get('Payment-Required')));
const req = envelope.accepts[0];  // always use accepts[0]

// 3. Sign EIP-3009 (wallet popup -- no gas, no approval tx)
const signature = await signer.signTypedData(
  { name: req.extra.name, version: req.extra.version, chainId: 8453, verifyingContract: req.asset },
  { TransferWithAuthorization: [/* from, to, value, validAfter, validBefore, nonce */] },
  { from: address, to: req.payTo, value: BigInt(req.amount), validAfter: BigInt(now-600), validBefore: BigInt(now+120), nonce }
);

// 4. Send with Payment-Signature header
const res = await fetch(API_URL, {
  headers: { 'Content-Type': 'application/json', 'Payment-Signature': btoa(JSON.stringify(payload)) },
  method: 'POST', body: JSON.stringify({ model: routedModel, messages, max_tokens: 50 }),
});

L402 (Bitcoin Lightning)

Pay with Bitcoin over the Lightning Network. Three steps: get an invoice, pay it, resend with proof.

bash
# Step 1: Get 402 challenge with Lightning invoice
curl -s -D- -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response body includes:
#   "invoice": "lnbc210n1pn..."   (pay this with your Lightning wallet)
#   "macaroon": "AgEJbGxt..."     (send this back with the preimage)
#   "price": 21                   (cost in sats)

# Step 2: Pay the Lightning invoice with your wallet.
# Your wallet will give you the preimage (64-char hex).

# Step 3: Resend the request with L402 authorization
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: L402 AgEJbGxt...:a1b2c3d4e5f67890..." \
  -d '{"model":"deepseek-v3.2","messages":[{"role":"user","content":"hi"}],"max_tokens":50}'

# Response: HTTP 200 with the chat completion
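
The preimage in Step 2 is the cryptographic receipt: hashing it with SHA-256 must reproduce the invoice's payment hash. A client-side sanity check (the server performs the authoritative verification):

```python
import hashlib

def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """True if SHA-256(preimage) equals the Lightning invoice's payment hash."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return digest == payment_hash_hex.lower()
```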

Balance (Prepaid)

Prepay for a balance over Lightning, then use the resulting bal_ token across multiple requests.

bash
# Step 1: Create a prepaid balance (get Lightning invoice)
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"sats": 1000}'
# Response: { "payment_hash": "abc...", "invoice": "lnbc10u...", "sats": 1000, "expires_in": 600 }

# Step 2: Pay the invoice, then poll for the token
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -d '{"payment_hash": "abc..."}'
# Response: { "paid": true, "token": "bal_xxxx...", "sats": 1000, "expires_at": "..." }

# Step 3: Use the token for requests (no per-request payment needed)
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'

# Step 4: Check remaining balance
curl -s -X POST https://llm402.ai/v1/balance \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bal_xxxx..." \
  -d '{"action": "status"}'
# Response: { "sats": 850, "expires_at": "...", "total_spent": 150, "requests": 3 }

Pricing

Prices are computed dynamically per request based on the model, the estimated input tokens, and your requested max_tokens. The cheapest models cost as little as 21 sats. The exact price is returned in the 402 response.

Denomination by Protocol

Protocol Unit Minimum Notes
x402 (USDC) Atomic USDC (6 decimals) ~$0.001 amount: "3150" = $0.003150. Native USD -- no BTC conversion.
L402 (Lightning) Satoshis 21 sats BTC/USD converted at request time. 21-sat floor for all models.
Cashu (ecash) Satoshis 21 sats Same denomination as L402. Send sat-denominated Cashu tokens.
Balance (prepaid) Satoshis 21 sats Funded via Lightning or USDC. Deducted per-request in sats.

Price verification: The server recalculates the price on the paid retry and verifies that the signed/paid amount covers the minimum. A rounding tolerance of 5 atomic USDC units is allowed for x402.
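
Per the table, x402 amounts are strings of atomic USDC units (6 decimal places). A sketch of the conversion and the documented 5-unit tolerance check:

```python
from decimal import Decimal

def usdc_to_dollars(amount: str) -> Decimal:
    """x402 'amount' is atomic USDC: 6 decimal places."""
    return Decimal(amount) / Decimal(1_000_000)

def covers_price(paid_amount: str, required_amount: str, tolerance: int = 5) -> bool:
    """Paid amount must cover the recomputed price, within the
    documented 5-atomic-unit rounding tolerance."""
    return int(paid_amount) + tolerance >= int(required_amount)
```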

Errors

All errors on /v1/* endpoints follow the OpenAI error format:

Error Response Format
{
  "error": {
    "message": "description",
    "type": "error_type",
    "code": "error_code"
  }
}
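
Since every /v1/* error shares this envelope, one parser covers all of them; the empty-string fallbacks are defensive assumptions:

```python
def parse_api_error(payload: dict) -> tuple[str, str, str]:
    """Extract (code, type, message) from an OpenAI-format error body."""
    err = payload.get("error") or {}
    return (err.get("code", ""), err.get("type", ""), err.get("message", ""))
```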

x402-Specific Errors

Code HTTP Type Description
x402_bad_payload 400 invalid_request_error Payment-Signature header is not valid base64 or not valid JSON
x402_underpayment 402 payment_error Signed amount is less than the model's current price
x402_settlement_failed 402 payment_error Payment rejected (bad sig, insufficient balance, expired auth)
ambiguous_payment 400 invalid_request_error Request has multiple payment headers (Payment-Signature, Authorization, X-Cashu). Use exactly one.

Cashu-Specific Errors

Code HTTP Type Description
cashu_no_stream 400 invalid_request_error Cashu tokens cannot be used with streaming (change requires buffered response)
cashu_too_many_proofs 400 invalid_request_error Token contains more than 20 proofs (DoS prevention limit)
cashu_wrong_unit 400 invalid_request_error Only sat-denominated Cashu tokens are accepted
cashu_mint_not_allowed 400 invalid_request_error Token's mint is not in the server's allowlist
cashu_underpayment 402 payment_error Token value is less than the model's price
cashu_underpayment_after_fees 402 payment_error Token value is less than model's price after mint swap fees

L402-Specific Errors

Reason HTTP Description
Invalid macaroon signature 401 Macaroon was tampered with or signed with wrong key
Macaroon expired 401 ExpiresAt caveat exceeded (macaroons valid for 5 min)
Path mismatch 401 Macaroon's RequestPath does not match the endpoint called
max_tokens exceeds paid amount 401 Request max_tokens exceeds the MaxTokens caveat. Get a new invoice.
Input exceeds paid amount 401 Input size grew since invoice was issued (MaxInputChars / MaxInputTokens)
Invoice expired (server restarted) 401 NotBefore caveat fails after container restart. Request a new invoice.
Model mismatch 401 Request model does not match the macaroon's Model caveat
Preimage does not match 401 Preimage does not hash to the macaroon's payment hash

General Errors

Code / Reason HTTP Description
Rate limit 429 Per-IP rate limit exceeded. Check Retry-After header for seconds to wait.
Concurrent stream limit 429 Too many concurrent streams from your IP.
Context window exceeded 400 Input + max_tokens exceeds the model's context window
Invalid model 400 Model name not found in the model catalog
Service unavailable 503 Backend provider temporarily unreachable. Try a different model or retry later.

x402 + concurrent streams: The server checks stream capacity before settling USDC on-chain. If you hit the concurrent stream limit (429), your payment has NOT been settled and you can safely retry.

Rate Limits

Rate limits are applied per IP address across three tiers:

Tier Applied When
free Unauthenticated requests (landing page, /v1/models, /health)
invoice 402 challenge requests (getting an invoice)
paid Requests with valid Payment-Signature, L402 auth, balance token, or X-Cashu token

Limits per Tier

Tier Limit Window
free 60 requests 1 minute
invoice 30 requests 1 minute
paid 60 requests 1 minute

Concurrent Stream Limits

Scope Limit
Per IP 5 concurrent streams
Global 250 concurrent streams

When rate limited, the response includes a Retry-After header indicating how many seconds to wait before retrying.
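
A client-side sketch of honoring Retry-After, falling back to exponential backoff when the header is absent (the backoff policy is an assumption, not part of the API):

```python
def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after a 429: prefer Retry-After, else exponential backoff."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return float(value)
        except ValueError:
            pass  # non-numeric form; fall through to backoff
    return min(cap, base * (2 ** attempt))
```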

Models & Auto-Routing

llm402.ai serves 400+ models across multiple providers. The full model list with pricing is available at the /v1/models endpoint:

bash
curl -s https://llm402.ai/v1/models | jq '.data[].id'

Model Naming

You can use either short names or full provider-prefixed IDs:

Short Name Full ID
deepseek-v3.2 deepseek/deepseek-v3.2
claude-sonnet-4.6 anthropic/claude-sonnet-4.6
gpt-5.4 openai/gpt-5.4

Auto-Routing

Send model: "auto" and the server picks the best model for your prompt, routing across 9 task categories:

  • code -- programming, debugging, code generation
  • reasoning -- logic, math, step-by-step analysis
  • general_knowledge -- factual questions, definitions, Q&A
  • creative -- writing, storytelling, brainstorming
  • summarization -- condensing content, TL;DR
  • chat -- casual conversation, general chat
  • multilingual -- translation, cross-language tasks
  • agents -- function calling, tool integration, structured output
  • vision -- image understanding (multimodal models)

To skip the classifier and route within a specific category, use the task body parameter with model: "auto":

bash
curl -s -X POST https://llm402.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","task":"code","messages":[{"role":"user","content":"Sort a list in Python"}],"max_tokens":200}'

Important: When using auto-routing, always use the X-Route-Model header from the 402 response for the paid retry. Do not send "auto" again -- the router may pick a different model with a different price.
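
The rule above reduces to one decision: pin the model on the paid retry. A sketch, taking the 402 response headers as a plain dict:

```python
def model_for_paid_retry(requested_model: str, headers_402: dict) -> str:
    """Model to send on the paid retry. Never resend 'auto': the router
    could pick a different model with a different price."""
    if requested_model != "auto":
        return requested_model
    routed = headers_402.get("X-Route-Model")
    if not routed:
        raise ValueError("402 response missing X-Route-Model for an auto-routed request")
    return routed
```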