Provider goes down — your product goes with it.
The Anthropic API drops for 30 minutes. Your agent stops mid-task. Your user closes the tab. Your churn goes up.
Relay is the reliable delivery layer for LLM APIs. One line of code adds automatic retries, provider failover, smart caching, and a live dashboard, so your product stays up even when Anthropic or OpenAI doesn't.
```ts
import { Relay } from '@relay/sdk';

const llm = new Relay({ apiKey: process.env.RELAY_KEY });

// Auto retry, failover, cache — handled.
const reply = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'ping' }],
  relay: { fallback: 'gpt-4o', cache: 'semantic' },
});
```

Four bugs you have hit this quarter — and will keep hitting until something else handles them for you.
An outage takes the Anthropic API down for half an hour. Your agent stops mid-task, your user closes the tab, and your churn goes up.
Peak hours are exactly when you can't afford 429s. Every retry you don't write yourself is a request that quietly dies.
Every project ships its own retry loop. Half of them get the jitter wrong. None of them fail over to a backup provider.
Same system prompt, same user input — different request id. Without caching, every duplicate burns tokens you'll never see again.
Replace @anthropic-ai/sdk with @relay/sdk. Same API, different superpowers.
```ts
// 47 lines of retry boilerplate, written badly.
async function callLLM(body) {
  let attempt = 0;
  while (attempt < 3) {
    try {
      return await anthropic.messages.create(body);
    } catch (e) {
      attempt++;
      await sleep(2 ** attempt * 1000);
    }
  }
  throw new Error('Gave up');
}
// Still no failover. No cache. No metrics.
```

```ts
// One line. Reliability handled.
const reply = await llm.messages.create(body);

// What Relay just did for you:
// ✓ exponential backoff with jitter
// ✓ failover Anthropic → OpenAI on 5xx
// ✓ exact + semantic cache lookup
// ✓ streaming preserved end-to-end
// ✓ live metrics in your dashboard
```

When Anthropic returns 5xx, Relay routes the next attempt to OpenAI — with model mapping you can override per request.
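Since `fallback` rides along per request (see the snippet up top), overriding the mapping for one call is just a different value. A sketch; the gpt-4o-mini choice here is only an example, not a default.

```ts
// Per-request override: on a 5xx from Anthropic, retry this call on
// OpenAI as gpt-4o-mini instead of the default mapping. The `fallback`
// option comes from the example up top; the model choice is illustrative.
const summary = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Summarize this thread.' }],
  relay: { fallback: 'gpt-4o-mini' },
});
```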
Decorrelated jitter, configurable ceilings, surfaced in the dashboard so you see what was retried and why.
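Curious what decorrelated jitter actually computes? A minimal sketch of the standard algorithm; the base and cap values are placeholders, not Relay's defaults.

```ts
// Decorrelated jitter: each delay is drawn uniformly from
// [base, previousDelay * 3] and capped at a ceiling, so retries
// spread out instead of stampeding in sync. Placeholder values;
// Relay's actual base and ceiling are the configurable knobs above.
function nextDelay(prevMs: number, baseMs = 500, capMs = 30_000): number {
  const raw = baseMs + Math.random() * (prevMs * 3 - baseMs);
  return Math.min(capMs, raw);
}

// Usage: seed with the base, then feed each delay back in.
let delay = 500;
delay = nextDelay(delay); // somewhere in ~500–1500ms
delay = nextDelay(delay); // grows, but never past capMs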
Exact-match on day one, semantic with embeddings on Hobby and up. Stop paying twice for the same answer.
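In code, the cache mode rides along per request, as in the snippet up top. A sketch; `'semantic'` appears there, while `'exact'` is an assumed value name for the exact-match mode.

```ts
// 'semantic' is shown in the example up top; 'exact' is an assumed
// value name for the day-one exact-match mode.
const reply = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Summarize our refund policy.' }],
  relay: { cache: 'semantic' }, // near-duplicate prompts hit the cache too
});
```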
Latency, cost, cache hit rate, retries, failover events — searchable, filterable, and streaming as it happens.
You keep paying Anthropic directly. We never see plaintext keys — AES-256-GCM, audited access.
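For the curious: AES-256-GCM is authenticated encryption, so tampering gets detected, not just hidden. An illustrative Node snippet of the scheme itself, not Relay's key-storage code:

```ts
import { createCipheriv, randomBytes } from 'node:crypto';

// Illustrative only: this shows the AES-256-GCM scheme named above,
// not Relay's implementation. A 32-byte key, a fresh 12-byte IV per
// encryption, and an auth tag that flags any tampering.
function encryptApiKey(plaintext: string, masterKey: Buffer) {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}
```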
Runs on Cloudflare Workers across 300+ cities. Under 30ms of proxy overhead from anywhere in the world.
Swap @anthropic-ai/sdk for @relay/sdk. Same method signatures, same streaming. Your existing code keeps working.
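Concretely, the swap looks like this; the Relay constructor shape comes from the snippet up top.

```ts
// Before: direct Anthropic SDK.
// import Anthropic from '@anthropic-ai/sdk';
// const llm = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// After: one import swap; call sites stay identical.
import { Relay } from '@relay/sdk';

const llm = new Relay({ apiKey: process.env.RELAY_KEY });

const reply = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'ping' }],
});
```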
Paste your Anthropic and OpenAI keys into the dashboard. We encrypt them with AES-256-GCM and never log plaintext.
Every call hits a Cloudflare Worker in 300+ cities. Cache lookup, retry policy and failover happen before bytes leave the region.
Open the dashboard. Filter requests by status, model, latency or cost. The next outage shows up as a green failover chip.
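If it helps to picture step 3, here is a hypothetical sketch of that request path. Every identifier below is invented for illustration; none of it is Relay's actual internals.

```ts
// Hypothetical pseudocode of the edge pipeline described in step 3.
// All identifiers here are invented for illustration.
type LLMRequest = { model: string; messages: unknown[] };
type LLMResponse = { ok: boolean; body: unknown };

declare function lookupCache(req: LLMRequest): Promise<LLMResponse | null>;
declare function retryWithJitter(fn: () => Promise<LLMResponse>): Promise<LLMResponse>;
declare function callProvider(p: 'anthropic' | 'openai', req: LLMRequest): Promise<LLMResponse>;

async function handleAtEdge(req: LLMRequest): Promise<LLMResponse> {
  const hit = await lookupCache(req);      // exact match first, then semantic
  if (hit) return hit;                     // cache hit: no provider call at all

  for (const provider of ['anthropic', 'openai'] as const) {
    const res = await retryWithJitter(() => callProvider(provider, req));
    if (res.ok) return res;                // a 5xx falls through to the next provider
  }
  throw new Error('all providers exhausted');
}
```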
Hack on a side project.
Indie devs shipping to real users.
Small teams in production.
When downtime costs real money.
Join the private beta. Early signups get six months of Pro on the house when we open the gates.