Private beta · Built on Cloudflare Workers

Your AI agents
never go down.

Relay is the reliable delivery layer for LLM APIs. One line of code adds automatic retry, provider failover, smart caching and a live dashboard — so your product stays up even when Anthropic and OpenAI don't.

  • Drop-in Anthropic SDK replacement
  • Edge proxy < 30ms overhead
  • BYOK — your keys, encrypted
agent.ts · Live

import { Relay } from '@relay/sdk';

const llm = new Relay({ apiKey: process.env.RELAY_KEY });

// Auto retry, failover, cache — handled.
const reply = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'ping' }],
  relay: { fallback: 'gpt-4o', cache: 'semantic' },
});
Anthropic down
Failover → OpenAI
Cache hit · 38ms
99.99% · Effective uptime
< 30ms · Edge overhead
47% · Failed calls saved
5+ · Providers covered
Anthropic · OpenAI · Mistral · Gemini · Cohere · DeepSeek · Groq · Together
The reality

Every AI builder ships the same reliability problems from scratch.

Four failure modes you've hit this quarter — and will keep hitting until something else handles them for you.

01 · Outage

Provider goes down — your product goes with it.

The Anthropic API drops for 30 minutes. Your agent stops mid-task. Your user closes the tab. Your churn goes up.

02 · Rate limit

529 Overloaded, right when traffic spikes.

Peak hours are exactly when you can't afford 429s and 529s. Every retry you don't write yourself is a request that quietly dies.

03 · Boilerplate

You wrote exponential backoff again. And again.

Every project ships its own retry loop. Half of them get the jitter wrong. None of them fail over to a backup provider.

04 · Cost

You paid twice for the same answer.

Same system prompt, same user input — different request id. Without caching, every duplicate burns tokens you'll never see again.

The fix

Delete the retry file. Keep shipping.

Replace @anthropic-ai/sdk with @relay/sdk. Same API, different superpowers.

Before
without-relay.ts
// 47 lines of retry boilerplate, written badly.
async function callLLM(body) {
  let attempt = 0;
  while (attempt < 3) {
    try {
      return await anthropic.messages.create(body);
    } catch (e) {
      attempt++;
      await sleep(2 ** attempt * 1000);
    }
  }
  throw new Error('Gave up');
}
// Still no failover. No cache. No metrics.
After
with-relay.ts
// One line. Reliability handled.
const reply = await llm.messages.create(body);

// What Relay just did for you:
// ✓ exponential backoff with jitter
// ✓ failover Anthropic → OpenAI on 5xx
// ✓ exact + semantic cache lookup
// ✓ streaming preserved end-to-end
// ✓ live metrics in your dashboard
What you get

Six features. One install.

Automatic provider failover

When Anthropic returns 5xx, Relay routes the next attempt to OpenAI — with model mapping you can override per request.

Anthropic → Relay → OpenAI
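
The per-request override rides on the same relay option shown in the hero snippet. The exact shape of a model-mapping override isn't spelled out on this page, so treat the sketch below as illustrative, not final API.

failover-override.ts (sketch)
import { Relay } from '@relay/sdk';

const llm = new Relay({ apiKey: process.env.RELAY_KEY });

// Illustrative per-request override: if Anthropic answers 5xx,
// send the retry to gpt-4o instead of the account-level default.
const reply = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'summarize this ticket' }],
  relay: { fallback: 'gpt-4o' },
});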

Exponential backoff, done right

Decorrelated jitter, configurable ceilings, surfaced in the dashboard so you see what was retried and why.
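
For readers who want to see what "decorrelated jitter" means in practice, here is the general pattern in a few lines of TypeScript. This is not Relay's internal code; the base and ceiling values are placeholders, and Relay's ceilings are configurable.

decorrelated-jitter.ts (sketch)
// Each delay is drawn at random between a base value and three times
// the previous delay, then capped. Independent random draws keep
// clients from retrying in lockstep after an outage.
function nextDelay(previousMs: number, baseMs = 100, capMs = 10_000): number {
  const upper = Math.max(baseMs, previousMs * 3);
  const candidate = baseMs + Math.random() * (upper - baseMs);
  return Math.min(capMs, candidate);
}

// Usage: start at the base and feed each delay back in.
let delay = 100;
for (let attempt = 0; attempt < 3; attempt++) {
  delay = nextDelay(delay);
  // await sleep(delay) before retrying the request
}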

Smart caching

Exact-match on day one, semantic with embeddings on Hobby and up. Stop paying twice for the same answer.
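
Cache modes are selected per request through the same relay option used in the hero snippet: 'exact' matches identical prompts, 'semantic' also matches near-duplicate wording via embeddings. Anything beyond that flag, such as similarity thresholds, isn't described here, so this shows only the minimal shape.

cached-call.ts (sketch)
import { Relay } from '@relay/sdk';

const llm = new Relay({ apiKey: process.env.RELAY_KEY });

// Identical prompts hit the exact cache; paraphrases can hit the
// semantic cache on Hobby plans and up.
const cached = await llm.messages.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'What does a 529 response mean?' }],
  relay: { cache: 'semantic' },
});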

Live dashboard for every request

Latency, cost, cache hit rate, retries, failover events — searchable, filterable, and streaming as it happens.

p95 · 412ms

BYOK, encrypted at rest

You keep paying Anthropic directly. We never see plaintext keys — AES-256-GCM, audited access.

Edge-native, globally

Runs on Cloudflare Workers across 300+ cities. Under 30ms of proxy overhead from anywhere in the world.

How it works

From npm i to first reliable call in four steps.

01

Drop-in the SDK

Swap @anthropic-ai/sdk for @relay/sdk. Same method signatures, same streaming. Your existing code keeps working.

npm i @relay/sdk
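
A minimal sketch of the swap, assuming the constructor shape from the snippet at the top of the page; the point is that call sites like llm.messages.create(...) stay untouched.

// Before:
// import Anthropic from '@anthropic-ai/sdk';
// const llm = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// After: same call sites, different client.
import { Relay } from '@relay/sdk';
const llm = new Relay({ apiKey: process.env.RELAY_KEY });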
02

Plug in your providers

Paste your Anthropic and OpenAI keys into the dashboard. We encrypt them with AES-256-GCM and never log plaintext.

Anthropic ✓ OpenAI ✓ Mistral ✓
03

Ship through the edge

Every call hits a Cloudflare Worker in 300+ cities. Cache lookup, retry policy and failover happen before bytes leave the region.

POST /v1/messages → ~28ms
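
If you want to see the wire shape behind that chip, a raw call to the proxy might look like the sketch below. The base URL and the header carrying your Relay key are assumptions for illustration; only the POST /v1/messages path appears on this page.

edge-call.ts (sketch)
// Hypothetical direct call to the edge proxy. Host and auth header
// are illustrative placeholders, not documented values.
const res = await fetch('https://edge.relay.example/v1/messages', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'x-relay-key': process.env.RELAY_KEY ?? '',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
console.log(res.status, await res.json());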
04

Watch it stay up

Open the dashboard. Filter requests by status, model, latency or cost. The next outage shows up as a green failover chip.

uptime 99.99% · failovers handled 137
Pricing

Free until you ship. Then small.

Monthly · Annual −20%
Free

Hack on a side project.

$0 / month
  • 1,000 requests / month
  • Exact-match cache
  • Anthropic + OpenAI failover
  • Community support
Start free
Hobby

Indie devs shipping to real users.

$19 / month
  • 50,000 requests / month
  • Exact + semantic cache
  • Streaming & SSE proxy
  • Email support · 48h
Choose Hobby
Popular
Pro

Small teams in production.

$49 / month
  • 250,000 requests / month
  • Custom retry policies
  • Multi-key routing
  • Email support · 24h
Choose Pro
Scale

When downtime costs real money.

$199 / month
  • 1,000,000 requests / month
  • Custom cache rules
  • Bring-your-own providers
  • Priority support · 4h
Talk to us
FAQ

The honest answers, upfront.

Be the first to ship on a reliable LLM stack.

Join the private beta. Early signups get six months of Pro on the house when we open the gates.

487 builders waiting · $0 setup cost · <5 min to first call