
Keymesh Latency Benchmarks: How Much Overhead Does an AI Gateway Add?

We measured the actual latency overhead of routing AI API calls through Keymesh. Spoiler: it's under 50ms—negligible compared to inference time.

By Keymesh Team

One of the first questions engineers ask about AI gateways is: “How much latency does it add?”

It’s a fair question. When you’re building real-time AI features, every millisecond matters. Adding a proxy layer between your code and OpenAI feels risky.

So we ran the benchmarks. Here’s what we found.

TL;DR

Metric                 Direct to OpenAI   Through Keymesh   Overhead
─────────────────────────────────────────────────────────────────────
Streaming TTFB (p50)          378ms             427ms         +49ms
Non-streaming (p50)           783ms             792ms          +9ms

The verdict: Keymesh adds roughly 10-50ms of overhead—negligible when AI inference itself takes 300-2000ms.

The Test Setup

We tested against OpenAI’s gpt-4o-mini model with a simple prompt (“Count from 1 to 10 slowly, one number per line.”). Each test ran 10 iterations to account for variance.

  • Direct API: Requests sent straight to api.openai.com
  • Keymesh Proxy: Same requests routed through proxy.keymesh.dev

We measured two key metrics:

  • TTFB (Time to First Byte): How long until the first response chunk arrives
  • Total Time: Complete request-to-response duration
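As a sketch of how these two metrics can be captured, here is the timing pattern for a streaming request. The `makeRequest` parameter is a hypothetical stand-in for a `fetch()` call that resolves to a Response-like object with a readable body; injecting it is our illustrative choice, not necessarily how the Keymesh benchmark script is structured:

```javascript
// Time a streaming request: TTFB is the delay until the first body
// chunk arrives; total time is until the stream closes.
async function measureRequest(makeRequest) {
  const start = performance.now();
  const response = await makeRequest();
  const reader = response.body.getReader();

  let ttfb = null;
  // Read chunks until the stream closes; the first read marks TTFB.
  while (true) {
    const { done } = await reader.read();
    if (ttfb === null) ttfb = performance.now() - start;
    if (done) break;
  }
  const total = performance.now() - start;
  return { ttfb, total };
}
```

For non-streaming requests the two numbers converge, which is why the non-streaming table below shows TTFB and total time within a few milliseconds of each other.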

Streaming Results

Streaming is where latency matters most. Users see tokens appear in real-time, so any delay in that first chunk is noticeable.

                    Direct API      Keymesh       Overhead
────────────────────────────────────────────────────────────
TTFB (p50)             378ms         427ms       +49ms
TTFB (p95)             930ms         912ms       ~same
────────────────────────────────────────────────────────────
Total (p50)            754ms         785ms       +31ms
Total (p95)           1480ms        1271ms       ~same

Note: The p95 numbers are similar—any apparent “advantage” for Keymesh is likely noise from our 10-iteration sample. The key takeaway is that even in worst-case scenarios, there’s no meaningful additional latency.
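To make the small-sample caveat concrete, here is a minimal nearest-rank percentile helper of the kind used to produce p50/p95 figures (the function name is illustrative, not Keymesh's actual benchmark code):

```javascript
// Nearest-rank percentile: the smallest sample value that covers
// at least p% of the sorted samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

With only 10 iterations, p95 is effectively the worst or second-worst sample, so a single network hiccup in either direction dominates it. That is why the p50 columns are the more reliable comparison here.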

Non-Streaming Results

For batch processing or single-response use cases:

                    Direct API      Keymesh       Overhead
────────────────────────────────────────────────────────────
TTFB (p50)             781ms         792ms       +11ms
Total (p50)            783ms         792ms       +9ms

Non-streaming overhead is minimal: roughly 10ms at the median.

Why Is the Overhead So Low?

Three reasons:

  1. Edge deployment: Keymesh runs on Cloudflare Workers, meaning requests hit a datacenter near you before routing to OpenAI. This often reduces network latency.

  2. No request buffering: We stream responses through a TransformStream—chunks pass through instantly without waiting for the full response.

  3. Lightweight processing: We parse SSE events on-the-fly to extract usage data, but we never block the stream.
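The second and third points can be sketched together as a pass-through `TransformStream` that forwards every chunk immediately while a side buffer scans SSE lines for a usage payload. This mirrors the pattern described above under our own assumptions (function and field names are illustrative), not Keymesh's exact implementation:

```javascript
// Forward bytes untouched; opportunistically extract "usage" from
// SSE data lines without ever holding back the stream.
function usageTap(onUsage) {
  const decoder = new TextDecoder();
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(chunk); // pass through instantly
      buffer += decoder.decode(chunk, { stream: true });
      let idx;
      while ((idx = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, idx).trim();
        buffer = buffer.slice(idx + 1);
        if (line.startsWith("data:") && line.includes('"usage"')) {
          try {
            const event = JSON.parse(line.slice(5));
            if (event.usage) onUsage(event.usage);
          } catch {
            // ignore partial or non-JSON events
          }
        }
      }
    },
  });
}
```

Because the enqueue happens before any parsing, the client sees each chunk as soon as the upstream sends it; the parsing cost is paid off the critical path.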

What About the AI Inference Time?

Here’s the real perspective: AI inference dominates latency.

In our tests, OpenAI’s gpt-4o-mini took 400-1500ms to generate a response. That’s 10-30x the Keymesh overhead.

┌─────────────────────────────────────────────────────────┐
│                Total Request Time: ~800ms               │
├──────────────────────────────────────────┬──────────────┤
│      OpenAI Inference (~750ms)           │ Keymesh      │
│      [████████████████████████████████]  │ (~50ms)      │
├──────────────────────────────────────────┴──────────────┤
│ ~94% AI processing  │ ~6% proxy overhead                │
└─────────────────────────────────────────────────────────┘

For users, the difference between 750ms and 800ms is imperceptible. But the difference between “shared API keys with no visibility” and “per-developer keys with budget controls” is huge.

Methodology Notes

  • Model: gpt-4o-mini
  • Iterations: 10 per test
  • Metrics: TTFB, total time, p50, p95
  • Location: Tests run from Europe to OpenAI US endpoints
  • Date: January 21, 2026

Want to run these benchmarks yourself? The script is straightforward—just time requests to OpenAI directly vs through Keymesh and compare TTFB. Reach out if you’d like the benchmark code.

Conclusion

Adding Keymesh to your AI stack costs you ~50ms at most. In exchange, you get:

  • Per-developer API keys with instant revocation
  • Hard budget limits that actually block requests
  • Real-time usage tracking and cost attribution
  • No code changes beyond swapping the base URL

That’s a trade-off most teams will happily make.
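To illustrate the base-URL swap mentioned above: the request you build is byte-for-byte identical for both endpoints except for the host. The paths and the request-builder below are illustrative assumptions, not documented Keymesh endpoints:

```javascript
// Build an OpenAI-style chat request; only baseUrl differs between
// going direct and going through the gateway.
function buildChatRequest(baseUrl, apiKey, messages) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "gpt-4o-mini", messages }),
    },
  };
}

// Direct:  buildChatRequest("https://api.openai.com/v1", key, msgs)
// Gateway: buildChatRequest("https://proxy.keymesh.dev/v1", key, msgs)
```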


Ready to try it? Get started for free →