helix
v4.2 — Streaming responses now in GA

Inference,
without the wait.

Deploy any open-source model in seconds. Route between providers intelligently. Observe every token. Built by infrastructure people for infrastructure people.

helix.dev — inference
● live
$ helix deploy llama-3.1-70b
→ provisioning gpu pool (h100 × 4)
→ warming kv cache
→ binding /v1/chat/completions
ready in 1.42s
$ curl https://api.helix.dev/v1/chat \
    -d '{"model": "llama-3.1-70b"}'
cold start 82ms
p50 latency: 92ms (-12%)
tokens/sec: 1,847 (+8%)
cost / 1M: $0.18 (-23%)

Trusted by engineering teams shipping at scale

VERCEL · LINEAR · RAYCAST · SUPABASE · RESEND · PLANETSCALE · ARC · FRAMER
002 — Platform

Everything you need.
Nothing you don't.

Five primitives that compose into the inference stack your team actually wants to use.

Cold starts

From request to first token in under 100ms.

Pre-warmed GPU pools across 14 regions. Snapshot restoration in under a second. Your users never see a spinner.

cold start ms: ↓ 74% vs baseline
Routing

Smart failover.

Define fallback chains across providers. We handle the rest.

openai/gpt-4o
↳ anthropic/claude
↳ fireworks/llama-70b ✓
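To make the chain above concrete, here is a minimal sketch of ordered failover, written client-side for illustration. It is not the Helix implementation: the `Provider` type, `completeWithFallback`, and the stub providers are all hypothetical names, and the stubs stand in for real network calls.

```typescript
// Hypothetical types for illustration; none of these names come from the Helix SDK.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

type FallbackResult = {
  provider: string;   // name of the provider that answered
  text: string;       // the completion it returned
  failures: string[]; // providers tried before it that failed, in order
};

// Try each provider in order and return the first successful completion.
async function completeWithFallback(
  chain: Provider[],
  prompt: string,
): Promise<FallbackResult> {
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text, failures };
    } catch {
      // Record the failure and move on to the next provider in the chain.
      failures.push(provider.name);
    }
  }
  throw new Error(`all providers failed: ${failures.join(", ")}`);
}

// Stub providers standing in for real network calls.
const failing = (name: string): Provider => ({
  name,
  complete: async () => {
    throw new Error(`${name} unavailable`);
  },
});
const working = (name: string): Provider => ({
  name,
  complete: async (prompt: string) => `[${name}] ${prompt}`,
});

// Same order as the chain shown above: the first two fail, the third answers.
const chain: Provider[] = [
  failing("openai/gpt-4o"),
  failing("anthropic/claude"),
  working("fireworks/llama-70b"),
];
```

Calling `await completeWithFallback(chain, "hi")` with these stubs resolves with the third provider and lists the first two under `failures`. A production router would presumably also weigh timeouts, retries, and cost, which this sketch leaves out.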
Observability

Every token, traced.

Security

SOC 2 Type II. HIPAA-ready. PII redaction at the edge.

Your prompts never train another model. Encryption in transit and at rest. Audit logs for every request.

Workflows

Chain models like unix pipes.

Compose multi-step pipelines: classify → route → generate → validate. All with one client.
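The classify → route → generate → validate idea can be sketched as plain function composition. This is an illustrative sketch under stated assumptions, not the Helix workflow API: `Ctx`, `Step`, `pipeline`, and the four stub steps are hypothetical, and the routing rule (questions go to one model, everything else to another) is invented for the example.

```typescript
// Hypothetical context object passed from step to step.
type Ctx = {
  input: string;
  label?: string;   // set by classify
  model?: string;   // set by route
  output?: string;  // set by generate
  valid?: boolean;  // set by validate
};

type Step = (ctx: Ctx) => Promise<Ctx>;

// Run steps left to right, feeding each step's result into the next,
// much like `a | b | c` in a shell.
function pipeline(...steps: Step[]): Step {
  return async (ctx: Ctx) => {
    let current = ctx;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };
}

// Stub steps; real ones would call models over the network.
const classify: Step = async (ctx) => ({
  ...ctx,
  label: ctx.input.trim().endsWith("?") ? "question" : "statement",
});

// Assumed routing rule, for illustration only.
const route: Step = async (ctx) => ({
  ...ctx,
  model: ctx.label === "question" ? "llama-3.1-70b" : "gpt-4o",
});

const generate: Step = async (ctx) => ({
  ...ctx,
  output: `[${ctx.model}] reply to: ${ctx.input}`,
});

const validate: Step = async (ctx) => ({
  ...ctx,
  valid: (ctx.output ?? "").length > 0,
});

const run = pipeline(classify, route, generate, validate);
```

Calling `await run({ input: "Why is latency high?" })` walks the four steps in order and returns the accumulated context. Keeping every step the same shape is what lets them be reordered or swapped without touching the runner.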

003 — Developer experience

Three lines.
Production ready.

No SDK lock-in. No special infrastructure. Drop into any codebase that speaks HTTP.

helix-client@4.2.0
import { Helix } from 'helix';

const client = new Helix({
  apiKey: process.env.HELIX_KEY,
  fallback: ['gpt-4o', 'claude-3.5', 'llama-3.1-70b'],
});

const stream = await client.chat.stream({
  messages: [
    { role: 'user', content: 'Write a haiku about latency.' }
  ],
  observability: { trace: true },
});

for await (const token of stream) {
  process.stdout.write(token.delta);
}
0B+

Tokens processed monthly

82ms

Median cold start

0.00%

Platform uptime

14

Global edge regions

004 — Pricing

Pay for what you use.
Nothing more.

No seat fees. No platform tax. The infrastructure is the product.

Hobby

01

For weekend projects and prototypes.

$0 / month
  • 1M tokens / month
  • Community Discord
  • All open-source models
  • Basic observability
Start free
Most popular

Pro

02

For teams shipping to real users.

$49 / month
  • 50M tokens included
  • $0.18 / 1M after
  • Smart routing & fallbacks
  • Full traces & metrics
  • Email support, 24h SLA
  • SOC 2 reports
Start 14-day trial
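For a rough sense of what a Pro bill looks like, the listed numbers ($49 base, 50M tokens included, $0.18 per 1M after) can be turned into a small calculator. This sketch assumes overage is billed pro rata per token rather than rounded up to whole millions, which the pricing card does not specify; the function and constant names are hypothetical.

```typescript
// Figures taken from the Pro plan card.
const PRO_BASE_USD = 49;
const PRO_INCLUDED_TOKENS = 50_000_000;
const PRO_OVERAGE_USD_PER_MILLION = 0.18;

// Estimate a monthly Pro bill in USD for a given token count.
// Assumption: overage is charged proportionally per token.
function proMonthlyCostUsd(tokensUsed: number): number {
  const overageTokens = Math.max(0, tokensUsed - PRO_INCLUDED_TOKENS);
  const overageUsd = (overageTokens / 1_000_000) * PRO_OVERAGE_USD_PER_MILLION;
  // Round to whole cents to avoid floating-point noise in the total.
  return Math.round((PRO_BASE_USD + overageUsd) * 100) / 100;
}
```

Under that assumption, 80M tokens in a month comes to $49 + 30 × $0.18 = $54.40, and anything at or below 50M stays at $49.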

Enterprise

03

Dedicated infra, custom terms.

Let's talk
  • Volume pricing
  • Private GPU pools
  • VPC peering
  • Custom SLAs
  • HIPAA / FedRAMP
  • Dedicated Slack channel
Talk to sales
005 — Questions

Frequently
unasked.

Median cold start across our edge is 82ms. p99 sits under 240ms. We publish live numbers at status.helix.dev — no marketing math.

006 — Ship faster

Stop tuning infra.
Start tuning models.

5,000 free tokens. No credit card. Deploy your first endpoint in the time it takes to refill your coffee.