v4.2 — Streaming responses now in GA

Inference,
without the wait.

Deploy any open-source model in seconds. Route between providers intelligently. Observe every token. Built by infrastructure people for infrastructure people.

Start free, no card →

$ npm i helix

helix.dev — inference

● live

$ helix deploy llama-3.1-70b

→ provisioning gpu pool (h100 × 4)

→ warming kv cache

→ binding /v1/chat/completions

✓ ready in 1.42s

curl https://api.helix.dev/v1/chat \
  -d '{"model": "llama-3.1-70b"}'

cold start 82ms

p50 latency: 92ms (-12%)

tokens/sec: 1,847 (+8%)

cost / 1M: $0.18 (-23%)

Trusted by engineering teams shipping at scale

VERCEL · LINEAR · RAYCAST · SUPABASE · RESEND · PLANETSCALE · ARC · FRAMER

002 — Platform

Everything you need.
Nothing you don't.

Five primitives that compose into the inference stack your team actually wants to use.

Cold starts

From request to first token in under 100ms.

Pre-warmed GPU pools across 14 regions. Snapshot restoration in under a second. Your users never see a spinner.

Cold start (ms): ↓ 74% vs baseline

Routing

Smart failover.

Define fallback chains across providers. We handle the rest.

openai/gpt-4o

↳ anthropic/claude

↳ fireworks/llama-70b ✓
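
A fallback chain like the one above can be sketched in a few lines. This is a minimal illustration, not Helix's routing implementation: the `Provider` type, the `withFallback` helper, and the `FallbackResult` shape are hypothetical names introduced here, and each provider is assumed to be an async function that throws on failure.

```typescript
// Hypothetical shape: a provider is a named async function that
// resolves to text or throws when the upstream is unavailable.
type Provider = { name: string; call: (prompt: string) => Promise<string> };

interface FallbackResult {
  provider: string; // name of the provider that answered
  output: string;
  failed: string[]; // providers that were tried and threw, in order
}

// Try each provider in order and return the first successful response.
async function withFallback(
  chain: Provider[],
  prompt: string,
): Promise<FallbackResult> {
  const failed: string[] = [];
  for (const provider of chain) {
    try {
      const output = await provider.call(prompt);
      return { provider: provider.name, output, failed };
    } catch {
      // Record the failure and move on to the next link in the chain.
      failed.push(provider.name);
    }
  }
  throw new Error(`all providers failed: ${failed.join(", ")}`);
}
```

With `openai/gpt-4o` and `anthropic/claude` both failing, the call resolves with `fireworks/llama-70b` as the answering provider and the two failed names recorded in order.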

Observability

Every token, traced.

Security

SOC 2 Type II. HIPAA-ready. PII redaction at the edge.

Your prompts never train another model. Encryption in transit and at rest. Audit logs for every request.

Workflows

Chain models like unix pipes.

Compose multi-step pipelines: classify → route → generate → validate. All with one client.
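
The pipe analogy can be made concrete with a small composition helper. This is a sketch under stated assumptions: `pipe`, `Step`, and `Ctx` are hypothetical names, not part of the Helix client, and the four steps are local stubs standing in for real model calls (`small-model` is a placeholder model name).

```typescript
// A step takes a context and hands an updated context to the next step,
// much like a process in a shell pipeline.
type Step<C> = (ctx: C) => Promise<C> | C;

// Compose steps left to right into a single async function.
function pipe<C>(...steps: Step<C>[]): (ctx: C) => Promise<C> {
  return async (ctx: C) => {
    let current = ctx;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };
}

interface Ctx {
  input: string;
  label?: string;
  model?: string;
  output?: string;
  valid?: boolean;
}

// Stub steps (hypothetical): real ones would call a model or a policy.
const classify: Step<Ctx> = (c) => ({
  ...c,
  label: c.input.includes("?") ? "question" : "statement",
});
const route: Step<Ctx> = (c) => ({
  ...c,
  model: c.label === "question" ? "llama-3.1-70b" : "small-model",
});
const generate: Step<Ctx> = async (c) => ({
  ...c,
  output: `[${c.model}] ${c.input}`,
});
const validate: Step<Ctx> = (c) => ({
  ...c,
  valid: (c.output ?? "").length > 0,
});

// classify → route → generate → validate
const run = pipe(classify, route, generate, validate);
```

Each step only reads and extends the shared context, so steps can be reordered or swapped without touching the others.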

003 — Developer experience

Three lines.
Production ready.

No SDK lock-in. No special infrastructure. Drop into any codebase that speaks HTTP.

helix-client@4.2.0

import { Helix } from 'helix';

const client = new Helix({
  apiKey: process.env.HELIX_KEY,
  fallback: ['gpt-4o', 'claude-3.5', 'llama-3.1-70b'],
});

const stream = await client.chat.stream({
  messages: [
    { role: 'user', content: 'Write a haiku about latency.' }
  ],
  observability: { trace: true },
});

for await (const token of stream) {
  process.stdout.write(token.delta);
}
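
For codebases that skip the client and speak HTTP directly, the request might be assembled as below. This is a sketch, not a documented API: the URL is taken from the curl example on this page, while the `Authorization: Bearer` header, the `model` and `messages` body fields, and the `buildChatRequest` helper are assumptions made for illustration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the URL and fetch options for a chat call without sending it.
// Assumption: bearer-token auth and a JSON body with `model` and `messages`.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
) {
  return {
    url: "https://api.helix.dev/v1/chat",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage with any fetch-compatible runtime:
//   const { url, init } = buildChatRequest(key, "llama-3.1-70b", msgs);
//   const res = await fetch(url, init);
```

Keeping request construction separate from sending makes the call easy to inspect and unit-test without network access.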

0B+

Tokens processed monthly

82ms

Median cold start

0.00%

Platform uptime

14

Global edge regions

004 — Pricing

Pay for what you use.
Nothing more.

No seat fees. No platform tax. The infrastructure is the product.

Hobby

For weekend projects and prototypes.

$0 / month

1M tokens / month
Community Discord
All open-source models
Basic observability

Start free →

Pro

For teams shipping to real users.

$49 / month

50M tokens included
$0.18 / 1M after
Smart routing & fallbacks
Full traces & metrics
Email support, 24h SLA
SOC 2 reports

Start 14-day trial →
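
The Pro numbers above reduce to simple arithmetic. The sketch below assumes overage is billed pro rata per token rather than rounded up to the next million, which this page does not specify. The function and constant names are hypothetical.

```typescript
// Figures from the Pro plan listing above.
const PRO_BASE_USD = 49;
const PRO_INCLUDED_TOKENS = 50_000_000;
const PRO_OVERAGE_USD_PER_MILLION = 0.18;

// Estimated monthly Pro bill for a given token count.
// Assumption: overage is charged proportionally, with no rounding.
function proMonthlyCostUsd(tokens: number): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError("tokens must be a non-negative number");
  }
  const overageTokens = Math.max(0, tokens - PRO_INCLUDED_TOKENS);
  return PRO_BASE_USD + (overageTokens / 1_000_000) * PRO_OVERAGE_USD_PER_MILLION;
}
```

Under that assumption, 150M tokens in a month works out to $49 plus 100 × $0.18, roughly $67. Anything at or below 50M tokens stays at $49.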

Enterprise

Dedicated infra, custom terms.

Let's talk

Volume pricing
Private GPU pools
VPC peering
Custom SLAs
HIPAA / FedRAMP
Dedicated Slack channel

Talk to sales →

005 — Questions

Frequently
unasked.

Median cold start across our edge is 82ms. p99 sits under 240ms. We publish live numbers at status.helix.dev — no marketing math.

006 — Ship faster

Stop tuning infra.
Start tuning models.

5,000 free tokens. No credit card. Deploy your first endpoint in the time it takes to refill your coffee.

Get started free →

Read the docs
