Inference,
without the wait.
Deploy any open-source model in seconds. Route between providers intelligently. Observe every token. Built by infrastructure people for infrastructure people.
Trusted by engineering teams shipping at scale
Everything you need.
Nothing you don't.
Five primitives that compose into the inference stack your team actually wants to use.
From request to first token in under 100ms.
Pre-warmed GPU pools across 14 regions. Snapshot restoration in under a second. Your users never see a spinner.
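One way to check a time-to-first-token claim against your own traffic is to time the stream yourself. The sketch below assumes only that a streaming response is an async iterable of tokens. `measureTtft` and `fakeStream` are hypothetical helpers for illustration, not part of any Helix SDK.

```typescript
// Minimal sketch: measure time-to-first-token (TTFT) for any async token stream.
// Assumption: the response is an AsyncIterable; `fakeStream` is a hypothetical stand-in.

async function measureTtft<T>(
  stream: AsyncIterable<T>,
  now: () => number = () => performance.now(),
): Promise<{ ttftMs: number; tokens: T[] }> {
  const start = now();
  let ttftMs = -1; // stays -1 if the stream never yields a token
  const tokens: T[] = [];
  for await (const token of stream) {
    if (ttftMs < 0) ttftMs = now() - start; // elapsed time until the first token
    tokens.push(token);
  }
  return { ttftMs, tokens };
}

// Hypothetical stream that waits `delayMs` before yielding each word.
async function* fakeStream(words: string[], delayMs: number): AsyncGenerator<string> {
  for (const w of words) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield w;
  }
}
```

Injecting `now` keeps the helper testable with a fake clock; by default it uses the high-resolution `performance.now()`.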
Smart failover.
Define fallback chains across providers. We handle the rest.
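At its core, a fallback chain tries providers in order and returns the first success. Here is a minimal sketch of that idea, assuming each provider is modeled as an async function from prompt to text. The `Provider` type and `completeWithFallback` are hypothetical names; this is not how Helix implements failover internally.

```typescript
// Minimal sketch of a provider fallback chain (hypothetical types, not a Helix API).

type Provider = { name: string; complete: (prompt: string) => Promise<string> };

async function completeWithFallback(
  chain: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of chain) {
    try {
      // The await sits inside the try, so a rejected call falls through to catch.
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      // Record why this provider failed, then move on to the next one in the chain.
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed -> ${errors.join("; ")}`);
}
```

Collecting every failure reason means that, when the whole chain is exhausted, the final error says what went wrong at each hop rather than only the last one.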
Every token, traced.
SOC 2 Type II. HIPAA-ready. PII redaction at the edge.
Your prompts never train another model. Encryption in transit and at rest. Audit logs for every request.
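As a rough illustration of PII redaction, here is a regex-based sketch that masks email addresses and US SSN-shaped numbers before a prompt leaves your process. The rules and the `redactPii` helper are assumptions for illustration only; production redaction needs far broader detection than two regexes.

```typescript
// Illustrative redactor: a few regex rules for common PII shapes (assumed patterns).
const PII_RULES: Array<[label: string, pattern: RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
];

function redactPii(text: string): string {
  // Apply each rule in turn, replacing every match with a typed placeholder.
  return PII_RULES.reduce(
    (acc, [label, pattern]) => acc.replace(pattern, `[${label}]`),
    text,
  );
}
```

Typed placeholders such as `[EMAIL]` keep the redacted prompt readable for the model while dropping the sensitive value itself.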
Chain models like Unix pipes.
Compose multi-step pipelines: classify → route → generate → validate. All with one client.
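The classify → route → generate → validate flow can be sketched as plain async stages composed like Unix pipes. Every stage here is a hypothetical stub (the model names included); it shows the composition pattern, not a Helix API.

```typescript
// Minimal sketch of a classify -> route -> generate -> validate pipeline.
// All stages are hypothetical stubs standing in for real model calls.

type Stage<I, O> = (input: I) => Promise<O>;

// Compose two stages like a Unix pipe: the output of `f` feeds `g`.
function pipe<A, B, C>(f: Stage<A, B>, g: Stage<B, C>): Stage<A, C> {
  return async (input) => g(await f(input));
}

type Classified = { prompt: string; kind: "code" | "chat" };
type Routed = Classified & { model: string };

// Crude keyword classifier, purely for illustration.
const classify: Stage<string, Classified> = async (prompt) => ({
  prompt,
  kind: /\bfunction\b|\bdef\b/.test(prompt) ? "code" : "chat",
});

const route: Stage<Classified, Routed> = async (c) => ({
  ...c,
  model: c.kind === "code" ? "code-model" : "chat-model", // hypothetical model names
});

const generate: Stage<Routed, string> = async (r) =>
  `[${r.model}] reply to: ${r.prompt}`; // stub in place of a real generation call

const validate: Stage<string, string> = async (text) => {
  if (text.length === 0) throw new Error("empty generation");
  return text;
};

const pipeline = pipe(pipe(pipe(classify, route), generate), validate);
```

Because each stage has typed input and output, the compiler rejects a pipeline whose stages do not line up, much as a shell pipe only works when each program reads what the previous one writes.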
Three lines.
Production ready.
No SDK lock-in. No special infrastructure. Drop into any codebase that speaks HTTP.
```typescript
import { Helix } from 'helix';

const client = new Helix({
  apiKey: process.env.HELIX_KEY,
  fallback: ['gpt-4o', 'claude-3.5', 'llama-3.1-70b'],
});

const stream = await client.chat.stream({
  messages: [
    { role: 'user', content: 'Write a haiku about latency.' }
  ],
  observability: { trace: true },
});

for await (const token of stream) {
  process.stdout.write(token.delta);
}
```
Tokens processed monthly
Median cold start
Platform uptime
Global edge regions
Pay for what you use.
Nothing more.
No seat fees. No platform tax. The infrastructure is the product.
Hobby
For weekend projects and prototypes.
- 1M tokens / month
- Community Discord
- All open-source models
- Basic observability
Pro
For teams shipping to real users.
- 50M tokens included
- $0.18 / 1M after
- Smart routing & fallbacks
- Full traces & metrics
- Email support, 24h SLA
- SOC 2 reports
Enterprise
Dedicated infra, custom terms.
- Volume pricing
- Private GPU pools
- VPC peering
- Custom SLAs
- HIPAA / FedRAMP
- Dedicated Slack channel
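For Pro, the usage math is simple: the first 50M tokens are included and each additional 1M costs $0.18. A small worked sketch, assuming overage is prorated per token (not rounded up to whole millions) and leaving out any base fee, since none is listed here. `proOverageUsd` is a hypothetical helper.

```typescript
// Worked example of Pro overage: 50M tokens included, then $0.18 per 1M tokens.
// Assumptions: prorated per token; any base subscription fee is excluded.

const INCLUDED_TOKENS = 50_000_000;
const USD_PER_MILLION_AFTER = 0.18;

function proOverageUsd(tokensUsed: number): number {
  const billable = Math.max(0, tokensUsed - INCLUDED_TOKENS); // nothing billed inside the allowance
  const usd = (billable / 1_000_000) * USD_PER_MILLION_AFTER;
  // Rounded to whole cents to hide floating-point noise.
  return Math.round(usd * 100) / 100;
}
```

For example, 150M tokens in a month leaves 100M billable, so the overage is 100 × $0.18 = $18.00.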
Frequently
unasked.
Median cold start across our edge is 82ms. p99 sits under 240ms. We publish live numbers at status.helix.dev — no marketing math.
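To compare your own cold-start measurements against median and p99 figures, a nearest-rank percentile is enough. A minimal sketch; the `percentile` helper is hypothetical and any sample values you feed it are your own, not Helix's published numbers.

```typescript
// Nearest-rank percentile over latency samples (hypothetical helper).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric (not lexicographic) sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[rank - 1];
}
```

With samples `[90, 70, 80, 100, 60]`, `percentile(samples, 50)` returns 80 and `percentile(samples, 99)` returns 100. Nearest-rank always reports a latency that was actually observed rather than interpolating between samples.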
Stop tuning infra.
Start tuning models.
5,000 free tokens. No credit card. Deploy your first endpoint in the time it takes to refill your coffee.