We Run 5 AI Providers in Production (And Finally Know What They Cost)
TL;DR: We integrated DeepSeek, Fal.ai (FLUX.2), Google Imagen 4, OpenAI, and Anthropic into a single routing layer with Helicone observability. Users can pick their preferred provider. Fallbacks happen automatically. We know exactly what every AI call costs per business, per step, per provider. Our AI spend dropped 40% without sacrificing quality.
The Problem: One Provider Isn’t Enough
When you’re generating entire websites with AI—research, content, images, domain names, chatbot training—you make a lot of API calls. At our scale, a single provider is a liability:
- OpenAI goes down → Every website generation fails
- Costs spike → No alternative routing available
- Quality varies → GPT-4o is great at content but expensive for simple tasks
- Image generation → No single model is best at everything
We needed a system that could route between providers intelligently, fall back gracefully, and—critically—tell us what everything costs.
The Architecture
Provider Abstraction Layer
Every AI call in our system goes through a unified client:
```typescript
export const getAIClient = (options?: {
  stepId?: string;
  businessId?: string;
  providerOverride?: AIProvider;
}) => {
  const provider = options?.providerOverride || getAIProvider();
  switch (provider) {
    case 'anthropic': return { client: getAnthropic(options), provider };
    case 'deepseek': return { client: getDeepSeek(options), provider };
    default: return { client: getOpenAI(options), provider };
  }
};
```
The key insight: `providerOverride` lets users choose their preferred AI. The system default is configurable per environment. And every call carries `stepId` and `businessId` metadata for cost tracking.
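A call site then looks something like this. This is a sketch: the stubbed client, the `complete` method, and the IDs are illustrative, not our actual code.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'deepseek';

// Minimal stub standing in for the real getAIClient (illustrative only).
const getAIClient = (options?: {
  stepId?: string;
  businessId?: string;
  providerOverride?: AIProvider;
}) => {
  const provider: AIProvider = options?.providerOverride ?? 'openai';
  return {
    provider,
    client: { complete: (prompt: string) => `${provider}:${prompt}` },
  };
};

// A step's call site: identity travels with every request so its cost
// can later be attributed to a step and a business.
const { client, provider } = getAIClient({
  stepId: 'research',
  businessId: 'biz_123',
  providerOverride: 'deepseek', // the user picked DeepSeek
});
const answer = client.complete('Summarize the market');
```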
Automatic Fallbacks
When a provider fails, we don’t just retry—we route to an alternate:
```typescript
// Fallback chain: deepseek → openai → anthropic → openai
const fallbackMap = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};
```
This saved us during two DeepSeek outages in January. Zero user-facing failures.
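The walk itself can be sketched as a small wrapper (`callWithFallback` is an assumed name, not code from our repo): try the primary provider, then follow the map, with a hop cap so the openai ↔ anthropic cycle terminates.

```typescript
type Provider = 'deepseek' | 'openai' | 'anthropic';

// Fallback chain from the map above.
const fallbackMap: Record<Provider, Provider> = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};

// Try the primary provider, then walk the chain. maxHops = 3 mirrors the
// four-entry chain (deepseek → openai → anthropic → openai) and guarantees
// the openai ↔ anthropic cycle cannot loop forever.
async function callWithFallback<T>(
  primary: Provider,
  call: (p: Provider) => Promise<T>,
  maxHops = 3,
): Promise<T> {
  let provider = primary;
  for (let hop = 0; ; hop++) {
    try {
      return await call(provider);
    } catch (err) {
      if (hop >= maxHops) throw err; // chain exhausted: surface the error
      provider = fallbackMap[provider];
    }
  }
}
```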
Image Provider Routing
For image generation, we run three providers with different strengths:
| Provider | Best For | Cost |
|---|---|---|
| Fal.ai FLUX.2 | General stock photos, text rendering | ~$0.02/image |
| Gemini Imagen 4 | Photorealistic scenes, lifestyle shots | ~$0.04/image |
| OpenAI DALL-E 3 | Creative/artistic imagery | ~$0.04/image |
```typescript
export const getImageProvider = (): ImageProvider => {
  const provider = process.env.IMAGE_PROVIDER || heliconeConfig.defaultImageProvider;
  if (provider === 'fal' && isFalConfigured()) return 'fal';
  if (provider === 'gemini' && isGeminiConfigured()) return 'gemini';
  return 'openai'; // Always available as fallback
};
```
Vision Capabilities Across Providers
Not every provider supports vision (analyzing images). DeepSeek’s standard API doesn’t. So we built smart routing:
```typescript
const needsVisionFallback = currentProvider === 'deepseek';
const useFalForVision = needsVisionFallback && isFalAvailable();

// Route to Fal.ai Moondream for vision if DeepSeek is primary
const effectiveVisionProvider = useFalForVision
  ? 'fal'
  : (needsVisionFallback ? 'openai' : currentProvider);
```
This means users on DeepSeek (cheapest provider) still get full vision capabilities through Fal.ai’s Moondream model.
Helicone: The Missing Piece
Running 5 AI providers without observability is flying blind. Helicone gives us a single dashboard for all providers.
How We Integrated It
Every provider routes through Helicone’s gateway:
```typescript
// Each provider has a Helicone gateway URL
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
  gemini: 'https://gateway.helicone.ai',
  fal: 'https://gateway.helicone.ai',
};
```
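The integration is just a base-URL swap plus headers. A minimal sketch (`gatewayRequest` is an assumed helper, not our actual code; the same idea works by passing `baseURL` and default headers to a provider SDK):

```typescript
// Point requests at Helicone's gateway instead of the provider's own host,
// carrying both the provider's API key and the Helicone logging key.
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
} as const;

type GatewayProvider = keyof typeof HELICONE_GATEWAYS;

// Build the URL and headers for one proxied request (illustrative helper).
function gatewayRequest(
  provider: GatewayProvider,
  path: string,
  providerKey: string,
  heliconeKey: string,
) {
  return {
    url: `${HELICONE_GATEWAYS[provider]}${path}`,
    headers: {
      Authorization: `Bearer ${providerKey}`,   // the provider's own API key
      'Helicone-Auth': `Bearer ${heliconeKey}`, // Helicone logging key
    },
  };
}
```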
Metadata We Track
Every request carries structured metadata:
```typescript
const headers = {
  'Helicone-Auth': `Bearer ${apiKey}`,
  'Helicone-Property-step': stepId,           // Which generation step
  'Helicone-Property-businessId': businessId, // Which business
  'Helicone-Property-environment': env,       // dev/staging/prod
};
```
This means we can answer questions like:
- “How much does the research step cost per business?”
- “Which provider is cheapest for content generation?”
- “What’s our total AI spend this week, broken down by step?”
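Answering those questions is a group-by over the logged requests. A sketch, where `LoggedRequest` stands in for rows exported from Helicone (the property names match the headers above; the exact export shape is an assumption):

```typescript
// One logged AI call, as we'd export it from the observability layer.
interface LoggedRequest {
  costUsd: number;
  properties: { step: string; businessId: string };
}

// Total spend grouped by generation step.
function costByStep(requests: LoggedRequest[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of requests) {
    totals[r.properties.step] = (totals[r.properties.step] ?? 0) + r.costUsd;
  }
  return totals;
}
```

Swap `step` for `businessId` and you get per-business cost; filter by a date range first and you get weekly spend.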
Cost Reality Check
Here’s what we learned from Helicone data:
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| DeepSeek Chat | $0.28 | $0.42 |
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| DeepSeek Reasoner | $0.55 | $2.19 |
DeepSeek Chat is roughly 9x cheaper than GPT-4o on input tokens and over 20x cheaper on output, with comparable quality on most of our tasks. We route simple content generation to DeepSeek and reserve GPT-4o/Claude for complex reasoning steps (strategy, competitive analysis).
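To make the table concrete, here's the per-call arithmetic (prices from the table above; `callCost` is an illustrative helper). A call with 2,000 input and 1,000 output tokens costs $0.015 on GPT-4o versus about $0.00098 on DeepSeek Chat — roughly 15x for that token mix.

```typescript
// Price table from above, in $ per 1M tokens.
const PRICES = {
  'deepseek-chat': { input: 0.28, output: 0.42 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gpt-4o': { input: 2.5, output: 10.0 },
  'claude-3.5-sonnet': { input: 3.0, output: 15.0 },
  'deepseek-reasoner': { input: 0.55, output: 2.19 },
} as const;

type Model = keyof typeof PRICES;

// Dollar cost of a single call.
function callCost(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```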
Capacity-Based Model Selection
Not every step needs the smartest model. We built a capacity system:
```typescript
const STEP_MODEL_CONFIG = {
  research: { capacity: 'default' },  // DeepSeek Chat is fine
  strategy: { capacity: 'high' },     // Needs advanced reasoning
  logo: { capacity: 'default' },
  heroImage: { capacity: 'default' },
  layout: { capacity: 'default' },
  assembly: { capacity: 'default' },
};

// High-capacity models (for complex reasoning)
const HIGH_CAPACITY = {
  openai: 'gpt-5',
  anthropic: 'claude-opus-4-0',
  deepseek: 'deepseek-reasoner',
};
```
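The glue between the two maps is a small resolver. A sketch — the default-tier model names below are assumptions (the post only lists the high-capacity tier), and only two steps are shown:

```typescript
type Provider = 'openai' | 'anthropic' | 'deepseek';
type Capacity = 'default' | 'high';

// Per-tier model maps. The 'default' tier names here are assumed, not
// quoted from our config; 'high' matches HIGH_CAPACITY above.
const MODELS: Record<Capacity, Record<Provider, string>> = {
  default: { openai: 'gpt-4o-mini', anthropic: 'claude-3-5-sonnet', deepseek: 'deepseek-chat' },
  high: { openai: 'gpt-5', anthropic: 'claude-opus-4-0', deepseek: 'deepseek-reasoner' },
};

// Capacity per step (other steps omitted; unknown steps fall back to 'default').
const STEP_CAPACITY: Record<string, Capacity> = {
  research: 'default',
  strategy: 'high',
};

// Resolve the model a step should use on the active provider.
function modelForStep(step: string, provider: Provider): string {
  return MODELS[STEP_CAPACITY[step] ?? 'default'][provider];
}
```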
Results
After 3 weeks of running this system:
- 40% cost reduction vs. GPT-4o-only baseline
- Zero downtime from provider outages (fallbacks caught 12 incidents)
- Per-business cost visibility → we know exactly what each website generation costs
- User choice → power users can pick their preferred AI provider
What We’d Do Differently
- Start with observability. We added Helicone after building multi-provider routing. Should have been day one.
- Test vision capabilities early. We discovered DeepSeek’s vision limitations in production.
- Cache aggressively. Same prompts hit different providers during fallbacks. Caching identical requests saves money.
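The caching point can be sketched as a memo layer keyed on a hash of the prompt, independent of which provider ends up serving it — so a fallback retry of an identical request is served from cache instead of billed twice (`cachedCall` is an illustrative name, not our production code):

```typescript
import { createHash } from 'node:crypto';

// In-memory cache keyed on the prompt hash, not the provider.
const cache = new Map<string, string>();

async function cachedCall(
  prompt: string,
  call: (prompt: string) => Promise<string>,
): Promise<string> {
  const key = createHash('sha256').update(prompt).digest('hex');
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // served from cache: zero token spend
  const result = await call(prompt);
  cache.set(key, result);
  return result;
}
```

In practice you'd fold model parameters (temperature, system prompt, model name) into the key as well, since the same prompt can legitimately produce different answers under different settings.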
Try It
Every website generated on WebZum uses this multi-model system. The AI picks the best provider for each step, falls back automatically on failures, and we track every token. Your business website gets enterprise-grade AI infrastructure without enterprise-grade pricing.