We Run 5 AI Providers in Production (And Finally Know What They Cost)
TL;DR: We integrated DeepSeek, Fal.ai (FLUX.2), Google Imagen 4, OpenAI, and Anthropic into a single routing layer with Helicone observability. Users can pick their preferred provider. Fallbacks happen automatically. We know exactly what every AI call costs per business, per step, per provider. Our AI spend dropped 40% without sacrificing quality.
The Problem: One Provider Isn’t Enough
When you’re generating entire websites with AI—research, content, images, domain names, chatbot training—you make a lot of API calls. At our scale, a single provider is a liability:
- OpenAI goes down → Every website generation fails
- Costs spike → No alternative routing available
- Quality varies → GPT-4o is great at content but expensive for simple tasks
- Image generation → No single model is best at everything
We needed a system that could route between providers intelligently, fall back gracefully, and—critically—tell us what everything costs.
The Architecture
Provider Abstraction Layer
Every AI call in our system goes through a unified client:
```typescript
export const getAIClient = (options?: {
  stepId?: string;
  businessId?: string;
  providerOverride?: AIProvider;
}) => {
  const provider = options?.providerOverride || getAIProvider();
  switch (provider) {
    case 'anthropic': return { client: getAnthropic(options), provider };
    case 'deepseek': return { client: getDeepSeek(options), provider };
    default: return { client: getOpenAI(options), provider };
  }
};
```
The key insight: `providerOverride` lets users choose their preferred AI. The system default is configurable per environment. And every call carries `stepId` and `businessId` metadata for cost tracking.
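A call site then looks something like this. This is a sketch: the stubbed client, the `complete` method, and the IDs are illustrative, not our actual code.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'deepseek';

// Minimal stub standing in for the real getAIClient (illustrative only).
const getAIClient = (options?: {
  stepId?: string;
  businessId?: string;
  providerOverride?: AIProvider;
}) => {
  const provider: AIProvider = options?.providerOverride ?? 'openai';
  return {
    provider,
    client: { complete: (prompt: string) => `${provider}:${prompt}` },
  };
};

// A step's call site: identity travels with every request so its cost
// can later be attributed to a step and a business.
const { client, provider } = getAIClient({
  stepId: 'research',
  businessId: 'biz_123',
  providerOverride: 'deepseek', // the user picked DeepSeek
});
const answer = client.complete('Summarize the market');
```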
Automatic Fallbacks
When a provider fails, we don’t just retry—we route to an alternate:
```typescript
// Fallback chain: deepseek → openai → anthropic → openai
const fallbackMap = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};
```
This saved us during two DeepSeek outages in January. Zero user-facing failures.
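The walk itself can be sketched as a small wrapper (`callWithFallback` is an assumed name, not code from our repo): try the primary provider, then follow the map, with a hop cap so the openai ↔ anthropic cycle terminates.

```typescript
type Provider = 'deepseek' | 'openai' | 'anthropic';

// Fallback chain from the map above.
const fallbackMap: Record<Provider, Provider> = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};

// Try the primary provider, then walk the chain. maxHops = 3 mirrors the
// four-entry chain (deepseek → openai → anthropic → openai) and guarantees
// the openai ↔ anthropic cycle cannot loop forever.
async function callWithFallback<T>(
  primary: Provider,
  call: (p: Provider) => Promise<T>,
  maxHops = 3,
): Promise<T> {
  let provider = primary;
  for (let hop = 0; ; hop++) {
    try {
      return await call(provider);
    } catch (err) {
      if (hop >= maxHops) throw err; // chain exhausted: surface the error
      provider = fallbackMap[provider];
    }
  }
}
```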
Image Provider Routing
For image generation, we run three providers with different strengths:
| Provider | Best For | Cost |
|---|---|---|
| Fal.ai FLUX.2 | General stock photos, text rendering | ~$0.02/image |
| Gemini Imagen 4 | Photorealistic scenes, lifestyle shots | ~$0.04/image |
| OpenAI DALL-E 3 | Creative/artistic imagery | ~$0.04/image |
```typescript
export const getImageProvider = (): ImageProvider => {
  const provider = process.env.IMAGE_PROVIDER || heliconeConfig.defaultImageProvider;
  if (provider === 'fal' && isFalConfigured()) return 'fal';
  if (provider === 'gemini' && isGeminiConfigured()) return 'gemini';
  return 'openai'; // Always available as fallback
};
```
Vision Capabilities Across Providers
Not every provider supports vision (analyzing images). DeepSeek’s standard API doesn’t. So we built smart routing:
```typescript
const needsVisionFallback = currentProvider === 'deepseek';
const useFalForVision = needsVisionFallback && isFalAvailable();

// Route to Fal.ai Moondream for vision if DeepSeek is primary
const effectiveVisionProvider = useFalForVision
  ? 'fal'
  : (needsVisionFallback ? 'openai' : currentProvider);
```
This means users on DeepSeek (cheapest provider) still get full vision capabilities through Fal.ai’s Moondream model.
Helicone: The Missing Piece
Running 5 AI providers without observability is flying blind. Helicone gives us a single dashboard for all providers.
How We Integrated It
Every provider routes through Helicone’s gateway:
```typescript
// Each provider has a Helicone gateway URL
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
  gemini: 'https://gateway.helicone.ai',
  fal: 'https://gateway.helicone.ai',
};
```
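The integration is just a base-URL swap plus headers. A minimal sketch (`gatewayRequest` is an assumed helper, not our actual code; the same idea works by passing `baseURL` and default headers to a provider SDK):

```typescript
// Point requests at Helicone's gateway instead of the provider's own host,
// carrying both the provider's API key and the Helicone logging key.
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
} as const;

type GatewayProvider = keyof typeof HELICONE_GATEWAYS;

// Build the URL and headers for one proxied request (illustrative helper).
function gatewayRequest(
  provider: GatewayProvider,
  path: string,
  providerKey: string,
  heliconeKey: string,
) {
  return {
    url: `${HELICONE_GATEWAYS[provider]}${path}`,
    headers: {
      Authorization: `Bearer ${providerKey}`,   // the provider's own API key
      'Helicone-Auth': `Bearer ${heliconeKey}`, // Helicone logging key
    },
  };
}
```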
Metadata We Track
Every request carries structured metadata:
```typescript
const headers = {
  'Helicone-Auth': `Bearer ${apiKey}`,
  'Helicone-Property-step': stepId,           // Which generation step
  'Helicone-Property-businessId': businessId, // Which business
  'Helicone-Property-environment': env,       // dev/staging/prod
};
```
This means we can answer questions like:
- “How much does the research step cost per business?”
- “Which provider is cheapest for content generation?”
- “What’s our total AI spend this week, broken down by step?”
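Answering those questions is a group-by over the logged requests. A sketch, where `LoggedRequest` stands in for rows exported from Helicone (the property names match the headers above; the exact export shape is an assumption):

```typescript
// One logged AI call, as we'd export it from the observability layer.
interface LoggedRequest {
  costUsd: number;
  properties: { step: string; businessId: string };
}

// Total spend grouped by generation step.
function costByStep(requests: LoggedRequest[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of requests) {
    totals[r.properties.step] = (totals[r.properties.step] ?? 0) + r.costUsd;
  }
  return totals;
}
```

Swap `step` for `businessId` and you get per-business cost; filter by a date range first and you get weekly spend.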
Cost Reality Check
Here’s what we learned from Helicone data:
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| DeepSeek Chat | $0.28 | $0.42 |
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| DeepSeek Reasoner | $0.55 | $2.19 |
DeepSeek Chat is roughly 9x cheaper than GPT-4o on input tokens and over 20x cheaper on output, with comparable quality on most of our tasks. We route simple content generation to DeepSeek and reserve GPT-4o/Claude for complex reasoning steps (strategy, competitive analysis).
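To make the table concrete, here's the per-call arithmetic (prices from the table above; `callCost` is an illustrative helper). A call with 2,000 input and 1,000 output tokens costs $0.015 on GPT-4o versus about $0.00098 on DeepSeek Chat — roughly 15x for that token mix.

```typescript
// Price table from above, in $ per 1M tokens.
const PRICES = {
  'deepseek-chat': { input: 0.28, output: 0.42 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gpt-4o': { input: 2.5, output: 10.0 },
  'claude-3.5-sonnet': { input: 3.0, output: 15.0 },
  'deepseek-reasoner': { input: 0.55, output: 2.19 },
} as const;

type Model = keyof typeof PRICES;

// Dollar cost of a single call.
function callCost(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```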
Capacity-Based Model Selection
Not every step needs the smartest model. We built a capacity system:
```typescript
const STEP_MODEL_CONFIG = {
  research: { capacity: 'default' },  // DeepSeek Chat is fine
  strategy: { capacity: 'high' },     // Needs advanced reasoning
  logo: { capacity: 'default' },
  heroImage: { capacity: 'default' },
  layout: { capacity: 'default' },
  assembly: { capacity: 'default' },
};

// High-capacity models (for complex reasoning)
const HIGH_CAPACITY = {
  openai: 'gpt-5',
  anthropic: 'claude-opus-4-0',
  deepseek: 'deepseek-reasoner',
};
```
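The glue between the two maps is a small resolver. A sketch — the default-tier model names below are assumptions (the post only lists the high-capacity tier), and only two steps are shown:

```typescript
type Provider = 'openai' | 'anthropic' | 'deepseek';
type Capacity = 'default' | 'high';

// Per-tier model maps. The 'default' tier names here are assumed, not
// quoted from our config; 'high' matches HIGH_CAPACITY above.
const MODELS: Record<Capacity, Record<Provider, string>> = {
  default: { openai: 'gpt-4o-mini', anthropic: 'claude-3-5-sonnet', deepseek: 'deepseek-chat' },
  high: { openai: 'gpt-5', anthropic: 'claude-opus-4-0', deepseek: 'deepseek-reasoner' },
};

// Capacity per step (other steps omitted; unknown steps fall back to 'default').
const STEP_CAPACITY: Record<string, Capacity> = {
  research: 'default',
  strategy: 'high',
};

// Resolve the model a step should use on the active provider.
function modelForStep(step: string, provider: Provider): string {
  return MODELS[STEP_CAPACITY[step] ?? 'default'][provider];
}
```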
Results
After 3 weeks of running this system:
- 40% cost reduction vs. GPT-4o-only baseline
- Zero downtime from provider outages (fallbacks caught 12 incidents)
- Per-business cost visibility → we know exactly what each website generation costs
- User choice → power users can pick their preferred AI provider
What We’d Do Differently
- Start with observability. We added Helicone after building multi-provider routing. Should have been day one.
- Test vision capabilities early. We discovered DeepSeek’s vision limitations in production.
- Cache aggressively. Same prompts hit different providers during fallbacks. Caching identical requests saves money.
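The caching point can be sketched as a memo layer keyed on a hash of the prompt, independent of which provider ends up serving it — so a fallback retry of an identical request is served from cache instead of billed twice (`cachedCall` is an illustrative name, not our production code):

```typescript
import { createHash } from 'node:crypto';

// In-memory cache keyed on the prompt hash, not the provider.
const cache = new Map<string, string>();

async function cachedCall(
  prompt: string,
  call: (prompt: string) => Promise<string>,
): Promise<string> {
  const key = createHash('sha256').update(prompt).digest('hex');
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // served from cache: zero token spend
  const result = await call(prompt);
  cache.set(key, result);
  return result;
}
```

In practice you'd fold model parameters (temperature, system prompt, model name) into the key as well, since the same prompt can legitimately produce different answers under different settings.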
Try It
Every website generated on WebZum uses this multi-model system. The AI picks the best provider for each step, falls back automatically on failures, and we track every token. Your business website gets enterprise-grade AI infrastructure without enterprise-grade pricing.