We Run 5 AI Providers in Production (And Finally Know What They Cost)

WebZum Engineering • January 10, 2026 • 5 min read
TL;DR: We integrated DeepSeek, Fal.ai (FLUX.2), Google Imagen 4, OpenAI, and Anthropic into a single routing layer with Helicone observability. Users can pick their preferred provider. Fallbacks happen automatically. We know exactly what every AI call costs per business, per step, per provider. Our AI spend dropped 40% without sacrificing quality.

The Problem: One Provider Isn’t Enough

When you’re generating entire websites with AI—research, content, images, domain names, chatbot training—you make a lot of API calls. At our scale, a single provider is a liability:

  • OpenAI goes down → Every website generation fails
  • Costs spike → No alternative routing available
  • Quality varies → GPT-4o is great at content but expensive for simple tasks
  • Image generation → No single model is best at everything

We needed a system that could route between providers intelligently, fall back gracefully, and—critically—tell us what everything costs.

The Architecture

Provider Abstraction Layer

Every AI call in our system goes through a unified client:

export const getAIClient = (options?: {
  stepId?: string;
  businessId?: string;
  providerOverride?: AIProvider;
}) => {
  const provider = options?.providerOverride || getAIProvider();

  switch (provider) {
    case 'anthropic': return { client: getAnthropic(options), provider };
    case 'deepseek': return { client: getDeepSeek(options), provider };
    default: return { client: getOpenAI(options), provider };
  }
};

The key insight: providerOverride lets users choose their preferred AI. The system default is configurable per-environment. And every call carries stepId and businessId metadata for tracking.
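The resolution order described above (user override first, then the environment default, then OpenAI) can be sketched as a small pure function. This is a minimal illustration, not the actual getAIProvider implementation: the names isAIProvider and resolveProvider are hypothetical, and the provider list is assumed from the switch statement.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'deepseek';

const KNOWN_PROVIDERS: readonly string[] = ['openai', 'anthropic', 'deepseek'];

// Hypothetical helper: narrows an arbitrary string to a known provider.
function isAIProvider(value: string | undefined): value is AIProvider {
  return value !== undefined && KNOWN_PROVIDERS.includes(value);
}

// Resolution order: user override, then environment default, then OpenAI
// (mirroring the `default:` branch of the switch above).
function resolveProvider(override?: string, envDefault?: string): AIProvider {
  if (isAIProvider(override)) return override;
  if (isAIProvider(envDefault)) return envDefault;
  return 'openai';
}
```

Validating the override before using it means a typo in a user setting or environment variable degrades to the default instead of producing an undefined client.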

Automatic Fallbacks

When a provider fails, we don’t just retry—we route to an alternate:

// Next provider to try when the keyed provider fails (openai and anthropic back each other up)
const fallbackMap = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};

This saved us during two DeepSeek outages in January. Zero user-facing failures.
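Because openai and anthropic point at each other in the map, whatever walks it needs a stopping rule. The sketch below is one way to do that, assuming each provider is tried at most once. The names fallbackOrder and withFallback are hypothetical, and the call parameter stands in for whatever performs the real request.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'deepseek';

const fallbackMap: Record<AIProvider, AIProvider> = {
  deepseek: 'openai',
  openai: 'anthropic',
  anthropic: 'openai',
};

// Build the ordered list of providers to try, stopping when the map loops back
// to a provider that is already in the list.
function fallbackOrder(start: AIProvider): AIProvider[] {
  const order: AIProvider[] = [];
  let current: AIProvider = start;
  while (!order.includes(current)) {
    order.push(current);
    current = fallbackMap[current];
  }
  return order;
}

// Hypothetical runner: tries each provider once and rethrows the last error
// if every provider in the chain fails.
async function withFallback<T>(
  start: AIProvider,
  call: (provider: AIProvider) => Promise<T>,
): Promise<{ provider: AIProvider; result: T }> {
  let lastError: unknown;
  for (const provider of fallbackOrder(start)) {
    try {
      return { provider, result: await call(provider) };
    } catch (error) {
      lastError = error; // remember the failure and move to the next provider
    }
  }
  throw lastError;
}
```

Returning the provider that actually served the request keeps cost attribution accurate when a fallback fires.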

Image Provider Routing

For image generation, we run three providers with different strengths:

Provider         | Best For                                | Cost
-----------------|-----------------------------------------|--------------
Fal.ai FLUX.2    | General stock photos, text rendering    | ~$0.02/image
Gemini Imagen 4  | Photorealistic scenes, lifestyle shots  | ~$0.04/image
OpenAI DALL-E 3  | Creative/artistic imagery               | ~$0.04/image

export const getImageProvider = (): ImageProvider => {
  const provider = process.env.IMAGE_PROVIDER || heliconeConfig.defaultImageProvider;

  if (provider === 'fal' && isFalConfigured()) return 'fal';
  if (provider === 'gemini' && isGeminiConfigured()) return 'gemini';
  return 'openai'; // Always available as fallback
};

Vision Capabilities Across Providers

Not every provider supports vision (analyzing images). DeepSeek’s standard API doesn’t. So we built smart routing:

const needsVisionFallback = currentProvider === 'deepseek';
const useFalForVision = needsVisionFallback && isFalAvailable();

// Route to Fal.ai Moondream for vision if DeepSeek is primary
const effectiveVisionProvider = useFalForVision
  ? 'fal'
  : (needsVisionFallback ? 'openai' : currentProvider);

This means users on DeepSeek (cheapest provider) still get full vision capabilities through Fal.ai’s Moondream model.
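The same decision can be written as a pure function, which makes the three outcomes easy to check in a unit test. This is a sketch only: pickVisionProvider is a hypothetical name, and the provider unions are assumed from the snippets above.

```typescript
type TextProvider = 'openai' | 'anthropic' | 'deepseek';
type VisionProvider = TextProvider | 'fal';

// Mirrors the routing above: only DeepSeek needs a vision fallback, and
// Fal.ai is preferred over OpenAI when it is available.
function pickVisionProvider(current: TextProvider, falAvailable: boolean): VisionProvider {
  if (current !== 'deepseek') return current; // provider handles vision itself
  return falAvailable ? 'fal' : 'openai';
}
```

Passing falAvailable in as an argument, rather than calling isFalAvailable() inside, keeps the function free of environment lookups.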

Helicone: The Missing Piece

Running 5 AI providers without observability is flying blind. Helicone gives us a single dashboard for all providers.

How We Integrated It

Every provider routes through Helicone’s gateway:

// Each provider has a Helicone gateway URL
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
  gemini: 'https://gateway.helicone.ai',
  fal: 'https://gateway.helicone.ai',
};
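Gemini and Fal share the generic gateway URL, and Helicone's generic gateway expects a Helicone-Target-Url header naming the upstream API. The sketch below shows how a request config might be assembled under that assumption. The function heliconeRequestConfig is hypothetical, and the upstream URL is left as a caller-supplied parameter because the article does not state which values are used.

```typescript
const HELICONE_GATEWAYS = {
  openai: 'https://oai.helicone.ai/v1',
  anthropic: 'https://anthropic.helicone.ai',
  deepseek: 'https://deepseek.helicone.ai',
  gemini: 'https://gateway.helicone.ai',
  fal: 'https://gateway.helicone.ai',
} as const;

type GatewayProvider = keyof typeof HELICONE_GATEWAYS;

const GENERIC_GATEWAY = 'https://gateway.helicone.ai';

// Hypothetical helper: returns the base URL and Helicone headers for a provider.
// Providers on the generic gateway must say where the request should be forwarded.
function heliconeRequestConfig(
  provider: GatewayProvider,
  heliconeApiKey: string,
  upstreamUrl?: string,
): { baseURL: string; headers: Record<string, string> } {
  const baseURL: string = HELICONE_GATEWAYS[provider];
  const headers: Record<string, string> = {
    'Helicone-Auth': `Bearer ${heliconeApiKey}`,
  };
  if (baseURL === GENERIC_GATEWAY) {
    if (!upstreamUrl) {
      throw new Error(`${provider} uses the generic gateway and needs an upstream URL`);
    }
    headers['Helicone-Target-Url'] = upstreamUrl;
  }
  return { baseURL, headers };
}
```

Failing loudly when the upstream URL is missing is a deliberate choice here: a generic-gateway request without a target would otherwise fail at request time with a less obvious error.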

Metadata We Track

Every request carries structured metadata:

const headers = {
  'Helicone-Auth': `Bearer ${apiKey}`,          // Helicone API key, not the provider's key
  'Helicone-Property-step': stepId,             // Which generation step
  'Helicone-Property-businessId': businessId,   // Which business
  'Helicone-Property-environment': env,         // dev/staging/prod
};

This means we can answer questions like:

  • “How much does the research step cost per business?”
  • “Which provider is cheapest for content generation?”
  • “What’s our total AI spend this week, broken down by step?”
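Each of those questions is a group-by over the same tagged records. As a rough illustration of the idea (this is local aggregation over hypothetical exported records, not Helicone's API, and the CallRecord shape is an assumption), the totals could be computed like this:

```typescript
// Assumed record shape: one row per AI call, carrying the metadata above
// plus a cost figure. Field names are illustrative.
interface CallRecord {
  step: string;
  businessId: string;
  provider: string;
  costUsd: number;
}

// Sum cost by any one of the tracked dimensions.
function totalBy(
  records: CallRecord[],
  key: 'step' | 'businessId' | 'provider',
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    totals.set(record[key], (totals.get(record[key]) ?? 0) + record.costUsd);
  }
  return totals;
}

// Made-up sample data, purely to show the call shape.
const sampleRecords: CallRecord[] = [
  { step: 'research', businessId: 'b1', provider: 'deepseek', costUsd: 0.25 },
  { step: 'research', businessId: 'b2', provider: 'deepseek', costUsd: 0.5 },
  { step: 'strategy', businessId: 'b1', provider: 'openai', costUsd: 1.5 },
];

const costByStep = totalBy(sampleRecords, 'step');
const costByBusiness = totalBy(sampleRecords, 'businessId');
```

The point is that one consistent set of properties on every request is enough to answer all three questions without extra instrumentation.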

Cost Reality Check

Here’s what we learned from Helicone data:

Model              | Input ($/1M tokens) | Output ($/1M tokens)
-------------------|---------------------|---------------------
DeepSeek Chat      | $0.28               | $0.42
GPT-4o-mini        | $0.15               | $0.60
GPT-4o             | $2.50               | $10.00
Claude 3.5 Sonnet  | $3.00               | $15.00
DeepSeek Reasoner  | $0.55               | $2.19

At these prices, DeepSeek Chat is roughly 9x cheaper than GPT-4o on input tokens and about 24x cheaper on output tokens, with comparable quality on most of our tasks. We route simple content generation to DeepSeek and reserve GPT-4o/Claude for complex reasoning steps (strategy, competitive analysis).
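To turn the per-million prices into a per-call figure, multiply each token count by its rate and divide by one million. The sketch below uses the prices from the table; the function name callCostUsd and the token counts in the usage lines are illustrative assumptions.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices copied from the table above.
const PRICING: Record<string, Pricing> = {
  'DeepSeek Chat': { inputPerMillion: 0.28, outputPerMillion: 0.42 },
  'GPT-4o-mini': { inputPerMillion: 0.15, outputPerMillion: 0.6 },
  'GPT-4o': { inputPerMillion: 2.5, outputPerMillion: 10.0 },
  'Claude 3.5 Sonnet': { inputPerMillion: 3.0, outputPerMillion: 15.0 },
  'DeepSeek Reasoner': { inputPerMillion: 0.55, outputPerMillion: 2.19 },
};

// Cost of a single call in USD.
function callCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const pricing = PRICING[model];
  if (!pricing) throw new Error(`Unknown model: ${model}`);
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// Illustrative workload: one million tokens in and one million tokens out.
const deepseekCost = callCostUsd('DeepSeek Chat', 1_000_000, 1_000_000);
const gpt4oCost = callCostUsd('GPT-4o', 1_000_000, 1_000_000);
```

For that balanced workload the totals come to about $0.70 for DeepSeek Chat and $12.50 for GPT-4o, so the real-world ratio depends on how output-heavy a step is.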

Capacity-Based Model Selection

Not every step needs the smartest model. We built a capacity system:

const STEP_MODEL_CONFIG = {
  research: { capacity: 'default' },     // DeepSeek Chat is fine
  strategy: { capacity: 'high' },        // Needs advanced reasoning
  logo: { capacity: 'default' },
  heroImage: { capacity: 'default' },
  layout: { capacity: 'default' },
  assembly: { capacity: 'default' },
};

// High-capacity models (for complex reasoning)
const HIGH_CAPACITY = {
  openai: 'gpt-5',
  anthropic: 'claude-opus-4-0',
  deepseek: 'deepseek-reasoner',
};
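Putting the two maps together, model selection becomes a lookup on step and provider. The sketch below assumes a DEFAULT_CAPACITY map that the article does not show; its model names are placeholders chosen for illustration, and resolveModel is a hypothetical function name.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'deepseek';
type Capacity = 'default' | 'high';

const STEP_MODEL_CONFIG: Record<string, { capacity: Capacity }> = {
  research: { capacity: 'default' },
  strategy: { capacity: 'high' },
  logo: { capacity: 'default' },
  heroImage: { capacity: 'default' },
  layout: { capacity: 'default' },
  assembly: { capacity: 'default' },
};

const HIGH_CAPACITY: Record<AIProvider, string> = {
  openai: 'gpt-5',
  anthropic: 'claude-opus-4-0',
  deepseek: 'deepseek-reasoner',
};

// Assumption: these default-capacity model names are placeholders,
// not taken from the article.
const DEFAULT_CAPACITY: Record<AIProvider, string> = {
  openai: 'gpt-4o-mini',
  anthropic: 'claude-3-5-sonnet-latest',
  deepseek: 'deepseek-chat',
};

// Unknown steps are treated as default capacity so a new step never
// silently lands on the most expensive model.
function resolveModel(stepId: string, provider: AIProvider): string {
  const capacity: Capacity = STEP_MODEL_CONFIG[stepId]?.capacity ?? 'default';
  return capacity === 'high' ? HIGH_CAPACITY[provider] : DEFAULT_CAPACITY[provider];
}
```

Because the capacity is attached to the step rather than the provider, a fallback to a different provider still lands on a model of the same tier.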

Results

After 3 weeks of running this system:

  • 40% cost reduction vs. GPT-4o-only baseline
  • Zero downtime from provider outages (fallbacks caught 12 incidents)
  • Per-business cost visibility → we know exactly what each website generation costs
  • User choice → power users can pick their preferred AI provider

What We’d Do Differently

  1. Start with observability. We added Helicone after building multi-provider routing. Should have been day one.
  2. Test vision capabilities early. We discovered DeepSeek’s vision limitations in production.
  3. Cache aggressively. Same prompts hit different providers during fallbacks. Caching identical requests saves money.
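On the caching point, the key detail is that the cache key should not include the provider, so a response produced before a fallback can be reused afterwards. A minimal in-memory sketch, assuming string prompts and string responses (cacheKey and ResponseCache are hypothetical names, and a production version would need expiry and a shared store):

```typescript
// Key on the step and the prompt only, never on the provider.
function cacheKey(stepId: string, prompt: string): string {
  return JSON.stringify([stepId, prompt]);
}

class ResponseCache {
  private readonly entries = new Map<string, string>();

  // Returns the cached response if present; otherwise runs `compute`
  // (the real provider call) and stores its result.
  async getOrCompute(
    stepId: string,
    prompt: string,
    compute: () => Promise<string>,
  ): Promise<string> {
    const key = cacheKey(stepId, prompt);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const fresh = await compute();
    this.entries.set(key, fresh);
    return fresh;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Serializing the key as a JSON array avoids collisions that plain string concatenation could cause when a step ID or prompt contains the separator.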

Try It

Every website generated on WebZum uses this multi-model system. The AI picks the best provider for each step, falls back automatically on failures, and we track every token. Your business website gets enterprise-grade AI infrastructure without enterprise-grade pricing.
