We Track Every AI Token. Here's What It Actually Costs to Generate a Website.
TL;DR: We built a cost tracking system that captures every AI API call in our website generation pipeline—LLMs, image generation, web search—and writes a cost record to DynamoDB. We know exactly what each website costs, broken down by step, provider, and model. The system is fire-and-forget (never blocks generation), supports time-versioned pricing, and auto-expires dev records after 90 days. Average website: under $0.50 in AI costs.
Why Track AI Costs at the Token Level?
When you’re making dozens of AI calls to generate a single website, costs add up in unexpected places. Without granular tracking, you’re flying blind:
- Is the strategy step or the content step more expensive?
- Did switching from GPT-4o to DeepSeek actually save money?
- Which businesses cost the most to generate? (Hint: restaurants with long menus.)
- Is our image generation spend growing faster than our LLM spend?
We needed answers at the per-call, per-step, per-business level. Helicone gives us provider-level observability, but we wanted cost data inside our own database—queryable, aggregatable, and tied to our business logic.
The Architecture
Fire-and-Forget Logger
The core design principle: cost logging never blocks website generation. If DynamoDB is slow or down, generation continues. The logger catches errors silently.
```typescript
export function logCost(entry: CostLogEntry): void {
  _writeRecord(entry).catch((err) => {
    console.warn('⚠️ [COST] Failed to log cost:', err?.message ?? err);
  });
}
```
No await. No error propagation. The caller moves on immediately.
What Gets Logged
Every cost record captures:
| Field | Example |
|---|---|
| `businessId` | `"joes-pizza-brooklyn"` |
| `stepId` | `"strategy"`, `"webSearch"`, `"google-image-search"` |
| `provider` | `"openai"`, `"anthropic"`, `"deepseek"`, `"fal"` |
| `model` | `"gpt-4o"`, `"claude-opus-4-5"`, `"flux-2"` |
| `usage` | `{ type: "token", inputTokens: 4200, outputTokens: 1800 }` |
| `cost` | `0.028` (calculated USD) |
| `durationMs` | `3400` |
| `environment` | `"production"` |
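In TypeScript, those fields map to a record shape along these lines (a sketch; the exact interface, optional fields, and the `Usage` union are assumptions, not the production definition):

```typescript
// Hypothetical cost-record shape; field names follow the table above.
type Usage =
  | { type: 'token'; inputTokens: number; outputTokens: number }
  | { type: 'megapixel'; megapixels: number }
  | { type: 'image'; images: number }
  | { type: 'query'; queries: number };

interface CostLogEntry {
  businessId: string;
  stepId?: string;
  provider: string;
  model: string;
  usage: Usage;
  cost?: number;        // calculated USD
  durationMs?: number;
  environment?: 'production' | 'development';
}

const entry: CostLogEntry = {
  businessId: 'joes-pizza-brooklyn',
  stepId: 'strategy',
  provider: 'openai',
  model: 'gpt-4o',
  usage: { type: 'token', inputTokens: 4200, outputTokens: 1800 },
  cost: 0.028,
  durationMs: 3400,
  environment: 'production',
};
```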
Four Pricing Models
Not all AI calls are billed the same way. We support four pricing types in a single JSON config:
```json
{
  "gpt-4o": [{
    "effectiveDate": "2026-01-01",
    "inputPer1M": 2.50,
    "outputPer1M": 10.00
  }],
  "fal-ai/flux-2": [{
    "effectiveDate": "2026-01-01",
    "pricePerMP": 0.012,
    "minimumMP": 1.0
  }],
  "gpt-image-1.5": [{
    "effectiveDate": "2026-01-01",
    "pricePerImage": 0.02
  }],
  "google-custom-search": [{
    "effectiveDate": "2026-01-01",
    "pricePerQuery": 0.005
  }]
}
```
- Token pricing: LLMs (input + output tokens)
- Megapixel pricing: Fal.ai FLUX (image dimensions matter)
- Per-image pricing: OpenAI image generation (flat rate)
- Per-query pricing: Google Custom Search (CSE)
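The dispatch over those four billing models can be sketched as a single function. Note this takes a pricing entry directly rather than a model name (unlike the `calculateCost(model, usage)` call shown later), purely to keep the example self-contained; the field names mirror the config above:

```typescript
// One pricing entry, covering the four billing models from the config above.
type Pricing =
  | { inputPer1M: number; outputPer1M: number }   // token pricing
  | { pricePerMP: number; minimumMP?: number }    // megapixel pricing
  | { pricePerImage: number }                     // per-image pricing
  | { pricePerQuery: number };                    // per-query pricing

type Usage =
  | { type: 'token'; inputTokens: number; outputTokens: number }
  | { type: 'megapixel'; megapixels: number }
  | { type: 'image'; images: number }
  | { type: 'query'; queries: number };

function calculateCost(pricing: Pricing, usage: Usage): number {
  switch (usage.type) {
    case 'token': {
      const p = pricing as { inputPer1M: number; outputPer1M: number };
      return (usage.inputTokens / 1e6) * p.inputPer1M +
             (usage.outputTokens / 1e6) * p.outputPer1M;
    }
    case 'megapixel': {
      const p = pricing as { pricePerMP: number; minimumMP?: number };
      // Bill at least the minimum megapixel count, if one is configured.
      return Math.max(usage.megapixels, p.minimumMP ?? 0) * p.pricePerMP;
    }
    case 'image':
      return usage.images * (pricing as { pricePerImage: number }).pricePerImage;
    case 'query':
      return usage.queries * (pricing as { pricePerQuery: number }).pricePerQuery;
  }
}

// Example: GPT-4o at $2.50/1M input, $10/1M output.
// 4200/1M * 2.50 + 1800/1M * 10.00 = 0.0105 + 0.018 = 0.0285
const cost = calculateCost(
  { inputPer1M: 2.5, outputPer1M: 10 },
  { type: 'token', inputTokens: 4200, outputTokens: 1800 }
);
```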
Time-Versioned Pricing
Prices change. When OpenAI cuts GPT-4o prices, we add a new entry with a future `effectiveDate`. The lookup picks the newest entry whose `effectiveDate` is on or before the request date:
```typescript
// Returns the pricing entry active on the given date
const entries = pricing[model]
  .filter(e => e.effectiveDate <= dateString)
  .sort((a, b) => b.effectiveDate.localeCompare(a.effectiveDate));
return entries[0];
```
Historical cost records stay accurate because they were calculated with the pricing active at the time.
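Packaged as a self-contained function, the lookup might read like this (the `getPricing` name and `PriceEntry` shape are illustrative, and the 2026-06-01 price cut is invented for the example):

```typescript
interface PriceEntry {
  effectiveDate: string; // ISO date, so lexicographic compare == chronological
  inputPer1M: number;
  outputPer1M: number;
}

// Returns the newest entry whose effectiveDate is on or before `dateString`.
function getPricing(entries: PriceEntry[], dateString: string): PriceEntry | undefined {
  return entries
    .filter((e) => e.effectiveDate <= dateString)
    .sort((a, b) => b.effectiveDate.localeCompare(a.effectiveDate))[0];
}

// GPT-4o with a hypothetical price cut effective 2026-06-01.
const gpt4o: PriceEntry[] = [
  { effectiveDate: '2026-01-01', inputPer1M: 2.5, outputPer1M: 10 },
  { effectiveDate: '2026-06-01', inputPer1M: 1.25, outputPer1M: 5 },
];

getPricing(gpt4o, '2026-03-03'); // → the 2026-01-01 entry
getPricing(gpt4o, '2026-07-01'); // → the 2026-06-01 entry
```

Because ISO dates compare correctly as strings, no date parsing is needed, and a record costed in March keeps its March price even after the June entry lands.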
Integration: Zero-Touch for Developers
Developers writing new pipeline steps don’t need to think about cost tracking. It’s wired into our AI abstraction layer:
```typescript
function createUsageLogger(
  provider: string,
  model: string,
  businessId?: string,
  stepId?: string,
  startTime?: number
) {
  return (usage: { inputTokens: number; outputTokens: number }) => {
    logCost({
      businessId: businessId || 'unknown',
      stepId,
      provider,
      model,
      usage: { type: 'token', ...usage },
      durationMs: startTime ? Date.now() - startTime : undefined,
    });
  };
}
```
Every call to generateContent() or generateStructuredData() automatically creates a usage callback. When the provider returns token counts, the callback fires and logs the cost. No manual instrumentation needed.
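For illustration, the wiring can be reduced to a wrapper that runs a provider call and fires the usage callback with whatever token counts come back. Everything here except the callback idea is an assumption: `withUsageLogging` and the stub response shape are ours, not the real `generateContent()` internals:

```typescript
type TokenUsage = { inputTokens: number; outputTokens: number };

// Hypothetical wrapper: run a provider call, then fire the usage callback
// with the token counts the provider returned.
async function withUsageLogging<T extends { usage: TokenUsage }>(
  onUsage: (usage: TokenUsage) => void,
  call: () => Promise<T>
): Promise<T> {
  const result = await call();
  onUsage(result.usage); // the callback does the fire-and-forget logCost()
  return result;
}

// Demo with a stub standing in for a real provider call.
let logged: TokenUsage | null = null;
const demo = withUsageLogging(
  (usage) => { logged = usage; },
  async () => ({ text: 'hello', usage: { inputTokens: 10, outputTokens: 5 } })
);
```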
DynamoDB Schema
We use a dual-access pattern:
By Business (primary key):
- `PK = "BIZ#joes-pizza-brooklyn"`, `SK = "2026-03-03T14:22:00Z#uuid"`
- Query: "Show me all AI costs for this business"

By Date (GSI):
- `GSI1PK = "DATE#2026-03-03"`, `GSI1SK = "2026-03-03T14:22:00Z#uuid"`
- Query: "Show me all AI costs from today"
Non-production records get a 90-day TTL. Production records persist indefinitely.
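A sketch of the key layout and TTL computation (the helper names are ours, not from the codebase):

```typescript
// Builds the dual-access keys described above from an ISO timestamp.
function buildKeys(businessId: string, timestamp: string, uuid: string) {
  const date = timestamp.slice(0, 10); // "2026-03-03" from "2026-03-03T14:22:00Z"
  return {
    PK: `BIZ#${businessId}`,
    SK: `${timestamp}#${uuid}`,
    GSI1PK: `DATE#${date}`,
    GSI1SK: `${timestamp}#${uuid}`,
  };
}

// 90-day TTL (epoch seconds) for non-production records; none for production.
function ttlFor(environment: string, nowMs: number): number | undefined {
  if (environment === 'production') return undefined;
  return Math.floor(nowMs / 1000) + 90 * 24 * 60 * 60;
}
```

DynamoDB's TTL feature expects an epoch-seconds attribute; omitting it entirely is what makes production records persist.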
The Numbers
Here’s what we’ve learned from tracking every token:
Cost Per Website (Average)
| Step | Avg Cost | Provider |
|---|---|---|
| Enrichment (web search) | $0.08 | OpenAI |
| Strategy | $0.04 | DeepSeek Reasoner |
| Brand & Logo | $0.03 | DeepSeek + Fal.ai |
| Image Generation | $0.08 | Fal.ai FLUX |
| Content Sections (6-8) | $0.06 | DeepSeek Chat |
| Header + Footer | $0.02 | DeepSeek Chat |
| Assembly + SEO | $0.03 | DeepSeek Chat |
| Total | ~$0.34 | |
Under $0.50 per website. At our $19/month price point, the unit economics work from day one.
Where the Money Goes
The two biggest cost drivers:
- Web search enrichment (24%) — OpenAI’s search is priced per-query, not per-token
- Image generation (24%) — We generate hero images, section images, and logos
LLM text generation is surprisingly cheap, especially with DeepSeek routing for standard content.
Model Comparison in Strategy Evaluation
We evaluate strategy generation across four models and track cost alongside quality:
```typescript
const inputTokens = Math.ceil(promptLength / 4); // ~4 chars/token for English
const outputTokens = Math.ceil(outputJson.length / 4);
const costUSD = calculateCost(model.model, {
  type: 'token',
  inputTokens,
  outputTokens,
});
```
| Model | Avg Score (1-10) | Avg Cost | Speed |
|---|---|---|---|
| DeepSeek Chat | 7.2 | $0.002 | 4.1s |
| DeepSeek Reasoner | 8.1 | $0.008 | 8.3s |
| Haiku 4.5 | 7.8 | $0.005 | 3.2s |
| Opus 4.5 | 8.4 | $0.045 | 6.7s |
Opus 4.5 scores highest but costs 22x more than DeepSeek Chat. For most businesses, DeepSeek Reasoner hits the sweet spot: 95% of Opus quality at 18% of the cost.
Admin Dashboard
We built an admin API that aggregates costs by business, provider, and step:
GET /api/admin/costs?from=2026-03-01&to=2026-03-03&limit=5000
Response includes total spend, per-business breakdowns, per-provider splits, and per-step attribution. We review this weekly to catch cost anomalies—like when a prompt change accidentally doubled our token usage on the assembly step.
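The aggregation behind that response is essentially a group-and-sum over cost records. A minimal sketch (the `CostRecord` shape and `sumBy` helper are assumptions; the real endpoint queries the date GSI first):

```typescript
interface CostRecord {
  businessId: string;
  provider: string;
  stepId?: string;
  cost: number;
}

// Sums costs along one dimension, e.g. sumBy(records, r => r.provider).
function sumBy(
  records: CostRecord[],
  key: (r: CostRecord) => string
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const r of records) {
    const k = key(r);
    out[k] = (out[k] ?? 0) + r.cost;
  }
  return out;
}

const records: CostRecord[] = [
  { businessId: 'joes-pizza-brooklyn', provider: 'deepseek', stepId: 'strategy', cost: 0.04 },
  { businessId: 'joes-pizza-brooklyn', provider: 'openai', stepId: 'webSearch', cost: 0.08 },
  { businessId: 'sals-barber', provider: 'deepseek', stepId: 'strategy', cost: 0.03 },
];

sumBy(records, (r) => r.provider); // roughly { deepseek: 0.07, openai: 0.08 }
```

The same helper keyed on `stepId` or `businessId` produces the per-step and per-business breakdowns.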
What We’d Do Differently
- Track actual tokens, not estimates. Our evaluation scripts use `promptLength / 4` for token estimation. In production, we use actual counts from provider responses. The evaluation scripts should too.
- Add cost alerts. We track costs but don't alert when a single business costs 10x the average. We've seen outliers (businesses with massive menus or complex service lists) that we only discovered during weekly reviews.
- Cache costs per business. Right now, regenerating a website logs new costs without referencing previous generation costs. We should show users: "This regeneration cost $0.42; your last generation cost $0.38."
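The cost-alert idea could start as small as a threshold check against the fleet average (entirely illustrative; a median would be more robust when the outlier itself inflates the mean):

```typescript
// Flags businesses whose total cost exceeds `factor` times the fleet average.
function findCostOutliers(
  totals: Record<string, number>, // businessId -> total cost in USD
  factor = 10
): string[] {
  const values = Object.values(totals);
  if (values.length === 0) return [];
  const avg = values.reduce((a, b) => a + b, 0) / values.length;
  return Object.entries(totals)
    .filter(([, cost]) => cost > factor * avg)
    .map(([businessId]) => businessId);
}
```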
The Takeaway
If you’re building an AI product with multiple providers and multiple call types, instrument cost tracking from day one. Not tomorrow. Not after launch. Day one.
The data pays for itself immediately:
- Model selection: Hard data on cost-per-quality, not vibes
- Regression detection: Prompt changes that double costs show up instantly
- Unit economics: Know your margins at the individual-customer level
- Provider negotiation: Show your OpenAI rep exactly how many tokens you burn monthly
Try It
Every website generated on WebZum has its costs tracked at the token level. The AI picks the most cost-effective provider for each step, and we know exactly what your website cost us to build. That’s how we keep pricing at $19/month without cutting corners.