LLM Cost Math: OpenAI vs. Anthropic vs. Local Models

LLM pricing models decoded: tokens, context, fine-tuning

Understanding LLM costs starts with tokens. Tokens are chunks of text (≈4 characters, or about ¾ of a word). Pricing is typically quoted in dollars per million tokens (some older pricing pages use per-1,000 rates).

Three main factors drive costs:

  1. Input tokens: What you send to the model (prompt + context).
  2. Output tokens: What the model generates.
  3. Context length: Longer context = more tokens = higher cost.

Additional pricing dimensions:

  • Function calling: Structured outputs can increase tokens consumed.
  • Embeddings: Needed for search, RAG, and classification.
  • Fine-tuning: One-time training + higher per-token cost afterward.

Bottom line: total cost = (input tokens × input price) + (output tokens × output price), with prices quoted per 1M tokens.
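
A minimal sketch of that arithmetic (the prices in the example are the GPT-3.5 Turbo rates quoted below, used purely as placeholders):

```python
def llm_cost_usd(input_tokens: int, output_tokens: int,
                 price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Charge input and output tokens at their own rates, quoted in $ per 1M tokens."""
    return (input_tokens * price_in_per_1m + output_tokens * price_out_per_1m) / 1_000_000

# 500 input + 200 output tokens at $0.50 / $1.50 per 1M tokens
print(llm_cost_usd(500, 200, 0.50, 1.50))  # -> 0.00055
```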


OpenAI pricing breakdown with real examples

OpenAI offers both low-cost models (GPT-3.5 Turbo) and premium models (GPT-4o, GPT-4 Turbo).


GPT-4 vs GPT-3.5: when the price difference is worth it

  • GPT-3.5 Turbo (16k context): $0.50 / 1M input tokens, $1.50 / 1M output tokens.
  • GPT-4o (128k context): $2.50 / 1M input tokens, $10 / 1M output tokens.
  • GPT-4 Turbo (with vision) is the older premium tier at $10 / 1M input and $30 / 1M output tokens; GPT-4o is now the cheaper multimodal option.

Example (Customer Support):

  • Prompt: 500 tokens (input), Response: 200 tokens (output).
  • Cost per conversation:
    • GPT-3.5: (500×$0.50 + 200×$1.50)/1M = ~$0.00055
    • GPT-4o: (500×$2.50 + 200×$10)/1M = ~$0.00325

GPT-4o is ~6× more expensive per conversation but reasons more reliably. The premium is worth it when higher accuracy means fewer human escalations.
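
The same arithmetic, written out as a short sketch (prices per 1M tokens as listed above; the ~6× gap falls out directly):

```python
# Per-conversation cost: 500 input + 200 output tokens
PRICES = {
    "gpt-3.5-turbo": (0.50, 1.50),   # ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-4o":        (2.50, 10.00),
}

for model, (p_in, p_out) in PRICES.items():
    cost = (500 * p_in + 200 * p_out) / 1_000_000
    print(f"{model}: ${cost:.5f} per conversation")
# gpt-3.5-turbo: $0.00055, gpt-4o: $0.00325 -> roughly a 6x difference
```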

Hidden costs: function calling, embeddings, moderation

  • Function calling: Adds extra tokens for structured JSON output.
  • Embeddings: text-embedding-3-small costs $0.02 per 1M tokens. Cheap, but it adds up across large datasets (a rough estimate is sketched below).
  • Moderation API: Free to call, but any guardrail prompts you route through the chat models still consume billable tokens.
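
A rough way to estimate the embeddings line item before committing to a dataset (a sketch using tiktoken, OpenAI's open-source tokenizer; the corpus here is a placeholder):

```python
import tiktoken  # pip install tiktoken

docs = ["First support article ...", "Second support article ..."]  # your real corpus here

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the text-embedding-3 models
total_tokens = sum(len(enc.encode(d)) for d in docs)

PRICE_PER_1M = 0.02  # text-embedding-3-small
print(f"{total_tokens:,} tokens -> ~${total_tokens * PRICE_PER_1M / 1_000_000:.2f} to embed")
# e.g. a 50M-token knowledge base works out to roughly $1
```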

Anthropic Claude: pricing and positioning vs OpenAI

Anthropic positions Claude 3 models as safer and more “steerable.”

  • Claude 3 Haiku (200k context): $0.25 input / $1.25 output per 1M tokens.
  • Claude 3 Sonnet (200k context): $3 input / $15 output.
  • Claude 3 Opus (200k context): $15 input / $75 output.

Compared to OpenAI:

  • Haiku ≈ GPT-3.5 in cost.
  • Sonnet ≈ GPT-4o in price, aimed at GPT-4-class performance for everyday work.
  • Opus ≈ premium GPT-4 pricing (and beyond) for heavy reasoning tasks.

Claude’s 200k context window is a differentiator, making it ideal for legal, research, or document-heavy workflows.
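
To put that context window in dollar terms, here is a rough sketch of one pass over a ~150k-token document with a short generated answer, at the list prices above:

```python
CLAUDE_PRICES = {                     # ($ per 1M input tokens, $ per 1M output tokens)
    "claude-3-haiku":  (0.25, 1.25),
    "claude-3-sonnet": (3.00, 15.00),
    "claude-3-opus":   (15.00, 75.00),
}

doc_tokens, answer_tokens = 150_000, 1_000   # one long document, short answer

for model, (p_in, p_out) in CLAUDE_PRICES.items():
    cost = (doc_tokens * p_in + answer_tokens * p_out) / 1_000_000
    print(f"{model}: ${cost:.2f} per pass")
# haiku ~$0.04, sonnet ~$0.47, opus ~$2.33 per full-document pass
```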


Local models: infrastructure costs vs APIs

Running LLMs locally (or self-hosted in the cloud) avoids per-token API fees but shifts costs to hardware + ops.

Hardware requirements for Llama 2/3, Code Llama

  • Llama-2 7B: Needs ~16GB GPU VRAM.
  • Llama-2 13B: Needs ~24–32GB VRAM.
  • Llama-2 70B: Needs ~4×80GB GPUs (A100s or H100s).
  • Code Llama models follow similar patterns.

For small teams, only 7B–13B models are practical on single GPUs.
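
A back-of-the-envelope way to sanity-check those VRAM numbers (a sketch, not a benchmark: 2 bytes per parameter at fp16, ~0.5 at 4-bit quantization, plus a rough overhead factor for KV cache and activations):

```python
def vram_estimate_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Model weights plus a ~20% allowance for KV cache and activations."""
    return params_billion * bytes_per_param * overhead

for size in (7, 13, 70):
    print(f"Llama {size}B  fp16: ~{vram_estimate_gb(size):.0f} GB   "
          f"4-bit: ~{vram_estimate_gb(size, bytes_per_param=0.5):.0f} GB")
# 7B: ~17 / ~4 GB, 13B: ~31 / ~8 GB, 70B: ~168 / ~42 GB
```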

Cost breakdown: GPU cloud vs on-premise

  • Cloud GPU (A100 80GB): ~$2–3/hour → ~$1,500–2,000/month if always on.
  • On-prem A100/H100 servers: $15k–$30k per card upfront, plus power + cooling.
  • Optimized hosting (Lambda Labs, RunPod, Modal): Pay-per-use, but still $0.50–$2/hour depending on GPU.

Rule of thumb: Local models are cheaper only if you run them consistently at scale. For ad-hoc tasks, API calls remain more cost-efficient.
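
One way to apply that rule of thumb is a break-even check (a sketch; the blended API price and GPU throughput below are assumptions you would replace with your own measurements):

```python
gpu_monthly_usd = 1_800        # always-on A100 80GB at roughly $2.50/hour
api_price_per_1m = 1.00        # assumed blended $/1M tokens on a cheap API tier

breakeven_tokens = gpu_monthly_usd / api_price_per_1m * 1_000_000
print(f"Break-even: ~{breakeven_tokens / 1e9:.1f}B tokens/month")        # ~1.8B tokens

# Can one GPU even serve that volume? Assume ~1,500 tokens/sec sustained.
monthly_capacity = 1_500 * 3600 * 24 * 30
print(f"Capacity at 1,500 tok/s: ~{monthly_capacity / 1e9:.1f}B tokens")  # ~3.9B tokens
```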


Practical calculator: typical use cases

Customer support chatbot (1000 conversations/day)

  • Avg conversation: 5 prompts × 700 tokens (500 in, 200 out).
  • Daily tokens: ~2.5M in + ~1M out (1,000 conversations × 5 prompts).

Costs/month (30-day month):

  • GPT-3.5: ~$82
  • GPT-4o: ~$488
  • Claude 3 Haiku: ~$56
  • Local 13B (cloud GPU): ~$600+ infra
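
Those monthly figures can be reproduced with the same per-1M-token arithmetic (a sketch assuming a 30-day month and the list prices quoted earlier):

```python
PRICES = {                        # ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-3.5-turbo":  (0.50, 1.50),
    "gpt-4o":         (2.50, 10.00),
    "claude-3-haiku": (0.25, 1.25),
}

daily_in, daily_out, days = 2_500_000, 1_000_000, 30   # 1,000 conversations/day

for model, (p_in, p_out) in PRICES.items():
    monthly = (daily_in * p_in + daily_out * p_out) / 1_000_000 * days
    print(f"{model}: ~${monthly:.0f}/month")
# gpt-3.5-turbo ~$82, gpt-4o ~$488, claude-3-haiku ~$56
```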

Content generation (100 articles/month)

  • Avg article: 1500 input tokens + 1200 output tokens.
  • Total/month: 150k in + 120k out.

Costs/month:

  • GPT-3.5: <$1
  • GPT-4o: ~$1.60
  • Claude Sonnet: ~$2.25
  • Local 13B: negligible per-run but infra cost applies.

Code assistance (10-person dev team)

  • Avg dev: 50 prompts/day × 800 tokens (600 in, 200 out).
  • Monthly total (10 devs, 30 days): ~9M in + ~3M out (≈12M tokens total).

Costs/month:

  • GPT-3.5: ~$9
  • GPT-4o: ~$53
  • Claude Sonnet: ~$72
  • Local GPU (cloud): ~$1,500

Cost optimization strategies

  • Mix models: Use GPT-3.5/Claude Haiku for easy queries; upgrade to GPT-4o/Claude Opus for hard cases (a simple routing sketch follows this list).
  • Cache responses: Store frequent answers to cut API calls.
  • Tune context length: Don’t send 10k tokens if 1k is enough.
  • Batch embeddings: Lower per-call costs by chunking efficiently.
  • Hybrid pipeline: Retrieval + smaller model for recall, larger model only for synthesis.
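
A sketch of the first two strategies combined (the keyword heuristic and the call_llm stub are placeholders; a production router would use a classifier, a confidence score, or retrieval hit rate):

```python
import hashlib

CACHE: dict[str, str] = {}

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for your actual API call."""
    return f"[{model}] answer to: {prompt}"

def route(prompt: str) -> str:
    # Cache responses: identical prompts never hit the API twice.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]

    # Mix models: a crude heuristic sends long/complex prompts to the premium tier.
    hard = len(prompt) > 500 or "step by step" in prompt.lower()
    model = "gpt-4o" if hard else "gpt-3.5-turbo"

    CACHE[key] = call_llm(model, prompt)
    return CACHE[key]

print(route("What are your opening hours?"))   # routed to the cheap model
print(route("What are your opening hours?"))   # served from cache, zero tokens billed
```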

Hybrid approaches: when to combine APIs and local

  • APIs for high-quality reasoning, customer-facing accuracy.
  • Local models for private data, cost control, or continuous workloads.
  • Best of both: Local embeddings + API generation, or a local small model with an API fallback for tough queries (sketched below).
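
A sketch of that pattern, local embeddings for retrieval with an API call only for the final answer (assumes sentence-transformers and the openai client are installed and an API key is set; the model names are just examples):

```python
import numpy as np
from sentence_transformers import SentenceTransformer   # pip install sentence-transformers
from openai import OpenAI                                # pip install openai

docs = ["Refund policy: ...", "Shipping times: ...", "Warranty terms: ..."]

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # runs locally, no per-token fee
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def answer(question: str, top_k: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best = np.argsort(doc_vecs @ q_vec)[-top_k:]         # cosine similarity via dot product
    context = "\n".join(docs[i] for i in best)

    # Only this call is billed per token.
    resp = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("How long does shipping take?"))
```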

ROI analysis: how much AI can justify in your budget

Rule of thumb: AI should save or earn at least 5× its cost.

  • A $500/month chatbot is justified if it saves 20+ support hours (a quick check is sketched after this list).
  • A $150/month code assistant is justified if it accelerates developer output by even 5%.
  • Running local GPUs at $2k/month only makes sense if workload is consistent and mission-critical.
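
A quick check of the 5× rule (the hourly rate is an assumption to replace with your own fully loaded cost):

```python
def roi_multiple(monthly_cost: float, hours_saved: float, hourly_rate: float) -> float:
    """Dollars of labor saved (or revenue earned) per dollar spent on the tool."""
    return (hours_saved * hourly_rate) / monthly_cost

# A $500/month chatbot saving 20 support hours at an assumed $125/hour fully loaded cost
print(roi_multiple(500, 20, 125))   # -> 5.0, meets the 5x rule of thumb
```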

Without clear ROI, cheaper API-first strategies are safer for small to mid-size teams.

  • OpenAI: Strong balance of cost and ecosystem. GPT-3.5 is extremely cheap for most workloads.
  • Anthropic Claude: Best for large-context use cases and safety-sensitive tasks.
  • Local models: Only cost-effective if workloads are massive and continuous.

Careful math and ROI framing are essential. Many teams overpay because they don’t measure token usage or because they underutilize smaller, cheaper models.

FAQs

Is GPT-4 always worth it?
No. For FAQs and simple tasks, GPT-3.5 or Claude Haiku are cheaper and good enough.

Do local models save money?
Only if you run them 24/7 at scale. Cloud GPU costs add up quickly.

Which provider is best for long documents?
Claude (200k context window).