Calculator
RAG Inference Cost calculator.
Estimate monthly inference cost for a RAG system in production.
How we calibrated this
We use this model to estimate a client's RAG total cost of ownership (TCO) before architecture decisions are made.
Inputs
Tell us about your project.
This is a static reference card. For interactive calculators, talk to us — we tune the assumptions per client.
Queries per month
Range: 1,000–5,000,000 queries · Default: 50,000 queries
Avg input tokens per query
Range: 500–20,000 tokens · Default: 4,000 tokens
Avg output tokens per query
Range: 100–4,000 tokens · Default: 600 tokens
Model
- Claude Haiku 4.5: 0.2×
- Claude Sonnet 4.6: 1×
- Claude Opus 4.7: 4.5×
- GPT mini: 0.25×
- Self-hosted fine-tuned 7B: 0.05×
How it's calculated
The formula.
Monthly cost = (queries × tokens per query × per-token model price) + retrieval costs
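As a rough illustration, the formula can be sketched in Python. The multipliers come from the Model list above. The baseline price (`BASE_PRICE_PER_MTOK`), the flat monthly retrieval cost, and the function name are hypothetical placeholders, not real pricing. Charging input and output tokens at one blended price is also a simplifying assumption.

```python
# Relative price multipliers from the Model list (Claude Sonnet 4.6 is the 1x baseline).
MODEL_MULTIPLIER = {
    "Claude Haiku 4.5": 0.2,
    "Claude Sonnet 4.6": 1.0,
    "Claude Opus 4.7": 4.5,
    "GPT mini": 0.25,
    "Self-hosted fine-tuned 7B": 0.05,
}

# ASSUMPTION: hypothetical blended baseline price in USD per million tokens.
BASE_PRICE_PER_MTOK = 5.0


def monthly_inference_cost(queries, input_tokens, output_tokens, model,
                           retrieval_cost=0.0,
                           base_price_per_mtok=BASE_PRICE_PER_MTOK):
    """Tokens x per-token model price + retrieval costs, in USD per month."""
    total_tokens = queries * (input_tokens + output_tokens)
    # Price is quoted per million tokens, scaled by the model's multiplier.
    api_cost = total_tokens / 1_000_000 * base_price_per_mtok * MODEL_MULTIPLIER[model]
    return api_cost + retrieval_cost


# Card defaults: 50,000 queries, 4,000 input tokens, 600 output tokens.
# The 100.0 retrieval cost is a placeholder, not a measured figure.
cost = monthly_inference_cost(50_000, 4_000, 600, "Claude Sonnet 4.6",
                              retrieval_cost=100.0)
print(f"${cost:,.2f} per month")
```

Swapping the model name changes only the multiplier, so the same inputs can be compared across models at a glance.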
Output
Monthly inference cost
API + retrieval cost per month.
Output
Cost per query
Effective unit economics.
Output
Annual run-rate
12-month projection.
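The two derived outputs follow directly from the monthly figure. The sketch below assumes query volume stays flat across the 12 months. The function name `unit_economics` and the example figures are hypothetical.

```python
def unit_economics(monthly_cost, queries_per_month):
    """Return (cost per query, annual run-rate) from a monthly cost in USD."""
    if queries_per_month <= 0:
        raise ValueError("queries_per_month must be positive")
    cost_per_query = monthly_cost / queries_per_month
    # ASSUMPTION: flat volume, so the annual figure is 12 identical months.
    annual_run_rate = monthly_cost * 12
    return cost_per_query, annual_run_rate


# Hypothetical monthly cost of $1,250 at the default 50,000 queries per month.
per_query, annual = unit_economics(1_250.0, 50_000)
print(f"${per_query:.4f} per query, ${annual:,.0f} per year")
```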
Want a real estimate?
This is a band, not a quote.
For an estimate calibrated to your specific project, brief us. We get back to you within two business days.
Brief us on RAG