The 14-Dimension Classifier That Cuts LLM Costs by 73%
If you are building LLM applications in production, you have likely hit the same wall I did: the API bill.
When you first build an app, you default to the smartest model available, usually Claude 3.5 Sonnet or Opus. It makes sense for the prototype. You want the highest probability of success. But as traffic scales, you realise you are using a supercomputer to do basic arithmetic.
You are paying Opus prices to parse JSON, format strings, or answer simple FAQ questions that Haiku could handle at 1/50th of the cost.
This is why I built ModelMesh, a dynamic router that evaluates incoming prompts across 14 dimensions and routes each one to the cheapest model capable of handling it. In its first week of deployment across my projects, it cut my total API costs by 73.4%.
Here's how it works under the hood.
The Problem with Static Routing
The naive approach to cost optimization is static routing. You hardcode certain endpoints to use cheaper models.
/api/summarize gets Haiku. /api/generate-code gets Opus.
Static routing fails because user intent is fluid. A user might paste a 10,000-line codebase into the summarize endpoint, causing Haiku to hallucinate wildly.
We needed a system that evaluated the complexity of the prompt itself, not just the endpoint it originated from.
The 14-Dimension Scorer
Rather than relying on just token length as a proxy for complexity, ModelMesh runs a fast, local classification step before hitting any external API.
It evaluates the prompt across 14 specific dimensions:
- Reasoning Depth: Does it require multi-step logic?
- Code Proximity: Does it ask for code generation or debugging?
- Context Window: Total token count of input.
- Instruction Density: Number of distinct commands/constraints.
- Language Nuance: Does it require a specific tone, role-play, or creative writing?
- ... (and 9 others)
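Internally, each dimension boils down to a named check with a weight. Here is a minimal sketch of that shape; the dimension names come from the list above, but the specific weights and predicates are illustrative assumptions, not ModelMesh's actual values:

```typescript
// Illustrative sketch: each dimension is a predicate plus a weight.
// Dimension names are from the list above; weights and predicates
// are assumptions for demonstration.
interface Dimension {
  name: string;
  weight: number;
  test: (prompt: string) => boolean;
}

const dimensions: Dimension[] = [
  {
    name: "reasoning_depth",
    weight: 0.25,
    test: (p) => /step by step|first|then|finally/i.test(p),
  },
  {
    name: "code_proximity",
    weight: 0.3,
    test: (p) => /implement|debug|write a script|class|def/i.test(p),
  },
  {
    name: "context_window",
    weight: 0.2,
    // Rough token estimate: ~4 characters per token.
    test: (p) => p.length / 4 > 4000,
  },
];

// Sum the weights of every dimension that fires, clamped to [0, 1].
function scoreDimensions(prompt: string): number {
  const raw = dimensions
    .filter((d) => d.test(prompt))
    .reduce((sum, d) => sum + d.weight, 0);
  return Math.min(raw, 1);
}
```

Keeping each dimension as an isolated predicate makes the scorer easy to tune: you can adjust one weight without touching the others.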
The Local Classifier
To keep latency low (routing must take < 50ms), the classifier does not use an LLM. It uses a combination of regex heuristics and a lightweight traditional ML model (trained on a dataset of 5,000 prompts categorized by complexity).
The scoring logic, simplified:
```typescript
// Tier names correspond to the routing matrix below.
type Tier = "haiku" | "sonnet" | "opus";

interface ScoreProfile {
  score: number;
  tier: Tier;
  signals: string[];
}

function determineTier(score: number): Tier {
  if (score < 0.3) return "haiku";
  if (score < 0.7) return "sonnet";
  return "opus";
}

export function calculatePromptComplexity(prompt: string): ScoreProfile {
  let score = 0;
  const signals: string[] = [];

  // 1. Check for code generation requests
  if (/(func|class|def|implement|write a script|debug)/i.test(prompt)) {
    score += 0.3;
    signals.push("code_generation");
  }

  // 2. Evaluate constraint density
  const constraintCount =
    (prompt.match(/(must|should|require|strictly|only|do not)/gi) || []).length;
  if (constraintCount > 3) {
    score += constraintCount * 0.05;
    signals.push("high_constraint_density");
  }

  // 3. Multi-step reasoning triggers
  if (/(step by step|first|then|finally|evaluate options)/i.test(prompt)) {
    score += 0.25;
    signals.push("multi_step_reasoning");
  }

  // Normalize to 0.0 - 1.0
  const normalizedScore = Math.min(Math.max(score, 0), 1);
  const tier = determineTier(normalizedScore);
  return { score: normalizedScore, tier, signals };
}
```

The Routing Matrix
Once we have a complexity score (0.0 to 1.0), we map it against our available models.
| Complexity Score | Assigned Model | Use Case |
|---|---|---|
| 0.0 - 0.3 | Claude 3 Haiku | Summarization, parsing, formatting |
| 0.3 - 0.7 | Claude 3.5 Sonnet | General content, basic coding, standard chat |
| 0.7 - 1.0 | Claude 3 Opus | Complex architecture, deep debugging, dense constraints |
By defaulting to the cheapest competent model, the vast majority of our traffic naturally shifted to Haiku and Sonnet, saving the expensive Opus calls for tasks that actually needed them.
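Wired up, the matrix is a small lookup from score to model ID. A minimal sketch, using the tier thresholds from the table; the model ID strings are assumptions that change with Anthropic's releases, so check their docs for current values:

```typescript
// Map a complexity score to a model ID using the matrix above.
// Model ID strings are assumptions; verify against Anthropic's
// current model list before using them.
function routeToModel(score: number): string {
  if (score < 0.3) return "claude-3-haiku-20240307";
  if (score < 0.7) return "claude-3-5-sonnet-20240620";
  return "claude-3-opus-20240229";
}
```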
The Results
Before ModelMesh, a typical week on my portfolio apps looked like this:
- Opus: 85% of traffic
- Sonnet: 15% of traffic
- Haiku: 0%
After implementing the 14-dimension classifier:
- Opus: 12% of traffic
- Sonnet: 45% of traffic
- Haiku: 43% of traffic
The quality of responses remained indistinguishable to users, but the infrastructure costs plummeted by 73%.
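The savings figure is easy to sanity-check with back-of-envelope math. This sketch assumes Anthropic's published input prices per million tokens (Opus $15, Sonnet $3, Haiku $0.25), a uniform token volume per request, and ignores output tokens entirely:

```typescript
// Back-of-envelope check of the savings, assuming uniform token volume
// per request. Prices are input-token $/MTok and are assumptions that
// may change; output tokens are ignored for simplicity.
const price = { opus: 15, sonnet: 3, haiku: 0.25 };

function blendedCost(mix: { opus: number; sonnet: number; haiku: number }): number {
  return mix.opus * price.opus + mix.sonnet * price.sonnet + mix.haiku * price.haiku;
}

const before = blendedCost({ opus: 0.85, sonnet: 0.15, haiku: 0.0 }); // 13.20
const after = blendedCost({ opus: 0.12, sonnet: 0.45, haiku: 0.43 }); // ~3.26
const savings = 1 - after / before; // ~0.75
```

That lands around 75%, in the same ballpark as the observed 73.4%; the gap is explained by output tokens and uneven prompt sizes, which this sketch ignores.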
You can try the interactive demo of the classifier locally on the ModelMesh Project Page. It runs entirely in your browser.
If you are paying too much for AI, you don't need a better model. You need a better router.