LLM Router

Route each request to the right model for the job. Simple tasks go to cheaper models. Complex reasoning stays on your frontier model.

Every request uses your most expensive model

The Problem

Your coding assistant sends every request to the same frontier model, whether it's designing system architecture or reading a config file. 70% of requests in a typical session are simple tasks that a model costing 85% less would handle identically.

The Solution

Tokonomy classifies each request in real time and routes it to the optimal model. A two-tier classifier resolves ~70% of requests with fast pattern matching at zero added latency, and falls back to an LLM only for ambiguous cases.
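A minimal sketch of what a two-tier classifier could look like, assuming tier 1 is a set of regular expressions and tier 2 is any callable that asks an LLM for a label. The patterns, the function names (`pattern_tier`, `classify`) and the stub fallback are illustrative assumptions, not Tokonomy's actual rules or API.

```python
import re
from typing import Callable, Optional

# Hypothetical tier-1 rules; the real rule set is not published on this page.
SIMPLE_PATTERNS = [
    re.compile(r"\bread_file\b"),
    re.compile(r"\bgit (status|diff|log)\b"),
    re.compile(r"\bformat (the )?code\b"),
]
COMPLEX_PATTERNS = [
    re.compile(r"\b(design|architect(ure)?|refactor)\b", re.IGNORECASE),
]


def pattern_tier(prompt: str) -> Optional[str]:
    """Tier 1: return 'simple' or 'complex' on a clear match, else None."""
    # Complex patterns are checked first so a mixed request is never downgraded.
    if any(p.search(prompt) for p in COMPLEX_PATTERNS):
        return "complex"
    if any(p.search(prompt) for p in SIMPLE_PATTERNS):
        return "simple"
    return None


def classify(prompt: str, llm_fallback: Callable[[str], str]) -> str:
    """Run tier 1; call the LLM fallback (tier 2) only when tier 1 is undecided."""
    label = pattern_tier(prompt)
    if label is not None:
        return label
    return llm_fallback(prompt)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always answers 'complex' in this sketch."""
    return "complex"
```

With the stub above, `classify("git status", stub_llm)` is resolved by the pattern tier alone, while a prompt such as `"why is this test flaky?"` matches no pattern and goes to the fallback.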

Before and After

Without Routing

read_file → Sonnet ($3/M input)
git status → Sonnet ($3/M input)
design architecture → Sonnet ($3/M input)
format code → Sonnet ($3/M input)

All tasks use the same expensive model

With Smart Routing

read_file → Haiku ($0.80/M input)
git status → Haiku ($0.80/M input)
design architecture → Sonnet ($3/M input)
format code → Haiku ($0.80/M input)

70% of tasks route to 85% cheaper models

Token savings: 60-70%
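To estimate savings for your own traffic, the blended input price can be computed from the two list prices and the share of tokens that gets routed. The sketch below is a simplification: it looks at input tokens only and assumes the routed share is measured in tokens rather than requests. The function names are illustrative.

```python
def blended_price(frontier: float, cheap: float, routed_share: float) -> float:
    """Average price per million input tokens after routing.

    routed_share is the fraction of input tokens sent to the cheaper model.
    """
    return routed_share * cheap + (1.0 - routed_share) * frontier


def savings_fraction(frontier: float, cheap: float, routed_share: float) -> float:
    """Fraction of input spend saved compared with sending everything to the frontier model."""
    return 1.0 - blended_price(frontier, cheap, routed_share) / frontier


# List prices from the comparison above, in dollars per million input tokens.
sonnet, haiku = 3.00, 0.80

price_gap = 1.0 - haiku / sonnet                    # how much cheaper Haiku is than Sonnet
example = savings_fraction(sonnet, haiku, 0.70)     # savings when 70% of tokens are routed
print(f"price gap: {price_gap:.0%}, savings at 70% routed: {example:.0%}")
```

At these two list prices the per-token gap is about 73%, and routing 70% of input tokens saves about 51% of input spend. Other model pairs, output-token pricing, and a different traffic mix will move that figure, so plug in your own numbers.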

How It Works

1. A request arrives at the proxy.

2. The classifier evaluates the request: is this a simple task or complex reasoning?

3. Simple tasks route to the cheapest model tier within the same provider.

4. Complex tasks stay on your requested frontier model. The response format is unchanged.
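The steps above can be sketched as a function that swaps only the model name on a request and leaves everything else alone. The `Request` type, the `CHEAPEST_TIER` table and the short model names are hypothetical placeholders for illustration; the proxy's real data model is not described here.

```python
from dataclasses import dataclass, replace

# Hypothetical lookup: cheapest model tier per provider (placeholder names).
CHEAPEST_TIER = {
    "anthropic": "haiku",
    "openai": "gpt-4o-mini",
}


@dataclass(frozen=True)
class Request:
    provider: str
    model: str
    prompt: str


def route(request: Request, label: str) -> Request:
    """Return the request to forward upstream, given the classifier's label.

    Simple requests are moved to the provider's cheapest tier. Anything else,
    including requests for a provider missing from the table, keeps the model
    the caller asked for.
    """
    if label == "simple" and request.provider in CHEAPEST_TIER:
        return replace(request, model=CHEAPEST_TIER[request.provider])
    return request


original = Request(provider="anthropic", model="sonnet", prompt="git status")
routed = route(original, "simple")
print(routed.model)
```

Because only the `model` field changes and the provider stays the same, the upstream response comes back in the format the client already expects.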

Frequently Asked Questions

What if the classifier gets it wrong?
When uncertain, the classifier defaults to keeping the request on the frontier model. This is the safe default: you might miss a savings opportunity, but you never degrade output quality.
Can I route across different providers?
Yes. Cross-provider routing sends requests to the cheapest provider for that task type. A Claude request asking a simple question can transparently route to GPT-4o-mini.
Does routing work with streaming?
Same-provider routing works with streaming. Cross-provider routing requires non-streaming requests due to format conversion.
What counts as a 'simple' task?
File reads, git operations, doc lookups, formatting, package management, error code queries, and similar janitorial sub-agent tasks.
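The answers above combine into a small decision rule: uncertain requests stay on the frontier model, and cross-provider routing is skipped for streaming requests. The sketch below assumes the classifier reports whether it is confident and that cross-provider routing is a setting you can turn on; both `confident` and `cross_provider_enabled` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    target: str  # "frontier", "same_provider_cheap", or "cross_provider_cheap"
    reason: str


def decide(label: str, confident: bool, streaming: bool,
           cross_provider_enabled: bool) -> Decision:
    """Pick a routing target following the rules stated in the FAQ."""
    # Safe default: anything complex or uncertain keeps the requested model.
    if label != "simple" or not confident:
        return Decision("frontier", "complex or uncertain request")
    # Cross-provider routing needs format conversion, so it is limited to
    # non-streaming requests.
    if cross_provider_enabled and not streaming:
        return Decision("cross_provider_cheap", "simple, non-streaming request")
    return Decision("same_provider_cheap", "simple request, same provider")


print(decide("simple", True, streaming=True, cross_provider_enabled=True).target)
```

In this sketch a streaming request that qualifies as simple still gets routed, but only within its own provider.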

Ready to start saving?

Create an account, add your first app, and swap one URL. Takes about 5 minutes.
