Intelligence · Apr 14, 2026 · 8 min read

Why Small Models Are Quietly Eating Large Ones

Frontier capability is converging. Cost and latency are not. The economics now favor specialization.

Sarah Jenkins, Contributor, The Signal

The capability gap between the largest frontier models and well-tuned 8B-class models has compressed faster than almost anyone predicted. For a long list of production tasks — classification, extraction, structured-output generation, routing — a fine-tuned small model now matches or beats a generalist frontier API at one-tenth the latency and one-fiftieth the cost.

The implication for product teams is uncomfortable: calling the biggest available model by default is no longer the safe choice. It's the expensive one. Deciding which sub-tasks belong on which model is becoming a core engineering discipline.
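To make the economics concrete, here is a minimal sketch of per-task routing. It assumes a static lookup table keyed by task type. The model names, task labels, and workload mix are hypothetical, and costs are in relative units (frontier = 1.0) built from the ratios quoted above, one-fiftieth the cost and one-tenth the latency, rather than from any real price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Cost and latency relative to the frontier model (frontier = 1.0)."""
    name: str
    relative_cost: float
    relative_latency: float


# Hypothetical model profiles. The small-model ratios mirror the article's
# figures (1/50 the cost, 1/10 the latency); they are not measured values.
FRONTIER = ModelProfile("frontier-api", relative_cost=1.0, relative_latency=1.0)
SMALL_TUNED = ModelProfile("small-8b-finetuned", relative_cost=1 / 50, relative_latency=1 / 10)

# Hypothetical routing table: narrow, well-specified tasks go to the small model.
ROUTES = {
    "classification": SMALL_TUNED,
    "extraction": SMALL_TUNED,
    "structured_output": SMALL_TUNED,
    "routing": SMALL_TUNED,
}


def route(task_type: str) -> ModelProfile:
    """Pick a model for a task type; unrecognized tasks fall back to the frontier model."""
    return ROUTES.get(task_type, FRONTIER)


def blended_cost(workload: dict[str, int]) -> float:
    """Total relative cost of a workload mapping task type -> request count."""
    return sum(count * route(task).relative_cost for task, count in workload.items())


if __name__ == "__main__":
    # Hypothetical mix: 900 narrow requests and 100 open-ended ones.
    workload = {"classification": 600, "extraction": 300, "open_ended_reasoning": 100}
    all_frontier = sum(workload.values()) * FRONTIER.relative_cost
    routed = blended_cost(workload)
    print(f"routed cost: {routed:.0f} vs all-frontier: {all_frontier:.0f}")
```

With this mix the routed workload costs about 118 units against 1,000 for sending everything to the frontier model, and the 100 open-ended requests account for most of what remains. A static table only illustrates the arithmetic; in practice a team would likely check the small model's accuracy on each task type before moving traffic to it.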
