AI-Native · Jun 9, 2026 · 7 min read

Fable 5: The Price Went Up and the Knobs Came Off

Anthropic just shipped a model tier above Opus. The price doubled and the dials disappeared — and both of those facts tell you how to run an AI-native organization.

Anthropic just put a new rung on top of the ladder. It's called Claude Fable 5, it sits above Opus, and it costs twice as much: ten dollars per million input tokens, fifty per million output, against Opus 4.8's five and twenty-five.

Most of the coverage will be about capability, and fair enough — it's billed as the most intelligent model they've shipped. But I think the two most instructive things about Fable 5 have nothing to do with benchmarks. The first is the price. The second is what they removed from the API. Both are signals about how this technology wants to be managed, and most organizations are reading neither.

The price is the announcement

For the last few years, the implicit deal was that frontier intelligence got cheaper. Every release either raised the ceiling at the same price or held the ceiling and dropped the floor. You could budget AI the way you budget bandwidth: assume the unit cost only goes down, and don't think too hard about it.

Fable 5 breaks that pattern. For the first time in a while, the ladder grew upward instead of the floor dropping. Per million input tokens, the lineup now runs Haiku at a dollar, Sonnet at three, Opus at five, and Fable at ten: a real spread, with a meaningful price jump at the top.
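To make the jump concrete, here is a minimal sketch of the arithmetic using only the Opus 4.8 and Fable 5 prices quoted in this post. The model keys and the job size (200K tokens in, 20K out) are placeholders I made up for illustration, not real identifiers or a measured workload.

```python
# USD per million tokens as quoted in this post: (input, output).
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "opus-4.8": (5.00, 25.00),
    "fable-5": (10.00, 50.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the token cost of one job in USD."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical job: 200K tokens of context in, 20K tokens out.
opus = job_cost("opus-4.8", 200_000, 20_000)
fable = job_cost("fable-5", 200_000, 20_000)
print(f"Opus 4.8: ${opus:.2f}  Fable 5: ${fable:.2f}  ratio: {fable / opus:.1f}x")
```

Because both the input and output prices double, the ratio stays at 2x whatever the mix of input and output tokens.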

2× the per-token price of Opus 4.8 · 1M tokens of context window · 128K tokens of maximum output

That spread changes the nature of the decision. When the top model costs roughly the same as the one below it, "just use the best one" is a defensible default. When the top model costs double, model choice becomes a portfolio decision, and a portfolio decision is a management decision. Somebody in your organization now has to be able to answer: which of our problems are actually worth frontier pricing?

When the best model costs double, "just use the best one" stops being a default and becomes a decision.

I've written before about the failure mode where token consumption becomes a status symbol — tokenmaxxing. Fable 5 raises the cost of that vanity. The right question was never "are we using the most powerful model," it was "what does it cost us to be wrong here?" A migration that corrupts data, a security review that misses the hole, an overnight agent run that has to be redone by a human in the morning — those are worth ten-dollar tokens. Reformatting a support ticket is not.
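One way to put numbers on "what does it cost us to be wrong here?" is an expected-cost comparison: token spend plus the probability of a bad result times what that bad result costs. The sketch below is only that, a sketch. The error rates and dollar figures are invented for illustration; nothing here is a measured error rate for any model.

```python
def expected_cost(token_cost: float, p_error: float, error_cost: float) -> float:
    """Token spend plus the probability-weighted cost of a bad result."""
    return token_cost + p_error * error_cost


def worth_upgrading(cheap_tokens: float, pricey_tokens: float,
                    p_cheap: float, p_pricey: float, error_cost: float) -> bool:
    """True when the pricier model has the lower expected total cost."""
    return (expected_cost(pricey_tokens, p_pricey, error_cost)
            < expected_cost(cheap_tokens, p_cheap, error_cost))


# All numbers below are hypothetical assumptions.
CHEAP_TOKENS, PRICEY_TOKENS = 1.50, 3.00   # USD of tokens per job
P_CHEAP, P_PRICEY = 0.05, 0.03             # assumed chance of a bad result

# Error cost at which the two options tie:
# extra token spend divided by the reduction in error probability.
break_even = (PRICEY_TOKENS - CHEAP_TOKENS) / (P_CHEAP - P_PRICEY)

ticket = worth_upgrading(CHEAP_TOKENS, PRICEY_TOKENS, P_CHEAP, P_PRICEY, error_cost=2)
migration = worth_upgrading(CHEAP_TOKENS, PRICEY_TOKENS, P_CHEAP, P_PRICEY, error_cost=5_000)
print(f"break-even error cost: ${break_even:.0f}")
print(f"reformat a ticket: {ticket}, data migration: {migration}")
```

Under these made-up inputs the upgrade pays for itself only once a mistake costs more than about $75, which is why the migration qualifies and the support ticket does not.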

They took away the knobs

Here's the part that fascinates me more than the price. Fable 5 doesn't have a temperature setting. No top-p, no top-k, no fixed thinking budget measured in tokens. Those parameters don't exist on this model — send them and the API rejects the request. You can't even explicitly switch its reasoning off; the model decides for itself when a problem deserves thought and how much.

What you get instead is a much smaller set of controls, and look at what they are. An effort level: how hard should this attempt try, from quick-and-cheap up to spare-no-expense. A task budget: here's roughly how many tokens this whole job is worth, and the model watches its own countdown and prioritizes accordingly. That's the entire interface.
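As a rough illustration of how small that interface is, here is a sketch of a request builder that accepts only the two controls described above and refuses the removed ones. Every name in it is an assumption of mine: the field names `effort` and `task_budget_tokens`, the effort level strings, and the `fable-5` label are hypothetical, and this does not call or mirror the real API.

```python
# Parameters this post says the model no longer accepts.
# The exact spellings are assumptions for illustration.
REMOVED_PARAMS = {"temperature", "top_p", "top_k", "thinking_budget_tokens"}

# Hypothetical effort levels, from quick-and-cheap to spare-no-expense.
EFFORT_LEVELS = ("low", "medium", "high")


def build_request(prompt: str, effort: str, task_budget_tokens: int, **extra) -> dict:
    """Build a request dict with only the controls described in the post."""
    rejected = REMOVED_PARAMS.intersection(extra)
    if rejected:
        # Mirrors the described behavior: removed knobs are an error, not ignored.
        raise ValueError(f"unsupported parameters: {sorted(rejected)}")
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    if task_budget_tokens <= 0:
        raise ValueError("task_budget_tokens must be positive")
    return {
        "model": "fable-5",                        # illustrative label only
        "effort": effort,                          # how hard this attempt should try
        "task_budget_tokens": task_budget_tokens,  # roughly what the whole job is worth
        "prompt": prompt,
    }


request = build_request("Review this migration plan.", effort="high",
                        task_budget_tokens=200_000)
```

Passing `temperature=0.2` to this helper raises a `ValueError`, which is the local equivalent of the rejected request described above.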

If that vocabulary sounds familiar, it should. It's how you delegate to a senior person. You don't regulate a principal engineer's brain chemistry, and you don't hand them a step-by-step script. You tell them what the outcome is, how much it matters, and how much of their time it's worth — and then you let them work. The API has quietly converged on the language of management: outcome, effort, appetite.

You don't set its temperature anymore. You set its appetite.

I find this genuinely clarifying, because it settles an argument I keep having. There's a persistent instinct in engineering organizations to treat the model as a component to be tuned — find the magic parameters, wrap it in enough scaffolding, micromanage every step. Each generation of these models punishes that instinct a little more. The guidance that comes with this tier is the same advice I give about delegating to people: give the full specification up front, state what done looks like, set the budget, and get out of the way. Vague asks dribbled out over many small corrections burn tokens and produce worse work. Sound familiar?

Route work like a manager, not a fan

So what do you actually do with a four-tier lineup? You staff it the way you staff a team. You don't put your most senior engineer on every ticket — not because they couldn't do the work, but because it's a waste of the scarcest thing you have. The same logic now applies, with a price list attached.

  • Haiku-class work: high-volume, mechanical, cheap to verify. Classification, extraction, routing. Being occasionally wrong is recoverable and obvious.
  • Sonnet-class work: the everyday middle. Most coding tasks, most drafting, most internal tooling. Fast feedback loops catch the mistakes.
  • Opus-class work: hard problems with real stakes. Long agentic runs, serious refactors, analysis your team will act on.
  • Fable-class work: the small set of problems where the cost of being wrong dwarfs the cost of the tokens, and where nobody is watching closely enough to catch a subtle miss.

Notice that the routing criterion is never "how impressive is the task." It's the cost of error and the cost of verification. Work that's cheap to check can run on cheap intelligence, because your checks are the safety net. Work that's expensive to check — overnight runs, subtle judgment calls, anything reviewed by a tired human at 9 a.m. — is exactly where paying double for fewer mistakes is the bargain of the year. I learned this lesson the hard way running fleets of coding agents: the bottleneck was never the agents, it was my capacity to verify what they produced.
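The routing rule above can be sketched as a small function of two inputs: what a missed error costs and what a proper check costs. The dollar thresholds are arbitrary placeholders, as are the `Task` fields and tier labels; a real organization would calibrate them against its own work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    error_cost_usd: float   # what a bad result costs if it slips through
    verify_cost_usd: float  # what it costs to check the result properly


def route(task: Task) -> str:
    """Pick a tier from cost of error and cost of verification.

    Thresholds are illustrative assumptions, not recommendations.
    """
    # Expensive to get wrong AND expensive to check: pay for the top tier.
    if task.error_cost_usd >= 10_000 and task.verify_cost_usd >= 500:
        return "fable"
    # Real stakes, but someone can still review the output affordably.
    if task.error_cost_usd >= 1_000:
        return "opus"
    # Everyday work where fast feedback loops catch mistakes.
    if task.error_cost_usd >= 50 or task.verify_cost_usd >= 5:
        return "sonnet"
    # High-volume, mechanical, cheap to verify.
    return "haiku"


tasks = [
    Task("classify support ticket", error_cost_usd=1, verify_cost_usd=0.10),
    Task("internal tooling change", error_cost_usd=200, verify_cost_usd=10),
    Task("serious refactor", error_cost_usd=5_000, verify_cost_usd=100),
    Task("overnight data migration", error_cost_usd=50_000, verify_cost_usd=2_000),
]
routes = {t.name: route(t) for t in tasks}
print(routes)
```

Note that nothing in `route` asks how impressive the task is. The only inputs are the two costs, which keeps the decision auditable by whoever owns the budget.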

What this means for your organization

First, AI spend is graduating from a rounding error to a line item with structure, and that's healthy. A budget with tiers forces the conversation that a flat budget lets you skip: which problems are worth what. If your organization can't answer that question for tokens, I'd gently suggest it couldn't answer it for engineering hours either, and the tokens are just making an old problem visible.

Second, the skill that compounds is the one I keep coming back to: shaping. A model that takes a full, well-specified goal and runs with it for hours rewards exactly the discipline most organizations lack — deciding what the outcome is, what it's worth, and what's explicitly out of scope before the work starts. The organizations getting the most out of these models aren't the ones with the cleverest prompts. They're the ones whose leaders can shape a problem crisply enough to hand it to anyone — human or model — and bet on the result.

And third, stop waiting for this to settle. The ladder will keep growing rungs, top and bottom, and the prices will keep forking. The durable capability isn't familiarity with any one model. It's an operating model that can route work by value, verify cheaply, and treat intelligence — artificial or otherwise — as a portfolio to be managed rather than a status symbol to be maxed.

That operating model is the actual work of an AI-native transformation, and it's the work I do. If your AI bill is growing faster than your confidence in what it's buying, let's talk.
