Opus 4.7 vs 4.6: Inflation is 45%
The observed 45% inflation in effective token cost isn't abstract: your AI assistance budget just took a serious hit for what feels like a downgrade. The argument that 4.7 is cheaper for specific workloads because it emits fewer output tokens is misdirection that obscures the true cost. Sure, if the model is so concise it barely gives you anything useful, then yes, it uses fewer tokens. But if you then have to spend more of your own time re-prompting it, or fixing its terse, incomplete output, any perceived efficiency is immediately negated.
Beyond the raw token cost, the true issue lies in the increased cost of iteration. If 4.7 needs more hand-holding, more explicit instructions, or more rounds of refinement to get to the same quality output that 4.6 used to deliver in one shot, then the actual cost-per-task skyrockets. You're paying for compute, and you're paying for human time. And right now, 4.7 is demanding more of both.
Consider a typical development workflow where you ask for a code snippet, a refactor, or a test case. This is where the inflation manifests. It's not just the token count for a single interaction; it's the cumulative token count across multiple refinement cycles, compounded by the human cognitive load of correcting and re-prompting. (I've seen PRs this week that literally don't compile because the bot hallucinated a library, and the engineer spent significant time debugging the AI's mess.)
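The compounding effect is easy to sketch with a back-of-the-envelope model. All prices, token counts, refinement counts, and hourly rates below are invented illustration values, not real Opus pricing; the point is only that human time per round dominates raw token spend once iteration count climbs.

```python
# Hypothetical sketch: effective cost-per-task when a model needs
# extra refinement rounds. Every number here is a made-up assumption.

def cost_per_task(rounds, tokens_in, tokens_out,
                  price_in, price_out,          # $ per million tokens
                  human_minutes_per_round, hourly_rate):
    # Compute spend scales with rounds; so does the human time spent
    # reviewing, correcting, and re-prompting each round.
    compute = rounds * (tokens_in * price_in + tokens_out * price_out) / 1e6
    human = rounds * (human_minutes_per_round / 60) * hourly_rate
    return compute + human

# "One-shot" style: a single, longer answer that lands on target.
one_shot = cost_per_task(rounds=1, tokens_in=2_000, tokens_out=1_500,
                         price_in=15, price_out=75,
                         human_minutes_per_round=5, hourly_rate=90)

# "Terse" style: fewer output tokens per round, but three rounds.
iterated = cost_per_task(rounds=3, tokens_in=2_000, tokens_out=600,
                         price_in=15, price_out=75,
                         human_minutes_per_round=5, hourly_rate=90)

print(f"one shot: ${one_shot:.2f}, iterated: ${iterated:.2f}")
```

Under these assumed numbers the terse-but-iterated workflow costs roughly three times as much per task, even though each individual response is cheaper, because the human time multiplies with the round count.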
The Mythos Divide and the Plateau
Some users speculate about a "Mythos" model: an internal, superior version never released publicly. It's not hard to believe when you see the public models seemingly regress. This suggests a disconnect between internal capabilities and what's deemed acceptable for mass consumption, often driven by profitability metrics.
Beyond this, AI models, particularly in code generation, appear to be hitting a plateau. The initial exponential gains are slowing down. We're seeing diminishing returns on investment, both in terms of training compute and user experience. The claims of continuous improvement feel more like marketing cycles than genuine leaps forward. More parameters aren't consistently leading to more useful output; we're observing a correlation, not a clear cause-and-effect.
What Now?
Seriously, don't blindly upgrade. If you're using Opus 4.6 and it's working for you, stick with it. Evaluate 4.7 with a real-world, task-based cost-benefit analysis for *your specific use cases*. Don't trust the benchmarks; trust your own metrics. Track your actual cost-per-task, including human time.
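The kind of tracking suggested above doesn't need tooling; a spreadsheet or a few lines of code will do. Here is a minimal sketch, where `TaskRecord`, the field names, the model labels, and the hourly rate are all hypothetical placeholders:

```python
# Minimal sketch of per-model cost-per-task tracking.
# All names and numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TaskRecord:
    model: str            # which model handled the task
    rounds: int           # prompt/refinement cycles needed
    token_cost_usd: float # compute spend for the whole task
    human_minutes: float  # your time prompting, reviewing, fixing

def avg_cost_per_task(records, model, hourly_rate=90.0):
    # Blend compute spend with human time at an assumed hourly rate.
    rows = [r for r in records if r.model == model]
    if not rows:
        return None
    total = sum(r.token_cost_usd + (r.human_minutes / 60) * hourly_rate
                for r in rows)
    return total / len(rows)

log = [
    TaskRecord("opus-4.6", rounds=1, token_cost_usd=0.14, human_minutes=5),
    TaskRecord("opus-4.7", rounds=3, token_cost_usd=0.23, human_minutes=15),
    TaskRecord("opus-4.7", rounds=2, token_cost_usd=0.19, human_minutes=10),
]
print(avg_cost_per_task(log, "opus-4.6"),
      avg_cost_per_task(log, "opus-4.7"))
```

A week of records like this tells you more about whether an upgrade pays off than any vendor benchmark will.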
The era of blindly adopting the latest model, assuming it's always superior, is over. We're in a phase where model selection requires the same rigorous deliberation as architectural decisions. The perceived inflation and performance degradation of Opus 4.7 serve as a warning shot. They tell us that the pursuit of "new" doesn't always mean "improved," and sometimes, it just means "more expensive."