Claude's Million-Token Context Finally Makes Sense (And Costs Less)
Anthropic quietly shipped one of the most important pricing changes since Claude launched. The million-token context window for Claude Opus 4.6 and Sonnet 4.6 is now generally available with standard per-token pricing. No more premium rates once you cross the threshold.
I've been testing this since Friday's announcement. The difference is immediate and dramatic.
What changed and why it matters
Before this update, feeding Claude an entire codebase meant watching costs spike once you hit the long-context premium tier. Now you pay standard rates all the way to a million tokens.
The price drop on Opus alone is worth understanding. Opus 4.1 was $15 per million input tokens and $75 per million output tokens. Opus 4.6 is $5 input and $25 output. That is a two-thirds reduction on the flagship model, not a minor optimization. Sonnet 4.6 held steady at $3 input and $15 output, matching Sonnet 4.5, but the long-context premium is gone for both.
I ran a test with an 800,000-token React application. Under the old pricing, this would have triggered premium rates for roughly 300,000 tokens. Now it is all standard pricing. The cost difference for this single analysis dropped by about 60%.
This is not just about saving money. It removes the mental overhead of context management. You stop worrying about token optimization and start thinking about better problems to solve.
Claude Opus 4.6 and Sonnet 4.6 pricing per million tokens (2026)
These are the current standard rates, straight from Anthropic's pricing page. Both models include the full 1M token context window at these rates with no premium tier.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context window |
| --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M tokens (standard rate) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens (standard rate) |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens |
A few practical notes on how to read that table:
- Batch API requests get a flat 50% discount on both input and output. If you can tolerate async processing, Opus 4.6 batch pricing is $2.50 input and $12.50 output.
- Prompt cache hits are billed at 10% of the input rate. Heavy reuse of a system prompt or document context is where the real savings compound.
- A full 1M-token input request on Sonnet 4.6 costs $3 for input alone. A typical response of 10K output tokens adds $0.15, for about $3.15 total. On Opus 4.6, the same request runs roughly $5.25.
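The arithmetic above is simple enough to script. Here is a minimal sketch of a cost estimator using the rates from the table, the flat 50% batch discount, and the 10% cache-hit rate described above (a back-of-envelope tool, not an official billing calculator):

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate,
                 batch=False, cached_input_tokens=0):
    """Estimate request cost in dollars. Rates are per million tokens.

    cached_input_tokens are billed at 10% of the input rate (cache hits);
    batch requests get a flat 50% discount on both input and output.
    """
    fresh = input_tokens - cached_input_tokens
    cost = (fresh * in_rate
            + cached_input_tokens * in_rate * 0.10
            + output_tokens * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost

# Sonnet 4.6: full 1M-token input plus a 10K-token response
print(request_cost(1_000_000, 10_000, 3.00, 15.00))              # 3.15
# Opus 4.6, same request, sent through the Batch API
print(request_cost(1_000_000, 10_000, 5.00, 25.00, batch=True))  # 2.625
```

Note how batching flips the economics: a batched Opus request comes in under a real-time Sonnet request of the same size.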
What it costs in practice
The table is one thing. What matters is what you actually pay when you use it.
Most developers are not sending a million tokens per request. A typical workflow, feeding Claude a 50-file TypeScript project with a refactoring prompt, lands around 200K-400K tokens. At Sonnet 4.6 rates, that is $0.60 to $1.20 for input plus whatever output Claude generates. A thorough response with code changes might run 5K-10K output tokens, adding $0.08 to $0.15. Total cost for a serious codebase analysis: under $1.50.
Compare that to the old Opus 4.1 rates for the same request. At $15 per million input tokens, 400K tokens would cost $6 for input alone; add 10K output tokens at $75 per million and you are near $7, roughly five times the Sonnet 4.6 price. And if you crossed into long-context premium territory, the numbers got worse.
The point is not that AI is cheap. It is that the cost curve crossed the threshold where you stop rationing context and start using all of it.
When to use which model
Haiku 4.5 is for fast, cheap tasks where you do not need deep reasoning. Classification, simple extraction, formatting. At $1 per million input tokens, you can process tens of thousands of requests before the bill matters.
Sonnet 4.6 is the default for most work. Code generation, document analysis, long-context reasoning. The 1M token window at $3 per million input makes it the practical choice for full-codebase analysis. Unless you are hitting a reasoning ceiling, Sonnet is where most developers should live.
Opus 4.6 is for hard problems. Complex multi-step reasoning, ambiguous requirements, novel architecture decisions. The roughly 1.7x price premium over Sonnet ($5 versus $3 input, $25 versus $15 output) is worth it when a cheaper model would give you a plausible-looking wrong answer. I reach for Opus when I need to trust the output without double-checking every line.
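One way to operationalize that split is a simple router that sends each task to the cheapest model likely to handle it. A hedged sketch; the model identifiers and task categories here are illustrative assumptions, so check Anthropic's docs for the real model IDs:

```python
# Hypothetical model IDs for illustration only.
MODELS = {
    "light": "claude-haiku-4-5",     # classification, extraction, formatting
    "default": "claude-sonnet-4-6",  # code gen, document analysis, long context
    "hard": "claude-opus-4-6",       # ambiguous, multi-step, high-stakes
}

def pick_model(task_kind: str, needs_deep_reasoning: bool = False) -> str:
    """Route a task to the cheapest model likely to handle it well."""
    if task_kind in ("classify", "extract", "format"):
        return MODELS["light"]
    if needs_deep_reasoning:
        return MODELS["hard"]
    return MODELS["default"]

print(pick_model("classify"))                            # the Haiku tier
print(pick_model("refactor"))                            # the Sonnet default
print(pick_model("design", needs_deep_reasoning=True))   # the Opus tier
```

The escalation logic matters more than the table: default to Sonnet, drop to Haiku only for mechanical tasks, and pay for Opus only when you explicitly flag the task as hard.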
The real impact: whole-codebase analysis
Most developers have been chunking their code, feeding Claude pieces of applications instead of the full context. This created blind spots. Claude could not see dependencies across modules or understand architectural patterns spanning multiple files.
With affordable million-token context, you can drop entire repositories into Claude. I tested this with a 150-file TypeScript project. Claude identified three performance bottlenecks that only became visible when it could see the full call graph across all modules.
The quality difference is substantial. Claude with full context writes better code because it understands how everything connects.
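Getting a repository into a single prompt is mostly plumbing. A minimal sketch that concatenates a project's source files into one prompt-ready string and sanity-checks it against the 1M-token window; the 4-characters-per-token ratio is a rough heuristic, not Claude's actual tokenizer:

```python
from pathlib import Path

def pack_repo(root: str, exts=(".ts", ".tsx"), limit_tokens=1_000_000) -> str:
    """Concatenate source files under root into one string,
    tagging each file with its path so the model sees the structure."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            parts.append(f"// FILE: {path}\n{path.read_text(encoding='utf-8')}")
    blob = "\n\n".join(parts)
    est_tokens = len(blob) // 4  # rough heuristic: ~4 characters per token
    if est_tokens > limit_tokens:
        raise ValueError(f"~{est_tokens} tokens exceeds the {limit_tokens} window")
    return blob
```

Prepend a task prompt ("map the dependencies between these modules") and send the result as a single user message; the `// FILE:` markers give Claude the cross-module structure that chunking destroys.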
What to do about it
Start feeding Claude larger contexts. Take that refactoring project you have been postponing because explaining the codebase was too expensive. Upload the entire thing and ask Claude to map dependencies, identify technical debt, or suggest architectural improvements.
The cost barrier just disappeared. Use it.
If this changes how you are thinking about AI-assisted development, I want to hear about it. Contact me with your million-token experiments.