Anthropic's new flagship AI model tops industry benchmarks for coding, reasoning, and long-context tasks — and now features a 1 million token context window in beta.
Anthropic has unveiled Claude Opus 4.6, a significant upgrade to its most powerful model family. The release focuses on deeper agentic capabilities, stronger coding performance, and a major leap in how much context the model can reliably process.
The company says Opus 4.6 plans more carefully, sustains autonomous tasks over longer sessions, and handles larger codebases with greater precision, with improved code review and self-debugging. For the first time in Anthropic's Opus-class lineup, the model supports a 1-million-token context window, currently available in beta.
Beyond coding, the model is positioned as an upgrade for everyday professional work: running financial analyses, conducting research, and generating documents, spreadsheets, and presentations.
Benchmark-topping performance
Anthropic claims Opus 4.6 leads all frontier models on several key evaluations. It achieves the top score on Terminal-Bench 2.0 (agentic coding) and on Humanity's Last Exam (complex multidisciplinary reasoning). On GDPval-AA, which evaluates performance on economically valuable knowledge work in the finance and legal domains, Opus 4.6 reportedly outperforms OpenAI's GPT-5.2 by around 144 Elo points.
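For readers unfamiliar with Elo, the standard logistic Elo formula converts a rating gap into an expected head-to-head win rate. The sketch below assumes GDPval-AA uses the conventional 400-point scale; the announcement does not spell out its exact rating method, so treat the result as an approximate interpretation rather than a reported figure.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard logistic Elo model.

    E = 1 / (1 + 10 ** (-gap / scale)).
    Assumption: the conventional 400-point scale; GDPval-AA's exact
    rating method is not described in the announcement.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / scale))


if __name__ == "__main__":
    # A gap of about 144 points, the figure quoted above.
    print(f"{expected_win_rate(144):.1%}")  # prints 69.6%
```

Under that assumption, a 144-point gap means the higher-rated model would be expected to win roughly 70% of pairwise comparisons.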
The model also sets a new high on BrowseComp, which measures a model's ability to locate hard-to-find information online.
On long-context retrieval, the gap is striking: Opus 4.6 scores 76% on the 1M-token variant of MRCR v2, a needle-in-a-haystack benchmark that tests whether a model can retrieve information buried in very long inputs, compared with just 18.5% for Claude Sonnet 4.5.
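To make the idea concrete, here is a toy needle-in-a-haystack check in Python. It is a simplified illustration, not the MRCR v2 protocol, which uses its own multi-needle setup and grading: it hides a single "needle" sentence at a chosen relative depth in filler text and scores a retriever by exact match. The retriever is a hypothetical stand-in for a model call, and `truncating_retriever` simulates a model that can only "see" the start of its input.

```python
from typing import Callable

# A "retriever" stands in for a model call: given (haystack, question) it returns an answer.
Retriever = Callable[[str, str], str]

FILLER = "The sky was a pleasant shade of blue"


def build_haystack(needle: str, n_filler: int, depth: float) -> str:
    """Hide `needle` at relative position `depth` (0.0 = start, 1.0 = end) among filler sentences."""
    sentences = [f"{FILLER} (line {i})." for i in range(n_filler)]
    sentences.insert(round(depth * n_filler), needle)
    return " ".join(sentences)


def score(retrieve: Retriever, needle: str, question: str, expected: str,
          depths: list[float], n_filler: int = 1000) -> float:
    """Fraction of needle depths at which `retrieve` returns exactly `expected`."""
    hits = sum(
        retrieve(build_haystack(needle, n_filler, d), question).strip() == expected
        for d in depths
    )
    return hits / len(depths)


def truncating_retriever(limit_chars: int) -> Retriever:
    """Hypothetical stand-in for a model that only reads the first `limit_chars` characters."""
    marker = "The magic number is "

    def retrieve(haystack: str, question: str) -> str:
        visible = haystack[:limit_chars]
        start = visible.find(marker)
        if start == -1:
            return ""  # the needle fell outside the visible window
        start += len(marker)
        end = visible.find(".", start)
        return visible[start:end] if end != -1 else ""

    return retrieve


if __name__ == "__main__":
    needle, question, answer = "The magic number is 4217.", "What is the magic number?", "4217"
    depths = [0.0, 0.25, 0.5, 0.75, 1.0]
    print("full window: ", score(truncating_retriever(10**9), needle, question, answer, depths))
    print("short window:", score(truncating_retriever(15_000), needle, question, answer, depths))
```

Real long-context benchmarks vary needle depth, needle count, and context length far more systematically, and model failures are rarely as clean as a hard truncation; the point here is only the construct, query, and score structure.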
Safety gains alongside capability
Anthropic states that Opus 4.6's safety profile is as good as or better than that of any other frontier model in the industry, with low rates of deception, sycophancy, and cooperation with misuse in automated behavioral audits. Notably, it also shows the lowest over-refusal rate of any recent Claude model.
Given the model's enhanced cybersecurity capabilities, Anthropic developed six new detection probes to track potential misuse, and says it is accelerating the model's use for defensive cybersecurity — including finding and patching vulnerabilities in open-source software.
New developer and product features
Alongside the model launch, Anthropic announced several platform updates:
- Adaptive thinking — Claude can now decide when to engage deeper reasoning, rather than requiring developers to toggle it manually.
- Effort controls — Four levels (low, medium, high, max) let developers balance intelligence against speed and cost.
- Context compaction (beta) — Automatically summarizes older context to enable longer-running agentic tasks.
- 128k output tokens — Allows Claude to complete large-output tasks in a single request.
- Agent teams in Claude Code — Multiple agents can work in parallel on a codebase, coordinating autonomously.
- Claude in PowerPoint (research preview) — Available to Max, Team, and Enterprise plan users, the new tool reads layouts and slide masters to generate on-brand presentations.
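The announcement does not describe how Anthropic implements context compaction. As a rough illustration of the general idea only, the Python sketch below keeps a running message history and, once an estimated token count crosses a budget, replaces the oldest messages with a single summary while keeping the newest ones verbatim. The four-characters-per-token estimator, the `CompactingHistory` class, and the `summarize` callback are all hypothetical; a real system would use an actual tokenizer and a model call to produce the summary.

```python
from dataclasses import dataclass, field
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)


@dataclass
class CompactingHistory:
    max_tokens: int                        # estimated-token budget that triggers compaction
    keep_recent: int                       # newest messages that are never summarized
    summarize: Callable[[list[str]], str]  # hypothetical summarizer, e.g. a model call
    messages: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        self.messages.append(message)
        if self.total_tokens() > self.max_tokens:
            self._compact()

    def _compact(self) -> None:
        if len(self.messages) <= self.keep_recent:
            return  # nothing old enough to summarize
        # Split into older messages (to summarize) and the most recent ones (kept as-is).
        if self.keep_recent:
            old, recent = self.messages[:-self.keep_recent], self.messages[-self.keep_recent:]
        else:
            old, recent = self.messages[:], []
        self.messages = [self.summarize(old)] + recent


if __name__ == "__main__":
    history = CompactingHistory(
        max_tokens=50,
        keep_recent=2,
        summarize=lambda msgs: f"[summary of {len(msgs)} messages]",
    )
    for _ in range(10):
        history.append("x" * 40)  # about 10 estimated tokens each
    print(history.messages[0], len(history.messages), history.total_tokens())
```

Keeping the newest messages verbatim preserves the immediate working context, while older turns collapse into a summary that is itself folded into later summaries as the session grows.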
Claude Opus 4.6 is available now on claude.ai, via the API using the model string claude-opus-4-6, and on major cloud platforms. Pricing remains $5/$25 per million input/output tokens, with premium pricing for prompts exceeding 200k tokens.
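As a back-of-the-envelope check, the sketch below estimates the cost of a single request at the standard rates quoted above ($5 per million input tokens, $25 per million output tokens). The article does not give the premium rates for prompts over 200k tokens, so the function takes them as explicit, hypothetically named parameters (`premium_input_rate`, `premium_output_rate`) and refuses to guess when they are missing. It also assumes that, when premium pricing applies, it covers the whole request; Anthropic's pricing documentation is the authority on the exact rules.

```python
from typing import Optional

STANDARD_INPUT_RATE = 5.00    # USD per million input tokens (from the announcement)
STANDARD_OUTPUT_RATE = 25.00  # USD per million output tokens (from the announcement)
PREMIUM_THRESHOLD = 200_000   # prompts above this many tokens use premium pricing


def request_cost(input_tokens: int, output_tokens: int,
                 premium_input_rate: Optional[float] = None,
                 premium_output_rate: Optional[float] = None) -> float:
    """Estimated USD cost of one request.

    Assumption: if the prompt exceeds PREMIUM_THRESHOLD, the whole request is
    billed at premium rates, which the caller must supply.
    """
    if input_tokens > PREMIUM_THRESHOLD:
        if premium_input_rate is None or premium_output_rate is None:
            raise ValueError("prompt exceeds 200k tokens: premium rates required")
        in_rate, out_rate = premium_input_rate, premium_output_rate
    else:
        in_rate, out_rate = STANDARD_INPUT_RATE, STANDARD_OUTPUT_RATE
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # 100k-token prompt with a 10k-token reply at standard rates:
    # 100_000 * 5 / 1e6 = 0.50 and 10_000 * 25 / 1e6 = 0.25.
    print(f"${request_cost(100_000, 10_000):.2f}")  # prints $0.75
```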
Source: https://www.anthropic.com/news/claude-opus-4-6