On May 28, Anthropic shipped Claude Opus 4.8 and immediately took the top spot on the Artificial Analysis Intelligence Index — 61.4 versus GPT-5.5's 60.2. Two weeks later, people are still arguing about which one is actually better.
Both camps are right, and both are missing the point.
These aren't the same model doing the same things. The aggregate scores look close, but once you break them down by task category the gap gets wide fast. Which one you should use depends almost entirely on what you're actually doing with it.
The benchmarks everyone's citing
The headline number, 61.4 vs 60.2 on the Artificial Analysis Intelligence Index, is real but it's an aggregate. It squashes a lot of variance into a single score, which is why you see people reach opposite conclusions from the same data.
The task-specific numbers are more useful.
On SWE-bench, which tests multi-file software engineering, Opus 4.8 scores 69.2% versus GPT-5.5's 58.6%. A 10.6-point gap — the largest between these two models on any single benchmark. If you're regularly refactoring across multiple files, hunting bugs through a codebase, or working under tight implementation constraints, this difference is noticeable in practice. It's not marginal.
Terminal-Bench 2.0 flips it. GPT-5.5 scores 78.2% versus Opus 4.8's 74.6%. This test focuses on complex shell workflows: multi-step commands, iteration, tool coordination. For DevOps-heavy work or anyone who lives in the terminal, GPT-5.5 has the edge.
On GDPval-AA (knowledge work and reasoning), Opus 4.8 leads by about 121 Elo points, which works out to roughly a 66.7% expected pairwise win rate. For research, analysis, long documents, and legal or financial reasoning, this matters.
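If you want to sanity-check how a 121-point lead becomes a roughly two-in-three win rate, here's a minimal sketch. It assumes the leaderboard uses the standard Elo logistic curve with a 400-point scale, which the numbers above are consistent with but which this article doesn't confirm. The function name is just illustrative.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate for the higher-rated side.

    Assumes the standard Elo logistic model with a 400-point scale;
    the leaderboard's actual scoring method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


opus_lead = 121  # Elo gap cited for GDPval-AA
win_rate = elo_win_probability(opus_lead)
print(f"Expected pairwise win rate: {win_rate:.1%}")
```

The useful intuition: Elo gaps are not linear. A gap of zero means a coin flip, and each additional 100 points buys less extra win rate than the last.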
Creative writing is the one place where the conventional wisdom has quietly flipped. Claude has long had a reputation for more natural prose, and that reputation was mostly earned. But GPT-5.5 has narrowed the gap, and by some recent comparisons it's slightly ahead on raw fluency. Not dramatically, but enough that people who write with AI all day are noticing.
What they're each built for
Opus 4.8 was built around deep reasoning and complex software work. It handles multi-step logic well, keeps coherence over long contexts (200K token window, versus GPT-5.5's 128K), and does better on tasks that require understanding how parts of a system relate to each other.
The catch: it's verbose. Artificial Analysis found Opus 4.8 takes roughly 30% more turns than GPT-5.5 to finish agentic tasks. Sometimes the extra explanation is what you want. In automated workflows, it mostly just adds cost.
GPT-5.5 was OpenAI's April 2026 release, built specifically for agentic work — long-horizon tasks, tool use, multi-step execution. The big fix over GPT-5.4 was instruction drift, and it's genuinely better here. It runs leaner, finishes tasks in fewer turns, and keeps things moving across complex workflows without going off-track.
What it isn't: a more creative model. Several reviewers noted that GPT-5.5 is either exactly right for structured agentic workflows or largely beside the point, with not much middle ground. If you're not running those kinds of tasks, the upgrade over earlier versions won't feel obvious.
Pricing
Both models charge the same on input: $5.00 per million tokens.
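Output is where they diverge: Opus 4.8 at $25.00 per million tokens, GPT-5.5 at $30.00. Opus looks cheaper per token on paper, by about 17%. But if the roughly 30% extra turns translate into roughly 30% more output tokens, that advantage disappears: $25.00 × 1.3 is $32.50 against GPT-5.5's $30.00. So the real-world gap is smaller than it appears, and in some workflows it's essentially even.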
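To see how verbosity eats into the per-token price gap, here's a back-of-the-envelope sketch. The big assumption is that "30% more turns" means about 30% more output tokens, which won't hold exactly for every workload. It also ignores input tokens, since both models charge the same there. The function names and the 1.0 baseline are made up for illustration.

```python
# Output prices in USD per million tokens, as quoted in this article.
OUTPUT_PRICE_PER_MTOK = {"Opus 4.8": 25.00, "GPT-5.5": 30.00}


def effective_output_cost(price_per_mtok: float,
                          baseline_mtok: float,
                          verbosity_multiplier: float = 1.0) -> float:
    """Output cost when a model emits `verbosity_multiplier` times the baseline tokens."""
    return price_per_mtok * baseline_mtok * verbosity_multiplier


def breakeven_multiplier(cheaper_price: float, pricier_price: float) -> float:
    """How much more verbose the cheaper model can be before it costs the same."""
    return pricier_price / cheaper_price


# Assumption: GPT-5.5 needs 1M output tokens for a job, Opus 4.8 needs 30% more.
gpt_cost = effective_output_cost(OUTPUT_PRICE_PER_MTOK["GPT-5.5"], 1.0)
opus_cost = effective_output_cost(OUTPUT_PRICE_PER_MTOK["Opus 4.8"], 1.0, 1.3)
breakeven = breakeven_multiplier(OUTPUT_PRICE_PER_MTOK["Opus 4.8"],
                                 OUTPUT_PRICE_PER_MTOK["GPT-5.5"])

print(f"GPT-5.5:  ${gpt_cost:.2f}")
print(f"Opus 4.8: ${opus_cost:.2f}")
print(f"Break-even verbosity: {breakeven:.2f}x")
```

Under these assumptions the break-even point is 1.2x: Opus stays cheaper on output as long as it emits less than about 20% more tokens than GPT-5.5 for the same job. Plug in your own measured token counts before drawing conclusions.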
For most people this is academic anyway: both models are accessible through their standard $20/month subscriptions. Claude Pro includes Opus 4.8, ChatGPT Plus includes GPT-5.5.
The actual recommendation
If you work on complex software projects — the kind where bugs span multiple files and refactors touch a dozen things at once — Opus 4.8 is the better choice right now and it's not close. The SWE-bench gap is real.
If you're heavy on terminal work, shell scripting, or building automated pipelines that need to run without babysitting, GPT-5.5 is better. It also wins if you care about the broader OpenAI ecosystem: Sora, DALL-E, and custom GPTs are all included in the Plus subscription.
For writing tasks, it's genuinely close. GPT-5.5 has a slight fluency edge, but both models are good enough that your prompt quality matters more than which model you pick.
For everything else — research, reasoning, long documents, knowledge work — Opus 4.8.
One more thing
Claude vs GPT gets framed like you have to pick a side. You don't. The people who get the most out of both models aren't loyal to either — they run the same prompt through both and use whichever answer is better. Sometimes it's obvious which one won. Sometimes you're surprised.
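If you'd rather script that side-by-side habit than paste prompts into two tabs, the shape of it is simple. This is only a sketch: `ask_opus` and `ask_gpt` are hypothetical placeholders that return canned strings, not real SDK calls. You'd swap in whatever client code you actually use for each provider.

```python
from typing import Callable, Dict


def run_side_by_side(prompt: str,
                     models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one prompt to every model callable and collect the answers by name."""
    return {name: ask(prompt) for name, ask in models.items()}


# Hypothetical stand-ins: replace the bodies with real API client calls.
def ask_opus(prompt: str) -> str:
    return f"[placeholder Opus 4.8 answer to: {prompt}]"


def ask_gpt(prompt: str) -> str:
    return f"[placeholder GPT-5.5 answer to: {prompt}]"


answers = run_side_by_side(
    "Explain this stack trace and suggest a fix.",
    {"Opus 4.8": ask_opus, "GPT-5.5": ask_gpt},
)

# Print both answers together so a human can pick the better one.
for name, answer in answers.items():
    print(f"--- {name} ---")
    print(answer)
```

The picking stays manual on purpose. Deciding which answer is better is the part that depends on your task.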
The benchmark wars are interesting. The more practical question is whether you're stuck committing to a single model at all. In 2026, that's increasingly not the constraint it used to be.