claude-opus-4-8 Benchmark & Insights
Anthropic Claude API
Updated Jun 6, 2026
Sample size: 9 runs in window
Accuracy: 100.0% (consensus match · 11d)
Confidence: 87% over 11 runs
Window end: Jun 6, 2026 (most recent run)
Input price: $5.00/MTok (prompt tokens)
Output price: $25.00/MTok (completion tokens)
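The input and output prices listed here can be combined into a rough per-request cost estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, not part of any SDK, and it assumes flat per-token billing at the two listed rates with no caching, batch, or other pricing adjustments.

```python
# Prices copied from the listing (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes flat per-token pricing at the listed rates.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (prompt_tokens * INPUT_USD_PER_MTOK
             + completion_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost_usd(2_000, 500):.4f}")
```

Under these assumptions, a request with 2,000 prompt tokens and 500 completion tokens comes to about $0.0225, with the output side contributing slightly more than the input side despite being a quarter of the token count.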
Model insights
- 01 Perfect agreement, but on only 9 days. That is undersampled, so treat the score as provisional.
- 02 It rated "low" on 8 of 9 days, so it has barely been tested outside calm weather. Its siblings opus-4-7 and opus-4-6 settle near 95% on real samples.
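To see why a perfect score on 9 days is only provisional, one option is to put an interval around it. The sketch below uses a Wilson score interval, a standard technique that is swapped in here for illustration. The page does not say how its own confidence figure is computed, so this is not that method. It also assumes each day counts as an independent pass/fail trial, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage. Assumes the n trials
    are independent, which may not hold for consecutive daily runs.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(9, 9)
    print(f"9/9 matches: plausible accuracy range {low:.2f} to {high:.2f}")
```

With 9 matches out of 9, the lower bound comes out at about 0.70, so the data are compatible with a true accuracy well below 100% and do not yet separate this model from siblings that settle near 95%.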
Notes
Improved version of Opus 4.7