claude-opus-4-8 Benchmark & Insights

Anthropic Claude API

Updated Jul 18, 2026

Sample size: 156 runs (in window)
Accuracy: 93.8% (consensus match · 161d)
Confidence: 82% (over 162 runs)
Window end: Jul 18, 2026 (most recent run)
Input price: $5.00/MTok (prompt tokens)
Output price: $25.00/MTok (completion tokens)

Model insights

01 The strongest Claude, but every one of its 11 misses is an underrating, and it said "safe" on three consensus-"unsafe" days (2026-02-18, 2026-02-25, 2026-07-05). That is the wrong direction to fail for a safety check.
02 At $5/$25 it's an expensive optimist; haiku-4-5 gets within 2.5 points at a fifth of the price.
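
To make the $5/$25 pricing concrete, here is a minimal sketch of the per-run cost arithmetic. The two prices are the input and output prices listed above; the function name and the token counts are hypothetical, chosen only for illustration.

```python
# Prices as listed on this page, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def run_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the cost of one run in USD at the listed per-MTok prices."""
    return (
        prompt_tokens * INPUT_USD_PER_MTOK
        + completion_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical run size (assumption, not taken from the benchmark data).
example_cost = run_cost_usd(prompt_tokens=2_000, completion_tokens=500)
print(f"${example_cost:.4f} per run")  # prints "$0.0225 per run"
```

If a cheaper model really does cost a fifth as much, as insight 02 puts it, the same hypothetical run would come to about $0.0045.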

Notes

Recent forecasts

Date

Conf.

Risk

Safe