claude-sonnet-4-6 Benchmark & Insights

Anthropic Claude API

Updated Jul 18, 2026

Sample size: 157 runs in window

Accuracy: 80.0% (consensus match · 160d)

Confidence: 87% over 162 runs

Window end: Jul 18, 2026 (most recent run)

Input price: $3.00/MTok (prompt tokens)

Output price: $15.00/MTok (completion tokens)
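
As a rough illustration of how the two per-MTok prices combine, the sketch below estimates the cost of a single run. The prices come from the figures above; the function name and the token counts in the usage line are hypothetical, and the sketch ignores any discounts or other billing rules.

```python
# Prices taken from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one run from its prompt and completion token counts."""
    total = (prompt_tokens * INPUT_USD_PER_MTOK
             + completion_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Hypothetical run: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost_usd(2_000, 500)
print(f"${cost:.4f}")
```

Because output tokens cost five times as much as input tokens, long completions dominate the bill even when prompts are much larger.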

Model insights

01 The mirror image of claude-sonnet-5: heavily "pessimistic" (35 over vs 1 under), where its successor is purely "optimistic".
02 From February to April it called "medium" on "low" days almost as a ritual and raised four false "unsafe" alarms; the upgrade to claude-sonnet-5 was worth 7.7 points.
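
One way an over/under tally like the one in insight 01 could be computed is sketched below. This is an assumption about the method, not the page's actual scoring: the label ordering, the "high" level, the function name, and the sample pairs are all hypothetical.

```python
from collections import Counter

# Assumed ordering of risk labels. Only "low" and "medium" appear in the
# insights above; "high" is a hypothetical third level.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def tally_bias(pairs):
    """Count forecasts that sit above, below, or at the consensus risk level.

    `pairs` is an iterable of (forecast_label, consensus_label) tuples.
    """
    counts = Counter()
    for forecast, consensus in pairs:
        diff = RISK_ORDER[forecast] - RISK_ORDER[consensus]
        if diff > 0:
            counts["over"] += 1    # "pessimistic": forecast riskier than consensus
        elif diff < 0:
            counts["under"] += 1   # "optimistic": forecast safer than consensus
        else:
            counts["match"] += 1
    return counts


# Hypothetical sample, not real forecast data.
sample = [("medium", "low"), ("low", "low"), ("medium", "low"), ("low", "medium")]
result = tally_bias(sample)
print(dict(result))
```

Under this reading, a model with many "over" counts and almost no "under" counts would be the pessimistic one, and the reverse pattern would be optimistic.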

Recent forecasts

Date | Conf. | Risk | Safe