gpt-5.1 Benchmark & Insights
OpenAI · OpenAI API
Updated Jun 6, 2026
Sample size: 111 runs (in window)
Accuracy: 85.8% (consensus match · 113d)
Confidence: 93% (over 113 runs)
Window end: Jun 6, 2026 (most recent run)
Input price: $1.25/MTok (prompt tokens)
Output price: $10.00/MTok (completion tokens)
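To make the per-token prices concrete, here is a minimal sketch of the cost arithmetic for a single run. Only the two prices come from the figures above; the function name and the token counts in the usage line are hypothetical, and real billing may include details (caching, rounding) that this ignores.

```python
# Prices as listed on this page, in USD per million tokens (MTok).
INPUT_PRICE_PER_MTOK = 1.25    # prompt tokens
OUTPUT_PRICE_PER_MTOK = 10.00  # completion tokens


def run_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one run in USD (hypothetical helper)."""
    total = (prompt_tokens * INPUT_PRICE_PER_MTOK
             + completion_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical token counts, chosen only for illustration.
    print(f"${run_cost_usd(2_000, 500):.4f}")
```

Because output tokens cost eight times as much as input tokens here, completion length tends to dominate the bill for short prompts.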
Model insights
- 01 The riskiest failure mode on the board: 19 underratings against zero overratings. On 6 of those days it said "safe" when consensus was "unsafe", mostly in a Feb–Mar cluster.
- 02 For a driving-safety signal, this optimistic bias is worse than the accuracy score suggests; gpt-5.4 and gpt-5.5 corrected it.
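The underrating and overrating counts in the insights can be read as a directional tally of model labels against consensus labels. The sketch below shows one way such a tally might be computed. It assumes an ordinal scale with a middle "caution" level, which is hypothetical: the page only names "safe" and "unsafe". The function names are likewise illustrative, not the benchmark's own code.

```python
from collections import Counter

# Assumed ordinal severity scale. Only "safe" and "unsafe" appear on the
# page; "caution" is a hypothetical middle level added for illustration.
SEVERITY = {"safe": 0, "caution": 1, "unsafe": 2}


def error_direction(model_label: str, consensus_label: str) -> str:
    """Classify one day's rating relative to consensus."""
    model, consensus = SEVERITY[model_label], SEVERITY[consensus_label]
    if model < consensus:
        return "underrating"   # model judged the day safer than consensus
    if model > consensus:
        return "overrating"    # model judged the day riskier than consensus
    return "match"


def tally(pairs):
    """Count matches, underratings, and overratings over (model, consensus) pairs."""
    return Counter(error_direction(m, c) for m, c in pairs)


if __name__ == "__main__":
    # Made-up sample days, not benchmark data.
    days = [("safe", "unsafe"), ("safe", "safe"), ("caution", "unsafe")]
    print(tally(days))
```

Splitting errors by direction matters for a safety signal because a plain accuracy figure treats an optimistic miss and a cautious miss as equally costly.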