gpt-oss-120b Benchmark & Insights

OpenAI · Cloudflare Workers AI

Updated Jul 18, 2026

Sample size: 156 runs (in window)
Accuracy: 60.6% (consensus match · 160d)
Confidence: 96% (over 161 runs)
Window end: Jul 18, 2026 (most recent run)
Input price: $0.35/MTok (prompt tokens)
Output price: $0.75/MTok (completion tokens)
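The two per-MTok prices listed above translate into a per-run cost once token counts are known. The following is a minimal sketch of that arithmetic; the function name `run_cost` and the token counts are hypothetical, chosen only for illustration, and are not figures from the benchmark.

```python
# Prices as listed above, in USD per million tokens (MTok).
INPUT_PRICE_PER_MTOK = 0.35
OUTPUT_PRICE_PER_MTOK = 0.75


def run_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one run at the listed per-MTok prices."""
    total = (prompt_tokens * INPUT_PRICE_PER_MTOK
             + completion_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical token counts, for illustration only.
cost = run_cost(prompt_tokens=2_000, completion_tokens=500)
print(f"${cost:.6f} per run")
```

Because output tokens cost a little over twice as much as input tokens here, runs with long completions dominate the bill even when prompts are larger.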

Model insights

01 Declared "unsafe" on about 40 consensus-"safe" days, a quarter of the calendar, and never once underrated risk (89 over, 0 under).
02 The Feb-Mar stretch shows week-long false-alarm streaks.
03 At $0.35/$0.75 per MTok (input/output) it is cheap for an open model, but gemma-4-31b-it is cheaper and scores 29 points higher.
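An over/under tally like the one in insight 01 can be thought of as a day-by-day comparison of the model's call against the consensus call. The sketch below assumes binary labels where "unsafe" ranks above "safe"; the `tally` function, the ranking table, and the sample data are all hypothetical and are not taken from the benchmark's own pipeline.

```python
from collections import Counter

# Assumed ordering: a higher number means a riskier call.
RISK_RANK = {"safe": 0, "unsafe": 1}


def tally(pairs):
    """Count days where the model rated risk over, under, or equal to consensus."""
    counts = Counter(over=0, under=0, match=0)
    for model_call, consensus_call in pairs:
        diff = RISK_RANK[model_call] - RISK_RANK[consensus_call]
        if diff > 0:
            counts["over"] += 1
        elif diff < 0:
            counts["under"] += 1
        else:
            counts["match"] += 1
    return counts


# Hypothetical sample, not benchmark data: (model call, consensus call) per day.
sample = [
    ("unsafe", "safe"),
    ("safe", "safe"),
    ("unsafe", "unsafe"),
    ("unsafe", "safe"),
]
result = tally(sample)
accuracy = result["match"] / len(sample)
print(result["over"], result["under"], f"{accuracy:.1%}")
```

A model that only ever errs in the "over" direction, as described above, would show a nonzero `over` count with `under` staying at zero.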

Recent forecasts

Date

Conf.

Risk

Safe