ON THE RECORD
Sealed
An AI agent tops 95% on SWE-bench Verified before October
I predict a publicly announced model or agent scores above 95% on SWE-bench Verified before October 1, 2026.
n=5
The record leans NO at 33%.
Crowd consensus from 5 forecasters — log-odds pool, shown honestly with n.
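As a rough illustration of what a log-odds pool does, the sketch below averages forecasts in logit space and maps the mean back to a probability. The function names, the equal weighting, and the clipping constant `eps` are assumptions made for this example; the site's actual pooling code is not shown in the source.

```python
import math

def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def log_odds_pool(probs, eps=1e-6):
    """Equal-weight log-odds pool (hypothetical helper, not the site's code).

    Each forecast is clipped away from 0 and 1 so its logit stays finite,
    the logits are averaged, and the mean is mapped back through the sigmoid.
    """
    if not probs:
        raise ValueError("need at least one forecast")
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    mean_logit = sum(logit(p) for p in clipped) / len(clipped)
    return 1.0 / (1.0 + math.exp(-mean_logit))

# Illustrative inputs only; these are not the five forecasts on the record.
pooled = log_odds_pool([0.2, 0.8])
```

Compared with a plain average of probabilities, pooling in log-odds space gives more weight to confident forecasts near 0 or 1, which is one common reason to prefer it.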
Your forecast — on the record
65% it happens
Exact conditions — sealed & immutable
- Resolves YES if the public SWE-bench leaderboard lists a score above 95.0% on SWE-bench Verified before the deadline.
- A lab’s own announcement counts only once reproduced on the leaderboard.
- Source of truth: swebench.com.
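The conditions above can be read as a simple predicate. The sketch below is one possible encoding, assuming a hypothetical `LeaderboardEntry` record with a score and the date it appeared on the leaderboard; the deadline is taken from the forecast statement (before October 1, 2026). None of these names come from swebench.com.

```python
from dataclasses import dataclass
from datetime import date

THRESHOLD = 95.0             # score must be strictly above this
DEADLINE = date(2026, 10, 1)  # entry must be listed strictly before this

@dataclass
class LeaderboardEntry:
    """Hypothetical record of one public leaderboard listing."""
    score_pct: float
    listed_on: date

def resolves_yes(entries) -> bool:
    """True if any listed entry beats the threshold before the deadline.

    Lab announcements that never reach the leaderboard are simply absent
    from `entries`, so they cannot trigger a YES.
    """
    return any(
        e.score_pct > THRESHOLD and e.listed_on < DEADLINE
        for e in entries
    )
```

Note the strict comparisons: a score of exactly 95.0%, or a listing dated October 1, 2026, would not resolve YES under this reading.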
sealed 2026-07-01 #2d6b31c9
sha-256 of the sealed criteria; any edit would break it
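A minimal sketch of how such a seal could work, using Python's standard `hashlib`. The canonical form (stripped lines joined with newlines, UTF-8 encoded) and the 8-character short id are assumptions for illustration; the source does not say how the criteria are serialized, so this is not expected to reproduce the id shown above.

```python
import hashlib

def seal(criteria: list[str]) -> str:
    """Return the SHA-256 hex digest of the criteria text.

    Assumed canonical form: each line stripped, joined with '\n', UTF-8.
    """
    canonical = "\n".join(line.strip() for line in criteria)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def short_id(digest: str, length: int = 8) -> str:
    """Truncate a digest to a short display id (length is an assumption)."""
    return digest[:length]

def is_unmodified(criteria: list[str], sealed_digest: str) -> bool:
    """Check the current criteria text against the digest stored at seal time."""
    return seal(criteria) == sealed_digest
```

Changing even one character of the criteria yields a different digest, which is what makes any later edit detectable.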
Forecasters — 5 on the record
95 is where saturation meets contamination audits. Someone will claim it; “verified” is the hard part.
The verified subset is adversarial by design. NO.
Agent scaffolds compound fast — 45% is my respect for the deadline.
No wagers, no odds, no cash value — accuracy only.