June 2026
AI Coder Bench
Real-world coding benchmark rankings. 50 tasks across bug fixing, feature building, refactoring, system design, and debug & explain.
| # | Change | Model | Provider | Score |
|---|---|---|---|---|
| 1 | ↑6 | Gemini 2.5 Pro | Google | 87.7 |
| 2 | ↑4 | Grok 4.3 | xAI | 86.0 |
| 3 | ↑1 | DeepSeek-V4-Pro | DeepSeek | 84.0 |
| 4 | ↑8 | DeepSeek-V4-Flash | DeepSeek | 83.5 |
| 5 | ↑3 | Claude Opus 4.8 | Anthropic | 81.4 |
| 6 | ↓4 | GPT-5.4 | OpenAI | 77.7 |
| 7 | ↓2 | Claude Sonnet 4.6 | Anthropic | 75.6 |
| 8 | ↑6 | GPT-5.5 | OpenAI | 74.7 |
| 9 | ↓8 | Qwen3.7-Max | Qwen | 74.3 |
| 10 | ↓1 | Gemini 2.5 Flash | Google | 72.1 |
| 11 | ↑4 | Mistral Large 3 | Mistral AI | 62.2 |
| 12 | ↓9 | GPT-5.4 mini | OpenAI | 61.3 |
| 13 | = | Kimi K2.6 | Moonshot AI | 59.1 |
| 14 | ↑2 | Command R+ 08-2024 | Cohere | 56.7 |
| 15 | ↓5 | Claude Haiku 4.5 | Anthropic | 56.4 |
| 16 | ↓5 | Qwen3.6-Plus | Qwen | 32.1 |
| 17 | ↓4 | Codestral | Mistral AI | 25.0 |
| 18 | = | Command R 03-2024 | Cohere | 12.0 |
Tiers: S ≥ 90 · A ≥ 80 · B ≥ 70 · C < 70
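The tier thresholds can be read as a simple lookup from score to letter. The sketch below is one way to express that mapping; the function name `tier` is hypothetical, and it assumes each lower bound is inclusive, as the `≥` signs suggest.

```python
def tier(score: float) -> str:
    """Map a 0-100 score to a tier letter (S, A, B, or C).

    Assumes the lower bounds (90, 80, 70) are inclusive.
    """
    if score >= 90:
        return "S"
    if score >= 80:
        return "A"
    if score >= 70:
        return "B"
    return "C"


# Scores taken from the ranking table
print(tier(87.7))  # A
print(tier(74.3))  # B
print(tier(62.2))  # C
```

Under this reading, no model in the current table reaches the S tier, since the top score is 87.7.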
Scores aggregated from public benchmarks · updated weekly
How scores are calculated
| Weight | Benchmark | What it measures |
|---|---|---|
| 35% | Aider Polyglot | 225 real coding tasks across 6 languages |
| 35% | SWE-bench Verified | 500 real GitHub issues, % resolved |
| 20% | LiveCodeBench | Ongoing competitive programming, Pass@1 |
| 10% | EvalPlus (HumanEval+) | Stricter HumanEval test suite |
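With the weights listed above, a plausible reading is that each model's score is a weighted average of its four benchmark results. The sketch below assumes every benchmark result is already a percentage on a 0-100 scale and that no further normalization is applied; the page does not confirm either point. The names `WEIGHTS` and `aggregate` and the input values are hypothetical, not real results for any model.

```python
# Benchmark weights in percent, as listed on the page (they sum to 100).
WEIGHTS = {
    "aider_polyglot": 35,
    "swe_bench_verified": 35,
    "livecodebench": 20,
    "evalplus": 10,
}


def aggregate(results: dict[str, float]) -> float:
    """Weighted average of per-benchmark percentages, rounded to one decimal.

    Assumption: each value in `results` is on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - results.keys()
    if missing:
        raise ValueError(f"missing benchmark results: {sorted(missing)}")
    total = sum(WEIGHTS[name] * results[name] for name in WEIGHTS)
    return round(total / 100, 1)


# Hypothetical per-benchmark results, for illustration only
example = {
    "aider_polyglot": 80.0,
    "swe_bench_verified": 70.0,
    "livecodebench": 60.0,
    "evalplus": 90.0,
}
print(aggregate(example))  # 73.5
```

Because Aider Polyglot and SWE-bench Verified together carry 70% of the weight, a model that does well on those two can outrank one that leads only on LiveCodeBench or EvalPlus.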