Gameplay Leaderboard
How well do LLMs actually play chess? The data is aggregated from independent research; we do not run these evaluations ourselves.
Data by dubesor.de (independent evaluation, not affiliated with Chess AI Bench). Cached Feb 25, 2026.
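407 entries. Models are listed separately by mode suffix (_Reasoning or _Continuation), and one entry is the Human baseline.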
| # | Model | Elo | Games |
|---|---|---|---|
| 1 | gemini-3-pro-preview_Reasoning | 1845 | 40 |
| 2 | gemini-3.1-pro-preview_Reasoning | 1837 | 15 |
| 3 | gpt-4.5-preview ˟_Continuation | 1800 | 20 |
| 4 | qwen3-max-thinking_Reasoning | 1800 | 1 |
| 5 | gemini-3-pro-preview_Continuation | 1795 | 27 |
| 6 | gpt-5.1-codex_Reasoning | 1785 | 16 |
| 7 | gpt-5-codex_Reasoning | 1777 | 23 |
| 8 | gemini-3-flash-preview_Continuation | 1767 | 24 |
| 9 | gemini-3.1-pro-preview_Continuation | 1661 | 12 |
| 10 | grok-4_Reasoning | 1615 | 27 |
| 11 | chatgpt-4o-latest ˟_Continuation | 1584 | 18 |
| 12 | o3_Reasoning | 1558 | 35 |
| 13 | gpt-5_Reasoning | 1526 | 23 |
| 14 | gpt-5.1_Reasoning | 1526 | 19 |
| 15 | gpt-5-chat_Continuation | 1497 | 28 |
| 16 | gpt-4o_Continuation | 1463 | 37 |
| 17 | gpt-5_Continuation | 1443 | 17 |
| 18 | gemini-3-flash-preview_Reasoning | 1436 | 33 |
| 19 | gpt-5.1-codex_Continuation | 1429 | 14 |
| 20 | Human | 1413 | 245 |
| 21 | gpt-5.1-chat_Continuation | 1401 | 17 |
| 22 | gpt-3.5-turbo-instruct_Continuation | 1379 | 88 |
| 23 | gpt-3.5-turbo_Continuation | 1374 | 63 |
| 24 | gpt-5.1-codex-max_Continuation | 1353 | 14 |
| 25 | o3_Continuation | 1347 | 17 |
| 26 | gpt-5.1-codex-max_Reasoning | 1342 | 18 |
| 27 | gpt-5.2-codex_Reasoning | 1327 | 17 |
| 28 | gpt-4o-2024-11-20_Continuation | 1308 | 23 |
| 29 | gpt-5-codex_Continuation | 1291 | 13 |
| 30 | gpt-4.1-mini_Continuation | 1261 | 28 |
| 31 | gpt-4.1_Continuation | 1239 | 37 |
| 32 | step-3.5-flash_Reasoning | 1232 | 21 |
| 33 | gpt-5.1_Continuation | 1231 | 12 |
| 34 | grok-4.1-fast-reasoning_Reasoning | 1207 | 28 |
| 35 | gpt-4_Continuation | 1203 | 15 |
| 36 | gemini-2.0-flash-001_Continuation | 1196 | 23 |
| 37 | gpt-5.2_Reasoning | 1174 | 18 |
| 38 | grok-4-fast-reasoning_Reasoning | 1145 | 30 |
| 39 | gpt-5.2-codex_Continuation | 1145 | 13 |
| 40 | gpt-5-mini_Continuation | 1117 | 16 |
| 41 | gpt-5-nano_Reasoning | 1112 | 30 |
| 42 | codex-mini ˟_Reasoning | 1108 | 35 |
| 43 | gpt-5-mini_Reasoning | 1107 | 30 |
| 44 | gpt-5.3-codex_Reasoning | 1089 | 8 |
| 45 | gpt-5.3-codex_Continuation | 1082 | 7 |
| 46 | claude-opus-4.1_Continuation | 1078 | 20 |
| 47 | deepseek-v3.2-speciale_Reasoning | 1072 | 23 |
| 48 | gemini-2.5-pro_Continuation | 1069 | 25 |
| 49 | gpt-5-nano_Continuation | 1049 | 16 |
| 50 | o4-mini_Reasoning | 1034 | 37 |
| 51 | gpt-4-turbo_Continuation | 1024 | 19 |
| 52 | o4-mini_Continuation | 995 | 12 |
| 53 | gpt-4.5-preview ˟_Reasoning | 992 | 15 |
| 54 | grok-4_Continuation | 992 | 12 |
| 55 | o1_Continuation | 989 | 7 |
| 56 | gpt-4o-mini_Continuation | 955 | 13 |
| 57 | gpt-5.1-codex-mini_Reasoning | 952 | 22 |
| 58 | gemini-2.5-pro_Reasoning | 943 | 46 |
| 59 | grok-code-fast-1_Reasoning | 927 | 23 |
| 60 | kimi-k2.5_Reasoning | 927 | 26 |
| 61 | gpt-oss-120b_Reasoning | 925 | 35 |
| 62 | gpt-5.2_Continuation | 920 | 12 |
| 63 | gpt-oss-20b_Continuation | 906 | 19 |
| 64 | claude-opus-4.5_Continuation | 900 | 17 |
| 65 | gemini-2.5-flash_Continuation | 896 | 22 |
| 66 | grok-4.1-fast-reasoning_Continuation | 894 | 12 |
| 67 | nemotron-3-nano-30b-a3b_Reasoning | 892 | 26 |
| 68 | gemini-1.5-pro ˟_Continuation | 888 | 10 |
| 69 | claude-opus-4_Continuation | 871 | 18 |
| 70 | o1-mini ˟_Continuation | 869 | 8 |
| 71 | claude-opus-4.5_Reasoning | 869 | 28 |
| 72 | gpt-oss-20b_Reasoning | 854 | 31 |
| 73 | codestral-2508_Reasoning | 854 | 1 |
| 74 | gpt-5.1-chat_Reasoning | 853 | 21 |
| 75 | minimax-m2_Continuation | 838 | 12 |
| 76 | codex-mini ˟_Continuation | 833 | 10 |
| 77 | gpt-4o_Reasoning | 829 | 38 |
| 78 | claude-opus-4.6_Continuation | 827 | 15 |
| 79 | gpt-5.2-chat_Reasoning | 825 | 20 |
| 80 | gpt-5.1-codex-mini_Continuation | 824 | 10 |
| 81 | qwen3.5-397b-a17b_Reasoning | 823 | 15 |
| 82 | deepseek-v3.2-speciale_Continuation | 819 | 10 |
| 83 | seed-oss-36b-instruct_Reasoning | 818 | 13 |
| 84 | gpt-4.1_Reasoning | 817 | 40 |
| 85 | qwen3.5-plus-02-15_Reasoning | 812 | 14 |
| 86 | claude-opus-4.6_Reasoning | 811 | 27 |
| 87 | o1_Reasoning | 807 | 11 |
| 88 | seed-1.6_Reasoning | 802 | 21 |
| 89 | chatgpt-4o-latest ˟_Reasoning | 799 | 17 |
| 90 | lfm-7b ˟_Continuation | 796 | 11 |
| 91 | gpt-5.2-chat_Continuation | 793 | 12 |
| 92 | glm-5_Reasoning | 791 | 22 |
| 93 | claude-sonnet-4.6_Reasoning | 790 | 16 |
| 94 | gpt-5-chat_Reasoning | 787 | 26 |
| 95 | qwen3-next-80b-a3b-thinking_Reasoning | 783 | 16 |
| 96 | claude-sonnet-4_Continuation | 780 | 17 |
| 97 | grok-3_Continuation | 780 | 17 |
| 98 | deepseek-v3.2_Reasoning | 780 | 20 |
| 99 | minimax-m2.1_Continuation | 780 | 8 |
| 100 | kimi-k2.5_Continuation | 772 | 12 |
| 101 | grok-4-fast-reasoning_Continuation | 771 | 14 |
| 102 | o1-mini ˟_Reasoning | 766 | 24 |
| 103 | grok-3-mini_Reasoning | 766 | 38 |
| 104 | grok-4-fast-non-reasoning_Reasoning | 756 | 24 |
| 105 | gemini-2.5-flash_Reasoning | 755 | 31 |
| 106 | deepseek-v3.1-terminus_Continuation | 755 | 1 |
| 107 | lfm-2.5-1.2b-instruct_Reasoning | 755 | 1 |
| 108 | grok-2-latest ˟_Continuation | 744 | 8 |
| 109 | command-a_Reasoning | 743 | 18 |
| 110 | qwen3-8b_Reasoning | 741 | 18 |
| 111 | kimi-k2_Reasoning | 737 | 31 |
| 112 | aurora-alpha_Reasoning | 736 | 1 |
| 113 | kimi-k2-0905_Reasoning | 734 | 32 |
| 114 | kimi-k2-thinking_Continuation | 731 | 11 |
| 115 | gemini-2.5-flash-lite_Continuation | 727 | 15 |
| 116 | minimax-m2_Reasoning | 721 | 24 |
| 117 | lfm-2.5-1.2b-thinking_Reasoning | 721 | 1 |
| 118 | kimi-k2-thinking_Reasoning | 715 | 20 |
| 119 | o3-mini_Continuation | 714 | 10 |
| 120 | glm-5_Continuation | 714 | 10 |
| 121 | gpt-4.1-mini_Reasoning | 713 | 27 |
| 122 | claude-opus-4_Reasoning | 709 | 24 |
| 123 | o3-mini_Reasoning | 707 | 16 |
| 124 | qwen3-32b_Reasoning | 706 | 26 |
| 125 | mistral-large-2-2411_Reasoning | 704 | 35 |
| 126 | qwen2.5-72b-instruct_Reasoning | 704 | 38 |
| 127 | kimi-k2_Continuation | 704 | 14 |
| 128 | qwen-plus-2025-07-28_Reasoning | 704 | 17 |
| 129 | longcat-flash-chat_Reasoning | 701 | 22 |
| 130 | qwen3-14b_Reasoning | 700 | 16 |
| 131 | claude-3.7-sonnet_Continuation | 698 | 12 |
| 132 | claude-opus-4.1_Reasoning | 698 | 28 |
| 133 | internvl3-78b_Reasoning | 697 | 11 |
| 134 | deepseek-r1_Reasoning | 691 | 13 |
| 135 | llama-3.3-70b-instruct_Reasoning | 687 | 52 |
| 136 | qwen3-max_Reasoning | 686 | 21 |
| 137 | claude-haiku-4.5_Reasoning | 680 | 21 |
| 138 | glm-4.6v_Reasoning | 680 | 18 |
| 139 | grok-4.1-fast-non-reasoning_Reasoning | 668 | 14 |
| 140 | qwen2.5-max_Reasoning | 667 | 23 |
| 141 | qwen-plus-2025-07-28_Continuation | 667 | 10 |
| 142 | claude-sonnet-4.6_Continuation | 667 | 10 |
| 143 | qwen3-235b-a22b-thinking-2507_Reasoning | 666 | 19 |
| 144 | gemini-2.0-flash-lite-001_Continuation | 665 | 13 |
| 145 | qwen3-coder-next_Reasoning | 664 | 12 |
| 146 | claude-sonnet-4.5_Reasoning | 663 | 30 |
| 147 | deepseek-r1-0528_Reasoning | 662 | 16 |
| 148 | deepseek-v3.2-exp_Reasoning | 660 | 18 |
| 149 | devstral-2512_Reasoning | 660 | 18 |
| 150 | gpt-4o-mini_Reasoning | 659 | 21 |
| 151 | qwen3-coder-plus_Reasoning | 659 | 14 |
| 152 | grok-2-latest ˟_Reasoning | 659 | 16 |
| 153 | seed-1.6-flash_Reasoning | 659 | 21 |
| 154 | gemma-2-27b-it_Reasoning | 658 | 18 |
| 155 | glm-4.5_Reasoning | 658 | 21 |
| 156 | claude-3.7-sonnet_Reasoning | 655 | 26 |
| 157 | claude-3.5-sonnet_Continuation | 653 | 11 |
| 158 | minimax-m1_Reasoning | 653 | 15 |
| 159 | olmo-3-32b-think_Reasoning | 650 | 15 |
| 160 | phi-4_Reasoning | 649 | 14 |
| 161 | qwen2.5-plus_Reasoning | 649 | 15 |
| 162 | intellect-3_Reasoning | 649 | 7 |
| 163 | deepseek-v3-0324_Continuation | 648 | 13 |
| 164 | hunyuan-a13b-instruct_Continuation | 648 | 8 |
| 165 | gpt-4_Reasoning | 648 | 12 |
| 166 | gemini-2.5-flash-lite_Reasoning | 645 | 28 |
| 167 | claude-3-sonnet ˟_Reasoning | 642 | 6 |
| 168 | claude-sonnet-4_Reasoning | 640 | 33 |
| 169 | qwen3-235b-a22b_Reasoning | 639 | 19 |
| 170 | llama-3.1-nemotron-ultra-253b-v1_Reasoning | 637 | 13 |
| 171 | gpt-oss-120b_Continuation | 637 | 13 |
| 172 | claude-sonnet-4.5_Continuation | 634 | 17 |
| 173 | ernie-4.5-21b-a3b-thinking_Reasoning | 634 | 13 |
| 174 | gpt-4-turbo_Reasoning | 632 | 21 |
| 175 | qwen3-235b-a22b_Continuation | 630 | 10 |
| 176 | claude-haiku-4.5_Continuation | 629 | 15 |
| 177 | ministral-14b-2512_Reasoning | 628 | 13 |
| 178 | gemini-1.5-pro ˟_Reasoning | 627 | 12 |
| 179 | gemini-2.0-flash-001_Reasoning | 626 | 47 |
| 180 | deepseek-v3_Continuation | 626 | 10 |
| 181 | qwen2.5-vl-32b-instruct_Reasoning | 626 | 2 |
| 182 | inflection-3-pi_Reasoning | 625 | 11 |
| 183 | mistral-large-3-2512_Reasoning | 621 | 15 |
| 184 | grok-code-fast-1_Continuation | 619 | 10 |
| 185 | llama-3.3-nemotron-super-49b-v1.5_Reasoning | 619 | 11 |
| 186 | deepseek-v3-0324_Reasoning | 613 | 27 |
| 187 | glm-4.6_Reasoning | 613 | 23 |
| 188 | llama-3.3-70b-instruct_Continuation | 611 | 13 |
| 189 | qwen3-30b-a3b_Reasoning | 610 | 17 |
| 190 | deepseek-v3_Reasoning | 609 | 17 |
| 191 | minimax-m2.1_Reasoning | 607 | 12 |
| 192 | qwen3.5-397b-a17b_Continuation | 605 | 7 |
| 193 | devstral-2512_Continuation | 604 | 9 |
| 194 | llama-3.3-nemotron-super-49b-v1_Reasoning | 602 | 10 |
| 195 | grok-3_Reasoning | 602 | 21 |
| 196 | devstral-small-2505_Reasoning | 599 | 3 |
| 197 | minimax-m2.5_Reasoning | 599 | 14 |
| 198 | magistral-medium-2506_Reasoning | 598 | 10 |
| 199 | inflection-3-pi_Continuation | 598 | 1 |
| 200 | gpt-4.1-nano_Continuation | 597 | 8 |
| 201 | gemini-1.5-flash ˟_Reasoning | 594 | 10 |
| 202 | llama-3.1-70b-instruct_Reasoning | 593 | 13 |
| 203 | qwen3-next-80b-a3b-instruct_Reasoning | 593 | 20 |
| 204 | aurora-alpha_Continuation | 593 | 1 |
| 205 | deepseek-v3.1_Reasoning | 591 | 19 |
| 206 | deepseek-v3.2_Continuation | 591 | 11 |
| 207 | llama-3.1-405b-instruct_Reasoning | 590 | 24 |
| 208 | nemotron-3-nano-30b-a3b_Continuation | 589 | 5 |
| 209 | gpt-4o-2024-11-20_Reasoning | 588 | 25 |
| 210 | gemini-2.0-flash-lite-001_Reasoning | 588 | 18 |
| 211 | devstral-medium_Reasoning | 588 | 12 |
| 212 | glm-4.5-air_Reasoning | 585 | 19 |
| 213 | qwen3-vl-235b-a22b-thinking_Reasoning | 585 | 11 |
| 214 | mimo-v2-flash_Reasoning | 585 | 13 |
| 215 | gemma-3-12b-it_Reasoning | 584 | 16 |
| 216 | claude-3.5-haiku_Reasoning | 581 | 22 |
| 217 | qwen3-coder-480b-a35b_Reasoning | 581 | 12 |
| 218 | qwq-32b_Reasoning | 580 | 13 |
| 219 | gemma-2-27b-it_Continuation | 579 | 6 |
| 220 | magistral-medium-2506:thinking_Reasoning | 578 | 2 |
| 221 | hunyuan-a13b-instruct_Reasoning | 576 | 14 |
| 222 | llama-4-maverick_Reasoning | 575 | 26 |
| 223 | ling-1t_Reasoning | 574 | 17 |
| 224 | qwen2.5-turbo_Reasoning | 573 | 13 |
| 225 | jamba-large-1.7_Reasoning | 572 | 14 |
| 226 | gpt-4.1-nano_Reasoning | 570 | 20 |
| 227 | inflection-3-productivity_Reasoning | 570 | 11 |
| 228 | mistral-small-3.2-24b-instruct_Reasoning | 569 | 14 |
| 229 | ernie-4.5-300b-a47b_Reasoning | 569 | 19 |
| 230 | lfm2-8b-a1b_Reasoning | 569 | 19 |
| 231 | qwen3-next-80b-a3b-thinking_Continuation | 568 | 10 |
| 232 | deepseek-v3.1-terminus_Reasoning | 568 | 14 |
| 233 | claude-opus-4.5-thinking_Reasoning | 567 | 1 |
| 234 | ernie-4.5-21b-a3b_Reasoning | 566 | 18 |
| 235 | mistral-medium-3_Reasoning | 565 | 17 |
| 236 | glm-4.7-flash_Reasoning | 562 | 11 |
| 237 | qwen3-30b-a3b-thinking-2507_Reasoning | 561 | 13 |
| 238 | ministral-8b_Reasoning | 560 | 21 |
| 239 | command-r-plus-08-2024_Reasoning | 560 | 13 |
| 240 | internvl3-78b_Continuation | 559 | 2 |
| 241 | qwen3-30b-a3b-instruct-2507_Reasoning | 558 | 22 |
| 242 | gpt-3.5-turbo-instruct_Reasoning | 553 | 13 |
| 243 | mistral-small-24b-instruct-2501_Reasoning | 553 | 15 |
| 244 | nova-2-lite-v1_Reasoning | 553 | 13 |
| 245 | gemini-1.5-flash-8b ˟_Reasoning | 552 | 10 |
| 246 | deepseek-r1_Continuation | 551 | 2 |
| 247 | lfm-7b ˟_Reasoning | 550 | 25 |
| 248 | gpt-3.5-turbo_Reasoning | 550 | 14 |
| 249 | hermes-4-70b_Reasoning | 550 | 2 |
| 250 | step-3.5-flash_Continuation | 548 | 11 |
| 251 | qwen3-next-80b-a3b-instruct_Continuation | 546 | 11 |
| 252 | glm-4-32b_Reasoning | 545 | 17 |
| 253 | kimi-k2-0905_Continuation | 545 | 13 |
| 254 | minimax-m2.5_Continuation | 544 | 1 |
| 255 | claude-3-opus ˟_Reasoning | 543 | 10 |
| 256 | grok-3-mini_Continuation | 542 | 10 |
| 257 | mistral-medium-3.1_Reasoning | 541 | 17 |
| 258 | qwen3-vl-32b-instruct_Reasoning | 541 | 7 |
| 259 | qwen3-max_Continuation | 540 | 10 |
| 260 | grok-4-fast-non-reasoning_Continuation | 540 | 12 |
| 261 | claude-3-haiku_Reasoning | 539 | 13 |
| 262 | qwen3-235b-a22b-instruct-2507_Reasoning | 539 | 27 |
| 263 | command-r-08-2024_Reasoning | 538 | 12 |
| 264 | llama-4-scout_Continuation | 537 | 10 |
| 265 | gemma-3-27b-it_Reasoning | 536 | 19 |
| 266 | llama-4-maverick_Continuation | 535 | 13 |
| 267 | claude-3.5-sonnet_Reasoning | 533 | 13 |
| 268 | longcat-flash-chat_Continuation | 533 | 10 |
| 269 | deepseek-v3.2-exp_Continuation | 532 | 6 |
| 270 | mimo-v2-flash_Continuation | 531 | 10 |
| 271 | mistral-large-2-2411_Continuation | 530 | 10 |
| 272 | qwen3-vl-235b-a22b-instruct_Reasoning | 527 | 12 |
| 273 | gemma-2-9b-it_Reasoning | 526 | 19 |
| 274 | llama-3.1-405b-instruct_Continuation | 520 | 10 |
| 275 | claude-3.7-sonnet:thinking_Reasoning | 520 | 2 |
| 276 | qwen3-vl-8b-instruct_Reasoning | 519 | 7 |
| 277 | llama-4-scout_Reasoning | 517 | 26 |
| 278 | glm-z1-32b_Reasoning | 517 | 2 |
| 279 | gemini-1.5-flash ˟_Continuation | 517 | 3 |
| 280 | mistral-large-3-2512_Continuation | 517 | 11 |
| 281 | wizardlm-2-8x22b_Reasoning | 515 | 12 |
| 282 | seed-1.6-flash_Continuation | 513 | 2 |
| 283 | deepseek-prover-v2_Reasoning | 510 | 8 |
| 284 | mistral-nemo_Reasoning | 510 | 15 |
| 285 | jamba-large-1.6_Reasoning | 507 | 6 |
| 286 | devstral-small_Reasoning | 506 | 12 |
| 287 | magistral-small-2506_Continuation | 505 | 2 |
| 288 | llama-3.1-8b-instruct_Reasoning | 503 | 30 |
| 289 | magistral-small-2506_Reasoning | 503 | 11 |
| 290 | llama-3-8b-instruct_Reasoning | 503 | 14 |
| 291 | qwen3.5-plus-02-15_Continuation | 503 | 8 |
| 292 | molmo-2-8b_Reasoning | 502 | 12 |
| 293 | gemma-3-27b-it_Continuation | 499 | 7 |
| 294 | olmo-3.1-32b-instruct_Reasoning | 499 | 12 |
| 295 | command-r-08-2024_Continuation | 495 | 6 |
| 296 | grok-4.1-fast-non-reasoning_Continuation | 495 | 11 |
| 297 | qwen3-coder-480b-a35b_Continuation | 494 | 9 |
| 298 | qwen3-4b_Reasoning | 493 | 5 |
| 299 | llama-3.2-3b-instruct_Reasoning | 491 | 14 |
| 300 | kimi-linear-48b-a3b-instruct_Reasoning | 489 | 11 |
| 301 | jamba-large-1.7_Continuation | 488 | 10 |
| 302 | ministral-8b-2512_Reasoning | 488 | 16 |
| 303 | devstral-small_Continuation | 487 | 2 |
| 304 | qwen3-vl-30b-a3b-thinking_Reasoning | 487 | 3 |
| 305 | qwen-2.5-7b-instruct_Reasoning | 487 | 15 |
| 306 | deepseek-r1-0528-qwen3-8b_Reasoning | 486 | 3 |
| 307 | glm-4-32b_Continuation | 486 | 8 |
| 308 | deepseek-v3.1_Continuation | 483 | 8 |
| 309 | ernie-4.5-300b-a47b_Continuation | 482 | 8 |
| 310 | jamba-large-1.6_Continuation | 480 | 5 |
| 311 | hermes-4-405b_Reasoning | 479 | 2 |
| 312 | qwen2.5-72b-instruct_Continuation | 476 | 11 |
| 313 | deepseek-prover-v2_Continuation | 474 | 4 |
| 314 | mistral-small-3.1-24b-instruct_Reasoning | 473 | 19 |
| 315 | inflection-3-productivity_Continuation | 471 | 1 |
| 316 | tng-r1t-chimera_Reasoning | 471 | 1 |
| 317 | olmo-3.1-32b-think_Reasoning | 471 | 1 |
| 318 | llama-3.3-8b-instruct_Reasoning | 470 | 15 |
| 319 | rnj-1-instruct_Reasoning | 464 | 12 |
| 320 | seed-1.6_Continuation | 462 | 3 |
| 321 | phi-4_Continuation | 461 | 7 |
| 322 | mistral-small-creative_Reasoning | 461 | 13 |
| 323 | gemma-3-4b-it_Reasoning | 460 | 14 |
| 324 | mythomax-l2-13b_Reasoning | 460 | 11 |
| 325 | granite-4.0-h-micro_Reasoning | 459 | 11 |
| 326 | minimax-m1_Continuation | 456 | 1 |
| 327 | qwen2.5-vl-72b-instruct_Reasoning | 454 | 1 |
| 328 | deepseek-r1t-chimera_Reasoning | 453 | 1 |
| 329 | olmo-2-0325-32b-instruct_Reasoning | 453 | 13 |
| 330 | olmo-3-7b-think_Reasoning | 453 | 11 |
| 331 | glm-4.6_Continuation | 446 | 9 |
| 332 | glm-4.5-air_Continuation | 443 | 8 |
| 333 | lfm2-8b-a1b_Continuation | 443 | 1 |
| 334 | afm-4.5b_Reasoning | 442 | 14 |
| 335 | ui-tars-1.5-7b_Continuation | 442 | 1 |
| 336 | minimax-m2-her_Reasoning | 441 | 1 |
| 337 | qwen3-coder-plus_Continuation | 440 | 5 |
| 338 | ministral-3b_Reasoning | 439 | 17 |
| 339 | llama-3.1-nemotron-ultra-253b-v1_Continuation | 438 | 2 |
| 340 | seed-oss-36b-instruct_Continuation | 438 | 3 |
| 341 | qwen2.5-plus_Continuation | 437 | 9 |
| 342 | deepseek-r1-0528_Continuation | 437 | 2 |
| 343 | qwen2.5-turbo_Continuation | 437 | 10 |
| 344 | gemma-3-12b-it_Continuation | 435 | 2 |
| 345 | trinity-large-preview_Reasoning | 434 | 1 |
| 346 | ministral-3b-2512_Reasoning | 433 | 11 |
| 347 | llama-3.1-nemotron-70b-instruct_Reasoning | 432 | 2 |
| 348 | command-r7b-12-2024_Reasoning | 431 | 17 |
| 349 | glm-4.7-flash_Continuation | 431 | 1 |
| 350 | jamba-mini-1.6_Reasoning | 430 | 7 |
| 351 | mistral-medium-3_Continuation | 428 | 9 |
| 352 | jamba-mini-1.7_Continuation | 428 | 5 |
| 353 | llama-3.3-nemotron-super-49b-v1.5_Continuation | 428 | 1 |
| 354 | phi-3-medium-128k-instruct_Reasoning | 425 | 12 |
| 355 | lfm-3b ˟_Continuation | 424 | 1 |
| 356 | glm-4.5_Continuation | 423 | 9 |
| 357 | mistral-7b-instruct-v0.1_Reasoning | 422 | 12 |
| 358 | olmo-3-7b-instruct_Reasoning | 422 | 17 |
| 359 | qwen3-coder-next_Continuation | 421 | 5 |
| 360 | lfm-2.2-6b_Reasoning | 417 | 12 |
| 361 | jamba-mini-1.7_Reasoning | 416 | 15 |
| 362 | gemma-3n-e4b-it_Reasoning | 415 | 16 |
| 363 | qwen2.5-max_Continuation | 414 | 7 |
| 364 | qwen3-vl-235b-a22b-thinking_Continuation | 414 | 1 |
| 365 | gemini-1.5-flash-8b ˟_Continuation | 413 | 1 |
| 366 | lfm-3b ˟_Reasoning | 412 | 14 |
| 367 | qwen3-30b-a3b-instruct-2507_Continuation | 412 | 6 |
| 368 | glm-4.6v_Continuation | 411 | 10 |
| 369 | gemma-2-9b-it_Continuation | 410 | 1 |
| 370 | qwen3-30b-a3b-thinking-2507_Continuation | 408 | 1 |
| 371 | claude-3-opus ˟_Continuation | 407 | 2 |
| 372 | mistral-medium-3.1_Continuation | 406 | 8 |
| 373 | ui-tars-1.5-7b_Reasoning | 405 | 19 |
| 374 | claude-3.5-haiku_Continuation | 395 | 2 |
| 375 | qwen3-235b-a22b-thinking-2507_Continuation | 392 | 2 |
| 376 | wizardlm-2-8x22b_Continuation | 391 | 2 |
| 377 | ernie-4.5-21b-a3b_Continuation | 391 | 3 |
| 378 | qwq-32b_Continuation | 385 | 1 |
| 379 | claude-3-haiku_Continuation | 384 | 5 |
| 380 | gemma-3n-e4b-it_Continuation | 384 | 2 |
| 381 | kimi-linear-48b-a3b-instruct_Continuation | 384 | 5 |
| 382 | olmo-3-32b-think_Continuation | 384 | 1 |
| 383 | olmo-2-0325-32b-instruct_Continuation | 384 | 2 |
| 384 | qwen3-30b-a3b_Continuation | 379 | 2 |
| 385 | command-a_Continuation | 378 | 6 |
| 386 | command-r-plus-08-2024_Continuation | 378 | 5 |
| 387 | qwen3-vl-32b-instruct_Continuation | 378 | 1 |
| 388 | mistral-nemo_Continuation | 375 | 2 |
| 389 | mistral-small-3.2-24b-instruct_Continuation | 374 | 3 |
| 390 | ministral-14b-2512_Continuation | 372 | 1 |
| 391 | ministral-8b-2512_Continuation | 370 | 1 |
| 392 | deepseek-r1-distill-llama-8b_Reasoning | 367 | 2 |
| 393 | qwen3-235b-a22b-instruct-2507_Continuation | 366 | 11 |
| 394 | llama-3.1-nemotron-70b-instruct_Continuation | 366 | 1 |
| 395 | olmo-3-7b-instruct_Continuation | 365 | 1 |
| 396 | llama-3.1-8b-instruct_Continuation | 364 | 2 |
| 397 | olmo-3-7b-think_Continuation | 363 | 1 |
| 398 | mistral-small-3.1-24b-instruct_Continuation | 361 | 2 |
| 399 | claude-3-sonnet ˟_Continuation | 361 | 1 |
| 400 | olmo-3.1-32b-instruct_Continuation | 356 | 2 |
| 401 | qwen3-32b_Continuation | 351 | 6 |
| 402 | llama-3.1-70b-instruct_Continuation | 339 | 1 |
| 403 | qwen3-vl-235b-a22b-instruct_Continuation | 339 | 1 |
| 404 | deepseek-r1-distill-qwen-7b_Reasoning | 333 | 2 |
| 405 | mistral-small-24b-instruct-2501_Continuation | 329 | 2 |
| 406 | llama-3.3-nemotron-super-49b-v1_Continuation | 324 | 1 |
| 407 | qwen2.5-vl-32b-instruct_Continuation | 316 | 1 |
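Rating gaps in the Elo column are easier to interpret as expected scores. The sketch below assumes the ratings behave like standard Elo ratings on the usual 400-point logistic scale; the page does not state the source's exact rating formula, so treat the output as a rough guide. `expected_score` and `rating_gap` are hypothetical helper names, and the code uses only the Python standard library.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0),
    assuming the standard Elo logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_gap(score: float) -> float:
    """Inverse of expected_score: the rating difference (A minus B) that
    corresponds to an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Ratings taken from the table: gemini-3-pro-preview_Reasoning (1845) vs. Human (1413).
    print(f"{expected_score(1845, 1413):.3f}")  # 0.923
    # A 75% expected score corresponds to a gap of roughly 191 points.
    print(f"{rating_gap(0.75):.0f}")
```

Under this assumption, the 432-point gap between the top entry and the Human baseline corresponds to an expected score of about 92%. Ratings backed by only one or two games (see the Games column) carry much more uncertainty than those backed by dozens.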