About
Chess AI Bench is the definitive public benchmark for evaluating AI models on chess understanding and problem solving.
Mission
We provide transparent methodology, open data, and reproducible results so that developers and researchers can make informed model decisions.
Why This Benchmark Exists
Before Chess AI Bench, no standard benchmark existed for chess AI comprehension, as opposed to chess gameplay (Stockfish, Leela, etc.). Existing coding and reasoning benchmarks do not capture a model's ability to understand chess concepts, evaluate positions, or solve tactical puzzles from natural-language descriptions.
Chess AI Bench fills that gap with a rigorous, reproducible suite that any team can run and contribute to.
Dataset Inspiration
The Chess Understanding benchmark is inspired by the ChessQA dataset (arXiv:2510.23948), created by researchers at the University of Toronto and adapted here for AI model evaluation.
arXiv:2510.23948 — ChessQA (University of Toronto)
Contact
For questions, methodology feedback, or to request a model evaluation, reach out via email.