About
Chess AI Bench is the definitive public benchmark for evaluating AI models on chess understanding and problem solving.
Mission
We provide transparent methodology, open data, and reproducible results so that developers and researchers can make informed model decisions.
Why This Benchmark Exists
Before Chess AI Bench, no standard benchmark existed for chess AI comprehension, as opposed to chess gameplay (Stockfish, Leela, etc.). Existing coding and reasoning benchmarks do not capture a model's ability to understand chess concepts, evaluate positions, or solve tactical puzzles from natural-language descriptions.
Chess AI Bench fills that gap with a rigorous, reproducible suite that any team can run and contribute to.
Dataset Inspiration
The Chess Understanding benchmark is inspired by the ChessQA dataset (arXiv:2510.23948), created by researchers at the University of Toronto and adapted here for AI model evaluation.
arXiv:2510.23948 — ChessQA (University of Toronto)
Contact
For questions, methodology feedback, or to request a model evaluation, reach out via email.