An Open Benchmark and Community for Math AI
Authors:
Yue Zhang¹*, Jiaxin Zhang¹²*, Qiuyu Ren³, Tahsin Saffat³, Xiaoxuan Liu³, Zitong Yang⁴, Banghua Zhu⁵⁶, Yi Ma³⁷
*Equal Contribution
Affiliations:
¹Hyperbolic Labs ²California Institute of Technology ³University of California, Berkeley ⁴Stanford University ⁵Nvidia ⁶University of Washington ⁷University of Hong Kong

Introduction
GAUSS (General Assessment of Underlying Structured Skills in Mathematics) is a benchmark for assessing mathematical skills in large language models. It is designed to systematically break down and evaluate the core cognitive skills that underlie problem solving. Unlike existing datasets, GAUSS goes beyond checking final answers: it evaluates dimensions such as knowledge, conceptual understanding, problem-solving strategies, communication, learning, and creativity, providing a comprehensive assessment of model capabilities and limitations.
Our goals:
- Skill decomposition: Implement a multidimensional evaluation framework with radar-style analyses to capture models’ strengths and weaknesses across twelve cognitive dimensions.
- Saturation mitigation: Extend beyond existing benchmarks (e.g., GSM8K, MATH) by incorporating challenging tasks from Olympiads, graduate-level coursework, and research problems.
- Contamination minimization: Curate problem sets explicitly excluded from model training corpora to ensure fair and robust evaluation.
We warmly invite you to join the GAUSS community — contribute problems, propose new skill dimensions, or share feedback. Let’s build the future of math AI evaluation, together!
Why Community Building Matters
Mathematics is boundless — and so is benchmarking. GAUSS is envisioned as a collaborative platform where the community drives progress:
- Researchers, academics, and students can contribute new problems, task modules, or theoretical frameworks.
- Engineers and developers can enhance evaluation pipelines, automation, and visualization tools.
By working together, we aim to make GAUSS not just a benchmark, but an open, community-driven ecosystem that accelerates the advancement of AI in mathematics.
The GAUSS Framework
GAUSS organizes mathematical ability into three domains and twelve skills:
- Mathematical Knowledge and Understanding
  - Understanding of Knowledge and Theories
  - Computational and Analytical Skills
- Problem Solving and Communication
  - Problem-Solving Framework
  - Logical Thinking and Reasoning
- Learning, Meta Skills, and Creativity
This structure provides a comprehensive breakdown of mathematical cognition, from foundational recall to creative problem posing.
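To make the idea of a per-skill breakdown concrete, here is a minimal Python sketch of how graded results could be aggregated into a skill profile, the kind of values a radar-style chart would plot. It is an illustration under stated assumptions, not the GAUSS pipeline: the function name `skill_profile`, the (skills, score) input format, the [0, 1] score range, and the sample results are all hypothetical. Only the skill names are taken from the list above.

```python
from collections import defaultdict
from statistics import mean


def skill_profile(graded):
    """Return the average score for each skill tag.

    `graded` is an iterable of (skills, score) pairs, where `skills` is the
    list of skill names a problem exercises and `score` is a number in [0, 1].
    This input format is an assumption made for this sketch.
    """
    by_skill = defaultdict(list)
    for skills, score in graded:
        # A problem tagged with several skills counts toward each of them.
        for skill in skills:
            by_skill[skill].append(score)
    return {skill: mean(scores) for skill, scores in by_skill.items()}


# Hypothetical graded results, for illustration only (not GAUSS data).
graded = [
    (["Computational and Analytical Skills"], 1.0),
    (["Computational and Analytical Skills", "Logical Thinking and Reasoning"], 0.5),
    (["Logical Thinking and Reasoning"], 0.0),
]

profile = skill_profile(graded)
for skill, value in sorted(profile.items()):
    print(f"{skill}: {value:.2f}")
```

Averaging per skill tag is only one possible aggregation; weighting problems by difficulty or reporting a score per domain would be equally reasonable choices.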
Sample Problem and Evaluation Pipeline