GAUSS: General Assessment of Underlying Structured Skills in Mathematics

An Open Benchmark and Community for Math AI

Authors:

Yue Zhang¹*, Jiaxin Zhang¹²*, Qiuyu Ren³, Tahsin Saffat³, Xiaoxuan Liu³, Zitong Yang⁴, Banghua Zhu⁵⁶, Yi Ma³⁷

*Equal Contribution

Affiliations:

¹Hyperbolic Labs ²California Institute of Technology ³University of California, Berkeley ⁴Stanford University ⁵Nvidia ⁶University of Washington ⁷University of Hong Kong

Introduction

GAUSS (General Assessment of Underlying Structured Skills in Mathematics) is a benchmark for assessing mathematical skills in large language models. It is designed to systematically break down and evaluate the core cognitive skills that underlie problem solving. Unlike existing datasets, GAUSS goes beyond checking final answers: it evaluates dimensions such as knowledge, conceptual understanding, problem-solving strategies, communication, learning, and creativity, providing a comprehensive assessment of model capabilities and limitations.

Our goals:

  1. Skill decomposition: Implement a multidimensional evaluation framework with radar-style analyses to capture models’ strengths and weaknesses across twelve cognitive dimensions.
  2. Saturation mitigation: Extend beyond existing benchmarks (e.g., GSM8K, MATH) by incorporating challenging tasks from Olympiads, graduate-level coursework, and research problems.
  3. Contamination minimization: Curate problem sets explicitly excluded from model training corpora to ensure fair and robust evaluation.

We warmly invite you to join the GAUSS community — contribute problems, propose new skill dimensions, or share feedback. Let’s build the future of math AI evaluation, together!


Learn more in the GAUSS Project Blog.

Contribute through the Problem Submission Portal.

Explore the GAUSS Dataset.

Read the full paper PDF.



Why Community Building Matters

Mathematics is boundless — and so is benchmarking. GAUSS is envisioned as a collaborative platform where the community drives progress:

  • Researchers, academics and students can contribute new problems, task modules, or theoretical frameworks.
  • Engineers and developers can enhance evaluation pipelines, automation, and visualization tools.

By working together, we aim to make GAUSS not just a benchmark, but an open, community-driven ecosystem that accelerates the advancement of AI in mathematics.


The GAUSS Framework

GAUSS organizes mathematical ability into three domains and twelve skills:

  • Mathematical Knowledge and Understanding
    • Memory of Math Knowledge
    • Understanding of Knowledge and Theories
    • Computational and Analytical Skills
  • Problem Solving and Communication
    • Problem-Solving Framework
    • Logical Thinking and Reasoning
    • Writing and Presentation
  • Learning, Meta Skills, and Creativity
    • Learning New Knowledge
    • Intuition
    • Meta Skills
    • Mathematical Modeling
    • Generalization
    • Creativity

This structure provides a comprehensive breakdown of mathematical cognition, from foundational recall to creative problem posing.
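
For readers who want to work with the taxonomy programmatically, the three domains and twelve skills above can be written down as a plain mapping. The following is a minimal Python sketch, not the project's actual code: the names `GAUSS_SKILLS` and `domain_profile`, and the 0-to-1 score scale, are assumptions made for illustration. It shows one simple way per-skill scores might be rolled up into a per-domain profile of the kind a radar-style chart could display.

```python
from statistics import mean

# Domains and skills copied from the list above.
# The dict layout itself is an assumption, not the project's data format.
GAUSS_SKILLS = {
    "Mathematical Knowledge and Understanding": [
        "Memory of Math Knowledge",
        "Understanding of Knowledge and Theories",
        "Computational and Analytical Skills",
    ],
    "Problem Solving and Communication": [
        "Problem-Solving Framework",
        "Logical Thinking and Reasoning",
        "Writing and Presentation",
    ],
    "Learning, Meta Skills, and Creativity": [
        "Learning New Knowledge",
        "Intuition",
        "Meta Skills",
        "Mathematical Modeling",
        "Generalization",
        "Creativity",
    ],
}


def domain_profile(skill_scores):
    """Average per-skill scores (hypothetical 0-1 scale) into one value per domain.

    Skills without a score are skipped; a domain with no scored skills maps to None.
    """
    profile = {}
    for domain, skills in GAUSS_SKILLS.items():
        scored = [skill_scores[s] for s in skills if s in skill_scores]
        profile[domain] = mean(scored) if scored else None
    return profile
```

Calling `domain_profile({"Intuition": 0.5, "Creativity": 1.0})` averages the two scored skills in the third domain and leaves the other two domains as `None`. A plain mean is only one possible aggregation; a weighted scheme would fit the same structure.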


Sample Problem and Evaluation Pipeline

The sample is presented in five parts:

  • Problem
  • GPT-5 Thinking Response
  • Standard Solution
  • Rubric
  • Score and Evaluation
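
This page does not spell out the rubric format, so the following is only an illustrative Python sketch of how a rubric-based score might be tallied once a grader has compared a model response against the standard solution. Everything in it is hypothetical: the `RubricItem` class, the point values, and the rule that each item's awarded points are clamped to its maximum before summing.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    description: str   # what the grader looks for (hypothetical wording)
    max_points: int    # points available for this criterion
    awarded: int = 0   # points the grader gave the response


def total_score(rubric):
    """Return (awarded, possible), clamping each item to the range [0, max_points]."""
    awarded = sum(min(max(item.awarded, 0), item.max_points) for item in rubric)
    possible = sum(item.max_points for item in rubric)
    return awarded, possible
```

For example, a rubric with a 2-point item awarded in full and a 3-point item awarded 1 point yields `(3, 5)`. Clamping guards against data-entry errors rather than reflecting any documented GAUSS rule.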