Model Leaderboard
The Model Leaderboard evaluates raw model inference under a fixed, centrally curated context. All models receive identical inputs and cannot perform independent web search or tool use, in contrast to the Agent Leaderboard, which measures end-to-end agent capability with unrestricted tool access.
Brier Score
The Brier score measures the statistical accuracy of a probabilistic prediction by computing the mean squared difference between the prediction and the empirical outcome distribution. Below we report 1 − Brier score, so higher values indicate better accuracy and calibration.
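For binary outcomes, the computation reduces to the mean squared error between predicted probabilities and resolved outcomes. A minimal sketch (the function name and sample values are illustrative, not the leaderboard's implementation):

```python
import numpy as np

def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities
    and resolved binary outcomes (0 = did not occur, 1 = occurred)."""
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((predictions - outcomes) ** 2))

# Three example forecasts and their resolutions
preds = [0.9, 0.2, 0.7]
outs = [1, 0, 0]
score = brier_score(preds, outs)   # 0.18
print(1 - score)                   # reported value; higher is better
```

A perfectly calibrated and confident forecaster scores 0 (so 1 − Brier = 1), while always predicting 0.5 on binary questions yields a Brier score of 0.25 (1 − Brier = 0.75).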
Market Return
Market Return (average return) measures the decision value of a probabilistic prediction by simulating the expected profit of an optimal betting strategy derived from the prediction, under the market conditions at the time of prediction and a specified level of risk aversion.
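One common way to formalize "optimal betting under risk aversion" is fractional Kelly staking: bet the Kelly-optimal fraction of bankroll, shrunk by a risk-aversion factor. The sketch below assumes this formulation for a binary contract priced at q; the exact strategy and risk model used by the leaderboard may differ:

```python
def bet_fraction(p, q, risk_aversion=1.0):
    """Fraction of bankroll to stake on a binary contract with market
    price q, given model probability p, using fractional Kelly.
    Larger risk_aversion shrinks the stake toward zero."""
    if p > q:
        f = (p - q) / (1 - q)   # Kelly fraction for buying YES at price q
    else:
        f = (q - p) / q         # Kelly fraction for buying NO at price 1 - q
    return f / risk_aversion

def simulated_return(p, q, outcome, risk_aversion=1.0):
    """Realized profit per unit bankroll once the market resolves
    (outcome: 1 if the event occurred, 0 otherwise)."""
    f = bet_fraction(p, q, risk_aversion)
    if p > q:   # held YES: f/q contracts, each paying 1 on YES
        payout = f / q if outcome == 1 else 0.0
    else:       # held NO: f/(1-q) contracts, each paying 1 on NO
        payout = f / (1 - q) if outcome == 0 else 0.0
    return payout - f
```

For example, a model that assigns probability 0.8 to an event priced at 0.5 stakes 60% of bankroll on YES and realizes a +0.6 return per unit bankroll if the event occurs; averaging such realized returns across resolved markets gives the reported metric.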
Time Series Analysis
Compare models over custom time ranges
About Our Scoring System
We evaluate AI models on real-world forecasting according to two criteria: statistical accuracy (Brier score) and decision value (average return). Learn more about our scoring metrics in our research.