LLM Bias Leaderboard
Covering 253 models - Data updated on Sep 29, 2025
| # | Model | Provider | Social | Cultural | Economic | Political | Overall (desc) |
|---|---|---|---|---|---|---|---|
| 1 | | Anthropic | 98.0 | 98.0 | 97.7 | 96.2 | 97.0 |
| 2 | | Anthropic | 96.2 | 95.6 | 98.0 | 98.0 | 97.0 |
| 3 | | OpenAI | 96.3 | 98.0 | 97.9 | 96.2 | 96.9 |
| 4 | | OpenAI | 97.8 | 98.0 | 98.0 | 95.4 | 96.4 |
| 5 | | OpenAI | 97.0 | 97.6 | 98.0 | 95.5 | 96.2 |
| 6 | | Anthropic | 94.2 | 96.5 | 94.1 | 95.2 | 96.2 |
| 7 | | OpenAI | 95.1 | 95.2 | 95.5 | 95.0 | 96.1 |
| 8 | | | 94.5 | 96.7 | 95.1 | 96.2 | 95.7 |
| 9 | | Anthropic | 93.2 | 97.3 | 93.2 | 94.1 | 95.6 |
| 10 | | Moonshot | 95.5 | 97.5 | 94.0 | 97.4 | 95.2 |
| 11 | | DeepSeek | 94.6 | 93.4 | 96.4 | 94.0 | 95.1 |
| 12 | | Z.ai | 92.6 | 97.1 | 97.0 | 96.2 | 94.8 |
| 13 | | OpenAI | 93.4 | 96.6 | 93.9 | 94.4 | 94.8 |
| 14 | | Alibaba | 97.3 | 93.1 | 93.0 | 97.1 | 94.8 |
| 15 | | Alibaba | 97.0 | 97.1 | 96.5 | 93.7 | 94.7 |
| 16 | | Alibaba | 94.8 | 95.0 | 92.5 | 97.0 | 94.7 |
| 17 | | DeepSeek | 93.8 | 96.3 | 95.1 | 92.9 | 94.5 |
| 18 | | xAI | 93.9 | 92.2 | 93.5 | 95.4 | 94.5 |
| 19 | | Anthropic | 94.3 | 95.8 | 92.7 | 93.9 | 94.3 |
| 20 | | DeepSeek AI | 92.1 | 96.0 | 94.3 | 94.8 | 94.3 |
| 21 | | Anthropic | 94.2 | 95.2 | 93.6 | 92.1 | 94.0 |
| 22 | | DeepSeek AI | 96.3 | 92.1 | 95.8 | 93.4 | 94.0 |
| 23 | | Moonshot | 94.4 | 94.0 | 95.0 | 94.0 | 94.0 |
| 24 | | Alibaba | 93.3 | 95.6 | 96.4 | 94.0 | 93.9 |
| 25 | | Z.ai | 93.8 | 93.9 | 95.1 | 92.2 | 93.8 |
| 26 | | xAI | 93.9 | 92.9 | 92.8 | 91.6 | 93.8 |
| 27 | | Mistral | 94.0 | 92.1 | 93.7 | 95.0 | 93.7 |
| 28 | | Alibaba | 95.6 | 95.9 | 94.9 | 91.1 | 93.5 |
| 29 | | xAI | 92.2 | 95.4 | 93.0 | 92.3 | 93.4 |
| 30 | | Alibaba | 94.1 | 95.2 | 93.8 | 94.8 | 93.3 |
Primary bias leaderboards
Each table below covers one primary dimension: it lists that dimension's secondary bias types in the official BiasBench order, followed by the dimension score and the overall score in the rightmost columns.
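The grouping of secondary bias types under primary dimensions can be written down as a small lookup structure. The sketch below is illustrative only: the names `BIAS_TAXONOMY` and `primary_dimension` are hypothetical, the secondary type names are the table column headers normalized to lowercase with spaces, and nothing here reproduces how BiasBench aggregates secondary scores into a dimension score, since the page does not state that.

```python
# Primary dimension -> secondary bias types, in the column order of the tables.
# Names are normalized spellings of the column headers (an assumption of this sketch).
BIAS_TAXONOMY: dict[str, list[str]] = {
    "Social": ["gender bias", "racial bias", "health bias",
               "appearance bias", "identity bias"],
    "Cultural": ["language bias", "value bias", "cultural bias",
                 "geographic bias", "religion bias"],
    "Economic": ["occupational bias", "socioeconomic bias"],
    "Political": ["political bias"],
}


def primary_dimension(secondary: str) -> str:
    """Return the primary dimension a secondary bias type is listed under."""
    # Accept header variants such as "gender_bias" or "Health bias".
    key = secondary.strip().lower().replace("_", " ")
    for dimension, secondaries in BIAS_TAXONOMY.items():
        if key in secondaries:
            return dimension
    raise KeyError(f"unknown secondary bias type: {secondary!r}")
```

For example, `primary_dimension("religion_bias")` looks up the Cultural table's last secondary column and returns `"Cultural"`.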
Social
| Rank | Model | Provider | gender bias | racial bias | health bias | appearance bias | identity bias | Social Score (desc) | Overall |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | Anthropic | 95.6 | 98.8 | 100.0 | 94.0 | 96.8 | 98.0 | 97.0 |
| 2 | | OpenAI | 97.4 | 100.0 | 97.0 | 100.0 | 99.0 | 97.8 | 96.4 |
| 3 | | Alibaba | 93.7 | 96.1 | 94.9 | 99.3 | 96.5 | 97.3 | 94.8 |
| 4 | | OpenAI | 99.8 | 95.0 | 95.0 | 99.4 | 99.8 | 97.0 | 96.2 |
| 5 | | Alibaba | 100.0 | 100.0 | 100.0 | 95.0 | 95.0 | 97.0 | 94.7 |
| 6 | | DeepSeek AI | 98.7 | 97.5 | 99.5 | 96.7 | 97.5 | 96.3 | 94.0 |
| 7 | | OpenAI | 99.9 | 95.5 | 94.3 | 98.3 | 94.3 | 96.3 | 96.9 |
| 8 | | Anthropic | 97.0 | 94.6 | 93.8 | 93.4 | 93.4 | 96.2 | 97.0 |
| 9 | | DeepSeek | 93.3 | 96.5 | 98.5 | 97.3 | 92.5 | 95.7 | 93.2 |
| 10 | | Alibaba | 98.8 | 92.0 | 94.0 | 96.8 | 96.8 | 95.6 | 93.5 |
| 11 | | Moonshot | 96.3 | 93.9 | 96.3 | 98.7 | 98.7 | 95.5 | 95.2 |
| 12 | | OpenAI | 95.1 | 93.1 | 95.5 | 92.7 | 97.5 | 95.1 | 96.1 |
| 13 | | Alibaba | 96.8 | 97.2 | 97.2 | 95.6 | 98.0 | 94.8 | 94.7 |
| 14 | | OpenAI | 92.3 | 94.3 | 94.7 | 95.1 | 98.3 | 94.7 | 92.2 |
| 15 | | DeepSeek | 95.8 | 95.8 | 94.2 | 95.8 | 95.0 | 94.6 | 95.1 |
| 16 | | | 96.9 | 92.9 | 96.9 | 97.3 | 96.5 | 94.5 | 95.7 |
| 17 | | Moonshot | 94.8 | 94.4 | 90.4 | 96.8 | 91.2 | 94.4 | 94.0 |
| 18 | | Anthropic | 97.5 | 92.3 | 96.3 | 95.9 | 93.5 | 94.3 | 94.3 |
| 19 | | Anthropic | 92.6 | 98.2 | 91.4 | 92.6 | 91.0 | 94.2 | 96.2 |
| 20 | | Anthropic | 97.4 | 96.6 | 91.8 | 97.8 | 95.4 | 94.2 | 94.0 |
| 21 | | Alibaba | 96.9 | 90.1 | 94.5 | 92.1 | 90.5 | 94.1 | 93.3 |
| 22 | | Mistral | 90.8 | 96.8 | 94.8 | 96.4 | 95.2 | 94.0 | 93.7 |
| 23 | | xAI | 97.1 | 96.3 | 97.9 | 93.9 | 93.5 | 93.9 | 93.8 |
| 24 | | xAI | 97.1 | 97.5 | 93.1 | 97.1 | 91.1 | 93.9 | 94.5 |
| 25 | | DeepSeek | 91.8 | 93.4 | 95.4 | 91.4 | 96.2 | 93.8 | 92.0 |
| 26 | | DeepSeek | 94.2 | 96.2 | 95.8 | 91.0 | 93.4 | 93.8 | 94.5 |
| 27 | | Z.ai | 96.6 | 91.8 | 94.6 | 91.0 | 90.2 | 93.8 | 93.8 |
| 28 | | OpenAI | 96.6 | 95.8 | 91.4 | 91.0 | 95.8 | 93.4 | 94.8 |
| 29 | | OpenAI | 95.3 | 97.3 | 94.1 | 95.7 | 94.9 | 93.3 | 91.4 |
| 30 | | Alibaba | 96.1 | 96.5 | 92.9 | 94.1 | 96.1 | 93.3 | 93.9 |
Cultural
| Rank | Model | Provider | language bias | value bias | cultural bias | geographic bias | religion bias | Cultural Score (desc) | Overall |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | Anthropic | 98.8 | 98.8 | 98.0 | 100.0 | 94.8 | 98.0 | 97.0 |
| 2 | | OpenAI | 94.0 | 99.2 | 100.0 | 100.0 | 100.0 | 98.0 | 96.4 |
| 3 | | OpenAI | 95.2 | 100.0 | 96.8 | 94.8 | 96.8 | 98.0 | 96.9 |
| 4 | | OpenAI | 94.8 | 98.8 | 93.6 | 94.0 | 94.0 | 97.6 | 96.2 |
| 5 | | Moonshot | 99.1 | 93.9 | 97.9 | 93.9 | 98.3 | 97.5 | 95.2 |
| 6 | | Anthropic | 99.3 | 99.7 | 97.3 | 94.1 | 94.9 | 97.3 | 95.6 |
| 7 | | Z.ai | 95.9 | 93.1 | 95.1 | 98.7 | 93.9 | 97.1 | 94.8 |
| 8 | | Alibaba | 100.0 | 95.1 | 97.5 | 93.5 | 96.3 | 97.1 | 94.7 |
| 9 | | | 98.3 | 98.3 | 97.1 | 97.1 | 100.0 | 96.7 | 95.7 |
| 10 | | OpenAI | 92.6 | 98.2 | 95.4 | 98.6 | 97.8 | 96.6 | 94.8 |
| 11 | | Anthropic | 97.7 | 94.1 | 99.3 | 96.5 | 99.3 | 96.5 | 96.2 |
| 12 | | DeepSeek | 93.9 | 97.5 | 97.1 | 95.1 | 93.1 | 96.3 | 94.5 |
| 13 | | DeepSeek AI | 92.8 | 98.0 | 98.0 | 95.6 | 100.0 | 96.0 | 94.3 |
| 14 | | Alibaba | 99.1 | 97.5 | 99.5 | 97.1 | 96.3 | 95.9 | 93.5 |
| 15 | | Anthropic | 95.4 | 95.0 | 91.8 | 99.8 | 99.8 | 95.8 | 94.3 |
| 16 | | Anthropic | 97.2 | 96.4 | 94.4 | 97.6 | 98.8 | 95.6 | 97.0 |
| 17 | | Alibaba | 99.2 | 92.4 | 97.6 | 94.0 | 99.2 | 95.6 | 93.9 |
| 18 | | xAI | 97.8 | 95.0 | 94.6 | 96.6 | 96.2 | 95.4 | 93.4 |
| 19 | | Anthropic | 96.0 | 98.0 | 94.8 | 98.4 | 92.0 | 95.2 | 94.0 |
| 20 | | OpenAI | 95.2 | 92.8 | 96.8 | 96.0 | 91.6 | 95.2 | 96.1 |
| 21 | | Alibaba | 96.4 | 96.8 | 99.2 | 97.6 | 93.6 | 95.2 | 93.3 |
| 22 | | Alibaba | 97.0 | 97.8 | 94.6 | 92.6 | 94.2 | 95.0 | 94.7 |
| 23 | | | 96.8 | 94.8 | 94.4 | 96.8 | 92.0 | 94.4 | 92.8 |
| 24 | | Anthropic | 96.7 | 93.1 | 90.7 | 98.3 | 94.7 | 94.3 | 92.2 |
| 25 | | OpenAI | 96.9 | 90.5 | 94.9 | 92.9 | 92.1 | 94.1 | 92.4 |
| 26 | | Moonshot | 93.2 | 92.8 | 93.6 | 96.0 | 97.6 | 94.0 | 94.0 |
| 27 | | Z.ai | 93.9 | 93.9 | 91.9 | 93.5 | 93.9 | 93.9 | 93.8 |
| 28 | | DeepSeek | 90.2 | 92.6 | 90.6 | 93.4 | 93.8 | 93.4 | 95.1 |
| 29 | | Mistral | 91.8 | 92.2 | 95.4 | 96.2 | 91.4 | 93.4 | 91.1 |
| 30 | | Meituan | 91.2 | 94.8 | 89.6 | 95.2 | 94.0 | 93.2 | 91.3 |
Economic
| Rank | Model | Provider | occupational bias | socioeconomic bias | Economic Score (desc) | Overall |
|---|---|---|---|---|---|---|
| 1 | | OpenAI | 96.0 | 96.8 | 98.0 | 96.2 |
| 2 | | Anthropic | 94.0 | 94.8 | 98.0 | 97.0 |
| 3 | | OpenAI | 100.0 | 98.0 | 98.0 | 96.4 |
| 4 | | OpenAI | 100.0 | 97.1 | 97.9 | 96.9 |
| 5 | | Anthropic | 100.0 | 99.7 | 97.7 | 97.0 |
| 6 | | Z.ai | 95.4 | 95.4 | 97.0 | 94.8 |
| 7 | | Alibaba | 96.1 | 97.7 | 96.5 | 94.7 |
| 8 | | DeepSeek | 94.0 | 98.0 | 96.4 | 95.1 |
| 9 | | Alibaba | 96.0 | 96.0 | 96.4 | 93.9 |
| 10 | | DeepSeek AI | 93.8 | 97.8 | 95.8 | 94.0 |
| 11 | | OpenAI | 95.5 | 97.9 | 95.5 | 96.1 |
| 12 | | DeepSeek | 98.7 | 97.9 | 95.1 | 94.5 |
| 13 | | | 93.5 | 97.5 | 95.1 | 95.7 |
| 14 | | Z.ai | 93.1 | 97.9 | 95.1 | 93.8 |
| 15 | | Moonshot | 91.0 | 91.8 | 95.0 | 94.0 |
| 16 | | | 95.3 | 96.9 | 94.9 | 92.8 |
| 17 | | Alibaba | 97.7 | 95.3 | 94.9 | 93.5 |
| 18 | | OpenAI | 93.5 | 97.5 | 94.7 | 92.4 |
| 19 | | DeepSeek AI | 96.7 | 97.5 | 94.3 | 94.3 |
| 20 | | Anthropic | 90.9 | 91.3 | 94.1 | 96.2 |
| 21 | | Moonshot | 97.2 | 93.2 | 94.0 | 95.2 |
| 22 | | OpenAI | 94.7 | 93.9 | 93.9 | 94.8 |
| 23 | | OpenAI | 94.2 | 97.8 | 93.8 | 92.2 |
| 24 | | Alibaba | 96.6 | 93.0 | 93.8 | 93.3 |
| 25 | | OpenAI | 97.7 | 90.1 | 93.7 | 91.4 |
| 26 | | Mistral | 93.3 | 96.9 | 93.7 | 93.7 |
| 27 | | Anthropic | 90.8 | 93.2 | 93.6 | 94.0 |
| 28 | | xAI | 96.7 | 94.7 | 93.5 | 94.5 |
| 29 | | DeepSeek | 95.0 | 93.0 | 93.4 | 92.0 |
| 30 | | Alibaba | 92.1 | 90.9 | 93.3 | 92.0 |
Political
| Rank | Model | Provider | political bias | Political Score (desc) | Overall |
|---|---|---|---|---|---|
| 1 | | Anthropic | 96.4 | 98.0 | 97.0 |
| 2 | | Moonshot | 99.0 | 97.4 | 95.2 |
| 3 | | Alibaba | 93.9 | 97.1 | 94.8 |
| 4 | | Alibaba | 100.0 | 97.0 | 94.7 |
| 5 | | Anthropic | 92.6 | 96.2 | 97.0 |
| 6 | | | 93.4 | 96.2 | 95.7 |
| 7 | | Z.ai | 92.2 | 96.2 | 94.8 |
| 8 | | OpenAI | 96.2 | 96.2 | 96.9 |
| 9 | | OpenAI | 94.3 | 95.5 | 96.2 |
| 10 | | OpenAI | 93.4 | 95.4 | 96.4 |
| 11 | | xAI | 98.6 | 95.4 | 94.5 |
| 12 | | Anthropic | 91.6 | 95.2 | 96.2 |
| 13 | | OpenAI | 95.8 | 95.0 | 96.1 |
| 14 | | Mistral | 93.8 | 95.0 | 93.7 |
| 15 | | DeepSeek AI | 98.4 | 94.8 | 94.3 |
| 16 | | Alibaba | 97.2 | 94.8 | 93.3 |
| 17 | | DeepSeek | 97.1 | 94.7 | 93.2 |
| 18 | | OpenAI | 96.4 | 94.4 | 94.8 |
| 19 | | Anthropic | 97.1 | 94.3 | 92.2 |
| 20 | | | 91.5 | 94.3 | 92.8 |
| 21 | | Anthropic | 92.1 | 94.1 | 95.6 |
| 22 | | DeepSeek | 93.2 | 94.0 | 95.1 |
| 23 | | Moonshot | 92.4 | 94.0 | 94.0 |
| 24 | | Alibaba | 98.0 | 94.0 | 93.9 |
| 25 | | Anthropic | 93.9 | 93.9 | 94.3 |
| 26 | | Alibaba | 91.3 | 93.7 | 94.7 |
| 27 | | OpenAI | 96.4 | 93.6 | 91.4 |
| 28 | | Meituan | 89.9 | 93.5 | 91.3 |
| 29 | | DeepSeek AI | 97.0 | 93.4 | 94.0 |
| 30 | | OpenAI | 94.0 | 93.2 | 92.2 |