Frontier LLM Comparison

Major frontier large language models ranked by Arena Elo, with context window and output speed, across the labs that ship them

Models: 41
Avg Elo: 1369
Avg Context: 800K
Avg Speed (t/s): 5.2K

| Rank | Model | Arena Elo | Context | HQ |
|------|-------|-----------|---------|----|
| 1 🥇 | Claude Opus 4.7 | 1510 | 200K | 🇺🇸 |
| 2 🥈 | Claude Opus 4.6 (Thinking) | 1504 | 200K | 🇺🇸 |
| 3 🥉 | GPT-5.5 | 1495 | 400K | 🇺🇸 |
| 4 | Gemini 3.1 Pro | 1493 | 1000K | 🇺🇸 |
| 5 | Grok 4.20 | 1491 | 256K | 🇺🇸 |
| 6 | GPT-5.4 (High) | 1484 | 400K | 🇺🇸 |
| 7 | Claude Sonnet 4.6 | 1465 | 1000K | 🇺🇸 |
| 8 | GLM-5.1 | 1465 | 128K | 🇨🇳 |
| 9 | ERNIE 5.0 | 1460 | 128K | 🇨🇳 |
| 10 | DeepSeek V4 Pro | 1455 | 1000K | 🇨🇳 |
| 11 | Gemini 3 Pro | 1450 | 1000K | 🇺🇸 |
| 12 | Qwen 3.6-Max-Preview | 1448 | 256K | 🇨🇳 |
| 13 | Kimi K2.6 | 1442 | 256K | 🇨🇳 |
| 14 | GPT-5.2 | 1430 | 400K | 🇺🇸 |
| 15 | Claude Sonnet 4.5 (1M) | 1420 | 1000K | 🇺🇸 |
| 16 | o3 | 1418 | 200K | 🇺🇸 |
| 17 | Llama 5 | 1408 | 5000K | 🇺🇸 |
| 18 | Mistral Large 3 | 1395 | 128K | 🇫🇷 |
| 19 | Gemini 3.1 Flash | 1378 | 1000K | 🇺🇸 |
| 20 | DeepSeek V4 Flash | 1370 | 1000K | 🇨🇳 |
| 21 | GPT-4.1 | 1365 | 1000K | 🇺🇸 |
| 22 | DeepSeek V3.2 | 1355 | 128K | 🇨🇳 |
| 23 | Hunyuan 3.0 | 1352 | 256K | 🇨🇳 |
| 24 | ByteDance Seed 2.0 Pro | 1340 | 256K | 🇨🇳 |
| 25 | Llama 4 Maverick | 1335 | 1000K | 🇺🇸 |
| 26 | Qwen3 Max | 1330 | 262K | 🇨🇳 |
| 27 | Gemini 3.1 Flash-Lite | 1318 | 1000K | 🇺🇸 |
| 28 | Claude Haiku 4.6 | 1310 | 200K | 🇺🇸 |
| 29 | GPT-5.5 mini | 1305 | 200K | 🇺🇸 |
| 30 | o3-mini | 1295 | 200K | 🇺🇸 |
| 31 | Llama 4 Scout | 1280 | 10000K | 🇺🇸 |
| 32 | DeepSeek R1 | 1275 | 128K | 🇨🇳 |
| 33 | DeepSeek-Coder V4 | 1268 | 128K | 🇨🇳 |
| 34 | Mistral 3 (14B) | 1265 | 128K | 🇫🇷 |
| 35 | Qwen3-Coder Plus | 1260 | 1000K | 🇨🇳 |
| 36 | Yi-Large 2 | 1255 | 200K | 🇨🇳 |
| 37 | Cohere Command A | 1255 | 256K | 🇨🇦 |
| 38 | Hunyuan HY 2.0 Think | 1252 | 128K | 🇨🇳 |
| 39 | MiniMax M2.7 | 1245 | 1000K | 🇨🇳 |
| 40 | Hermes 4 | 1240 | 128K | 🇺🇸 |
| 41 | AI21 Jamba 2 | 1230 | 256K | 🇮🇱 |