eval-dev-quality

Add Qwen3, GLM4.5, and DeepSeek series as an update

Open BradKML opened this issue 4 months ago • 0 comments

These models are not in the current leaderboard, but things have already changed since last quarter.

  • [ ] Qwen3 Coder series
  • [ ] Qwen3 dense and MoE models
  • [ ] Qwen3 distilled models
  • [ ] GLM 4.5, both the original and Air models
  • [ ] DeepSeek v3.1 base
  • [ ] DeepSeek-R1-0528

Other suggestions

  • [ ] Kimi K2 and Kimi Coder are probably worth including even if they fail on some tasks
  • [ ] The GPT-OSS model series is good to compare as US representation
  • [ ] Maybe LG's Exaone, Meta's Llama 4, and MiniMax, if there is enough time for this

BradKML, Aug 30 '25 09:08