
Add AIME 2024 and LiveCodeBench to the gold standard evaluation harness

Open · Allen-labs opened this issue 8 months ago

First, thank you for creating and maintaining lm-evaluation-harness, which has become the gold standard for open-source LLM benchmarking. Its comprehensive coverage and reliability make it invaluable to the AI community. I'd like to request support for two important tasks that would further strengthen this excellent platform: (1) AIME 2024 and (2) LiveCodeBench.

Allen-labs · Mar 06 '25 07:03

yes!

imagoodman-aa · Mar 06 '25 08:03

+1 Has anyone just created a custom task that they would like to share?

radna0 · Mar 09 '25 21:03
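For anyone who wants to prototype this before an official task lands, here is a minimal sketch of what a custom AIME 2024 task config could look like. It is not an official config: the dataset path, split, and column names are placeholders/assumptions, and the answer-extraction regex is only a starting point; adapt all of them to whichever AIME 2024 dataset you use.

```yaml
# aime24.yaml -- a sketch of a custom lm-evaluation-harness task, not an official config.
# Assumptions: a Hugging Face dataset with `problem` and `answer` columns and a `test` split.
task: aime24
dataset_path: your-org/aime-2024   # placeholder -- point at the AIME 2024 dataset you use
output_type: generate_until
test_split: test                   # adjust to the dataset's actual split name
doc_to_text: "Problem: {{problem}}\n\nAnswer:"
doc_to_target: "{{answer}}"
generation_kwargs:
  until:
    - "</s>"
  max_gen_toks: 2048
  do_sample: false
  temperature: 0.0
filter_list:
  - name: extract-answer
    filter:
      - function: regex
        regex_pattern: "(\\d+)"    # AIME answers are integers 0-999; refine this extraction as needed
      - function: take_first
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
num_fewshot: 0
metadata:
  version: 1.0
```

If the YAML sits in a local folder, it should be discoverable with something like `lm_eval --include_path ./my_tasks --tasks aime24 ...`; check the task-authoring docs for the harness version you have installed.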

Hi! @Allen-labs @imagoodman-aa @radna0 I'll be happy to work on AIME and will submit a PR soon.

Zephyr271828 · Apr 08 '25 15:04

+1

fxmarty-amd · Jul 10 '25 15:07