lm-evaluation-harness
Add AIME 2024 and LiveCodeBench to the gold standard evaluation harness
First, thank you for creating and maintaining lm-evaluation-harness, which has become the gold standard for open-source LLM benchmarking. Its comprehensive coverage and reliability make it invaluable to the AI community. I'd like to request adding support for two important tasks that would further strengthen this excellent platform: (1) AIME 2024, and (2) LiveCodeBench.
yes!
+1 Has anyone just created a custom task that they would like to share?
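Not an official task, but for anyone who wants to try this before a PR lands, here is a minimal sketch of a custom task config in the harness's YAML format. The dataset path (`HuggingFaceH4/aime_2024`) and the `problem`/`answer` column names are assumptions based on one public mirror of the competition problems; swap in whatever dataset and fields you actually use.

```yaml
# Hypothetical config, e.g. saved as my_tasks/aime_2024.yaml.
# Dataset path and column names are assumptions, not an official task.
task: aime_2024
dataset_path: HuggingFaceH4/aime_2024  # assumed HF dataset mirror
output_type: generate_until
test_split: train                      # this mirror ships a single split
doc_to_text: "Problem: {{problem}}\n\nAnswer:"
doc_to_target: "{{answer}}"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  max_gen_toks: 1024
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
```

With the file in a local directory, something like `lm_eval --model hf --model_args pretrained=<model> --tasks aime_2024 --include_path ./my_tasks` should pick it up. Note that exact match on raw generations will undercount correct answers, so you'd likely want a `filter_list` regex that extracts the final integer before trusting the scores.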
Hi! @Allen-labs @imagoodman-aa @radna0 I'll be happy to work on AIME and will submit a PR soon.
+1