lm-evaluation-harness

fix(tasks): pin correct MMLUSR version

Open christinaexyou opened this issue 2 months ago • 1 comments

This PR addresses issue #3289 by pinning the correct Hugging Face dataset revision in the task files.
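
For readers unfamiliar with revision pinning, here is a minimal sketch of the idea, not the PR's actual change. It assumes the pinned value is ultimately forwarded to the `revision` parameter of `datasets.load_dataset`. The helper `pinned_load_kwargs` and the placeholder revision string are hypothetical; the real revision is whatever the task files in this PR specify.

```python
from typing import Dict


def pinned_load_kwargs(path: str, name: str, revision: str) -> Dict[str, str]:
    """Build keyword arguments for datasets.load_dataset with a fixed revision.

    Hypothetical helper for illustration; it is not part of lm-evaluation-harness.
    """
    if not revision:
        raise ValueError("revision must be a non-empty branch, tag, or commit hash")
    return {"path": path, "name": name, "revision": revision}


# Placeholder only: substitute the revision actually pinned in the task files.
PLACEHOLDER_REVISION = "<commit-hash-or-tag>"

kwargs = pinned_load_kwargs("NiniCat/MMLU-SR", "answer_only_anatomy", PLACEHOLDER_REVISION)
print(kwargs)

# Loading for real needs the `datasets` package and network access:
# from datasets import load_dataset
# dataset = load_dataset(**kwargs)
```

Pinning a revision means later changes to the dataset repository on the Hub do not silently change what the task evaluates.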

The change was tested by running the following command:

python3 -m lm_eval --model dummy --tasks mmlusr_answer_only_anatomy --limit 1

Output:

2025-10-16:11:16:10 WARNING  [__main__:369]  --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2025-10-16:11:16:10 INFO     [__main__:450] Selected Tasks: ['mmlusr_answer_only_anatomy']
2025-10-16:11:16:10 INFO     [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-10-16:11:16:10 INFO     [evaluator:240] Initializing dummy model, with arguments: {}
Using the latest cached version of the dataset since NiniCat/MMLU-SR couldn't be found on the Hugging Face Hub
2025-10-16:11:16:10 WARNING  [datasets.load:818] Using the latest cached version of the dataset since NiniCat/MMLU-SR couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'answer_only_anatomy' at /Users/chrxu/.cache/huggingface/datasets/NiniCat___mmlu-sr/answer_only_anatomy/0.0.0/a8b309f5a938a53b52bd11c3d163ace2d1b0a295ea0ee3087111cf0c45d91b74 (last modified on Wed Sep 10 09:47:20 2025).
2025-10-16:11:16:11 WARNING  [datasets.packaged_modules.cache.cache:94] Found the latest cached dataset configuration 'answer_only_anatomy' at /Users/chrxu/.cache/huggingface/datasets/NiniCat___mmlu-sr/answer_only_anatomy/0.0.0/a8b309f5a938a53b52bd11c3d163ace2d1b0a295ea0ee3087111cf0c45d91b74 (last modified on Wed Sep 10 09:47:20 2025).
Map: 100%|█████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1085.54 examples/s]
Map: 100%|████████████████████████████████████████████████████████████| 135/135 [00:00<00:00, 17138.78 examples/s]
2025-10-16:11:16:11 INFO     [api.task:434] Building contexts for mmlusr_answer_only_anatomy on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1107.26it/s]
2025-10-16:11:16:11 INFO     [evaluator:574] Running loglikelihood requests
100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 289262.34it/s]
2025-10-16:11:16:15 INFO     [loggers.evaluation_tracker:280] Output path not provided, skipping saving results aggregated
dummy (), gen_kwargs: (None), limit: 1.0, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|-------|------:|------|-----:|------|---|----:|---|------|
|anatomy|      1|none  |     0|acc   |↑  |    0|±  |   N/A|
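
The smoke test above could also be scripted, for example to repeat it across several MMLU-SR subtasks. The sketch below only assembles the same command line shown earlier, using the flags `--model`, `--tasks`, and `--limit` from that command. The helper name `smoke_test_command` is hypothetical, and the actual run is left commented out because it requires `lm_eval` to be installed.

```python
import sys
from typing import List


def smoke_test_command(task: str, model: str = "dummy", limit: int = 1) -> List[str]:
    """Assemble the lm_eval smoke-test invocation as an argument list.

    Hypothetical helper for illustration; it mirrors the command used in this PR.
    """
    return [
        sys.executable, "-m", "lm_eval",
        "--model", model,
        "--tasks", task,
        "--limit", str(limit),
    ]


command = smoke_test_command("mmlusr_answer_only_anatomy")
# Print the arguments after the interpreter path.
print(" ".join(command[1:]))

# To execute the check (requires lm_eval to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

As the log warns, `--limit` is for testing only, so a run like this checks that the task loads, not that the metrics are meaningful.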

christinaexyou, Oct 16 '25 15:10

CLA assistant check
All committers have signed the CLA.

CLAassistant, Oct 16 '25 15:10

LGTM!

baberabb, Nov 19 '25 02:11