lm-evaluation-harness
fix(tasks): pin correct MMLUSR version
This PR addresses issue #3289 by pinning the correct Hugging Face dataset revision in the MMLU-SR task files.
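For context, here is a minimal sketch of what pinning a dataset revision can look like at the `datasets` level. It assumes the task files forward extra keyword arguments to `datasets.load_dataset`, for example through a `dataset_kwargs` entry. The helper name `build_load_dataset_kwargs` and the `PINNED_REVISION` placeholder are hypothetical, and the actual commit hash pinned by this PR is not reproduced here.

```python
from typing import Any, Dict, Optional

# Placeholder only: the real commit hash pinned by this PR is not shown here.
PINNED_REVISION = "<dataset-commit-sha>"


def build_load_dataset_kwargs(
    path: str, name: str, revision: Optional[str] = None
) -> Dict[str, Any]:
    """Collect keyword arguments for datasets.load_dataset, optionally pinning a revision."""
    kwargs: Dict[str, Any] = {"path": path, "name": name}
    if revision is not None:
        # `revision` may be a branch name, a tag, or a commit hash on the Hub.
        kwargs["revision"] = revision
    return kwargs


kwargs = build_load_dataset_kwargs(
    "NiniCat/MMLU-SR", "answer_only_anatomy", PINNED_REVISION
)
print(kwargs)

# With the `datasets` package installed and network access, the call would be:
#   import datasets
#   dataset = datasets.load_dataset(**kwargs)
```

Pinning a commit hash rather than a branch name means later changes to the dataset repository do not silently change what the task loads.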
It was tested by running the following command:
python3 -m lm_eval --model dummy --tasks mmlusr_answer_only_anatomy --limit 1
Output:
2025-10-16:11:16:10 WARNING [__main__:369] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2025-10-16:11:16:10 INFO [__main__:450] Selected Tasks: ['mmlusr_answer_only_anatomy']
2025-10-16:11:16:10 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-10-16:11:16:10 INFO [evaluator:240] Initializing dummy model, with arguments: {}
Using the latest cached version of the dataset since NiniCat/MMLU-SR couldn't be found on the Hugging Face Hub
2025-10-16:11:16:10 WARNING [datasets.load:818] Using the latest cached version of the dataset since NiniCat/MMLU-SR couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'answer_only_anatomy' at /Users/chrxu/.cache/huggingface/datasets/NiniCat___mmlu-sr/answer_only_anatomy/0.0.0/a8b309f5a938a53b52bd11c3d163ace2d1b0a295ea0ee3087111cf0c45d91b74 (last modified on Wed Sep 10 09:47:20 2025).
2025-10-16:11:16:11 WARNING [datasets.packaged_modules.cache.cache:94] Found the latest cached dataset configuration 'answer_only_anatomy' at /Users/chrxu/.cache/huggingface/datasets/NiniCat___mmlu-sr/answer_only_anatomy/0.0.0/a8b309f5a938a53b52bd11c3d163ace2d1b0a295ea0ee3087111cf0c45d91b74 (last modified on Wed Sep 10 09:47:20 2025).
Map: 100%|█████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1085.54 examples/s]
Map: 100%|████████████████████████████████████████████████████████████| 135/135 [00:00<00:00, 17138.78 examples/s]
2025-10-16:11:16:11 INFO [api.task:434] Building contexts for mmlusr_answer_only_anatomy on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1107.26it/s]
2025-10-16:11:16:11 INFO [evaluator:574] Running loglikelihood requests
100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 289262.34it/s]
2025-10-16:11:16:15 INFO [loggers.evaluation_tracker:280] Output path not provided, skipping saving results aggregated
dummy (), gen_kwargs: (None), limit: 1.0, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value| |Stderr|
|-------|------:|------|-----:|------|---|----:|---|------|
|anatomy| 1|none | 0|acc |↑ | 0|± | N/A|
LGTM!