lm-evaluation-harness
Evaluating ARC-Easy raises NonMatchingSplitsSizesError
Hi, when evaluating on ARC-Easy with the script below:
lm-eval \
--model hf \
--model_args trust_remote_code=True,pretrained=$ckpt \
--tasks arc_easy \
--num_fewshot 0 \
--device cuda:0 \
--output_path "./eval_scripts/arc_easy.json"
I got datasets.utils.info_utils.NonMatchingSplitsSizesError.
I noticed that the arc_easy dataset repo on Hugging Face was updated 5 days ago. Is there a problem with it?
I'm not seeing this error. Maybe try clearing the cached dataset and running again?
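A minimal sketch of clearing the cached copy from the shell. It assumes the default datasets cache root (`$HF_DATASETS_CACHE`, falling back to `~/.cache/huggingface/datasets`) and assumes this dataset's cached entries have `ai2_arc` in their directory names; `clear_arc_cache` is a hypothetical helper, not part of lm-eval or datasets.

```shell
# Hypothetical helper: delete cached ARC dataset entries so the next run re-downloads them.
# ASSUMPTIONS: the cache root is $HF_DATASETS_CACHE or ~/.cache/huggingface/datasets,
# and this dataset's cached directories contain "ai2_arc" in their names.
clear_arc_cache() {
  local cache_dir="${1:-${HF_DATASETS_CACHE:-$HOME/.cache/huggingface/datasets}}"
  # Nothing to do if the cache directory does not exist yet.
  [ -d "$cache_dir" ] || return 0
  # Only match top-level entries, never the cache root itself.
  find "$cache_dir" -mindepth 1 -maxdepth 1 -name '*ai2_arc*' -exec rm -rf {} +
}
```

Run `clear_arc_cache` and then rerun the lm-eval command; passing a directory as the first argument overrides the assumed cache root.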
I upgraded my datasets package from 2.12.0 to 2.16.0 and the issue went away. Perhaps a minimum version should be added to the dependencies. Thanks anyway.
We may need to pin datasets to >= 2.16.0 (or 2.17.0) following #1135.
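Until a minimum version is pinned, a guard along these lines could fail fast before running lm-eval when the installed datasets release is older than the one reported to work here (2.16.0). This is a sketch: `version_ge` and `check_datasets_version` are hypothetical helpers, and the comparison assumes a `sort` that supports `-V` version ordering, as GNU coreutils does.

```shell
# Hypothetical helpers. ASSUMPTION: `sort -V` (version sort) is available.
# version_ge A B succeeds when version A >= version B.
version_ge() {
  # After a version sort, the smaller version comes first; A >= B iff that is B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Returns non-zero (and prints a hint) if the installed datasets is older than $1.
check_datasets_version() {
  local required="${1:-2.16.0}"
  local installed
  # Treat a missing or unimportable datasets package as version "0".
  installed="$(python -c 'import datasets; print(datasets.__version__)' 2>/dev/null)" || installed="0"
  if version_ge "$installed" "$required"; then
    return 0
  fi
  echo "datasets $installed is older than $required; try: pip install -U \"datasets>=$required\"" >&2
  return 1
}
```

For example, `check_datasets_version 2.16.0 && lm-eval --model hf ...` only starts the evaluation when the check passes. The helper prints the upgrade command rather than running pip itself, so it never changes the environment behind your back.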
Other HF datasets will probably be migrated on the backend in a similar way. That shouldn't change their contents, though; HF is just phasing out dataset loading scripts.
This dataset used to be defined by a dataset script, and we recently converted it to Parquet to enable the datasets security features. Because of this change, however, datasets 2.14 or later is now needed to load it. Sorry for the inconvenience.
Alternatively, you can load the old version of this dataset (the one with the dataset script) by specifying a revision from before the change; see the commit history at https://huggingface.co/datasets/allenai/ai2_arc/commits/main
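A minimal sketch of loading the old, script-based version, assuming you copy a commit hash from before the Parquet conversion out of the commit history page (it is left as a required argument, not filled in here). The `revision` argument of `load_dataset` selects the commit, and `ARC-Easy` is the dataset's config name; `load_arc_revision` is a hypothetical wrapper. Depending on your datasets release, running an old dataset script may require an explicit opt-in or may not be supported at all.

```shell
# Hypothetical wrapper: load allenai/ai2_arc (ARC-Easy) at a specific commit.
# ASSUMPTION: $1 is a commit hash taken from the dataset's commit history.
load_arc_revision() {
  if [ -z "$1" ]; then
    echo "usage: load_arc_revision <commit-hash>" >&2
    return 2
  fi
  # The Python script is read from stdin; the revision is passed as argv[1].
  python - "$1" <<'EOF'
import sys

from datasets import load_dataset

ds = load_dataset("allenai/ai2_arc", "ARC-Easy", revision=sys.argv[1])
print(ds)
EOF
}
```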