
Evaluating Arc Easy Got NonMatchingSplitsSizesError

Open yuanyehome opened this issue 1 year ago • 3 comments

Hi, when evaluating on ARC-Easy with the script below:

lm-eval \
    --model hf \
    --model_args trust_remote_code=True,pretrained=$ckpt \
    --tasks arc_easy \
    --num_fewshot 0 \
    --device cuda:0 \
    --output_path "./eval_scripts/arc_easy.json"

I got datasets.utils.info_utils.NonMatchingSplitsSizesError. I've noticed that the arc_easy repo on Hugging Face was updated 5 days ago. Is there a problem with it?

yuanyehome avatar Dec 26 '23 15:12 yuanyehome

I'm not getting this. Maybe try clearing the dataset in the cache and try again?

baberabb avatar Dec 26 '23 15:12 baberabb
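The suggestion above to clear the cached dataset can be approached cautiously by first listing what is cached. This is a minimal sketch, not the harness's own tooling: the helper names `datasets_cache_dir` and `find_cached_copies` are hypothetical, and it assumes the usual cache layout (`HF_DATASETS_CACHE`, else `$HF_HOME/datasets`, else `~/.cache/huggingface/datasets`) with directory names that contain the dataset name. It only lists candidates; deleting them is left to you.

```python
import os
from pathlib import Path


def datasets_cache_dir(env=None) -> Path:
    """Resolve the datasets cache directory (assumed default layout)."""
    env = os.environ if env is None else env
    if env.get("HF_DATASETS_CACHE"):
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "datasets"


def find_cached_copies(cache_dir: Path, name_fragment: str = "ai2_arc") -> list:
    """Return cached dataset directories whose name contains name_fragment."""
    if not cache_dir.is_dir():
        return []
    return sorted(
        p for p in cache_dir.iterdir() if p.is_dir() and name_fragment in p.name
    )


if __name__ == "__main__":
    # Print candidates only; review them before removing anything by hand.
    for path in find_cached_copies(datasets_cache_dir()):
        print(path)
```

After removing the listed directories, the next evaluation run should download the dataset again.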

I updated my datasets package from 2.12.0 to 2.16.0 and the issue disappeared. Perhaps this should be added to the dependencies. Thanks anyway.

yuanyehome avatar Dec 26 '23 15:12 yuanyehome
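Since the fix here was a package upgrade, a quick check of the installed datasets version can save a debugging round. The sketch below is illustrative only: `version_tuple` and `meets_minimum` are hypothetical helpers, the 2.16.0 threshold is simply the version reported to work in this thread, and pre-release suffixes such as `.dev0` are ignored.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components, e.g. '2.16.0' -> (2, 16, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes like 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing numeric components only."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '2.16' compares equal to '2.16.0'
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        print("datasets is not installed")
    else:
        ok = meets_minimum(installed, "2.16.0")  # threshold taken from this thread
        print(f"datasets {installed}: {'OK' if ok else 'consider upgrading'}")
```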

We may need to pin datasets to >= 2.16.0 or 2.17.0, following #1135.

Other HF datasets will probably also be migrated in the backend in a similar way. This shouldn’t change their contents, though; HF is just phasing out dataset loading scripts.

haileyschoelkopf avatar Dec 26 '23 16:12 haileyschoelkopf

This dataset used to be defined using a dataset script, and we recently converted it to Parquet to enable the datasets security features. However, because of this change, datasets 2.14 is now needed to load this dataset. Sorry for the inconvenience.

Alternatively, it's possible to load the old version of this dataset (with the dataset script) by specifying its old revision from before the change; see https://huggingface.co/datasets/allenai/ai2_arc/commits/main

lhoestq avatar Jan 18 '24 16:01 lhoestq
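The revision-pinning alternative mentioned above might look roughly like the following. This is a sketch under assumptions: `build_load_kwargs` and `load_arc_easy` are hypothetical helpers, the commit SHA is not given in this thread and must be picked from the commit history linked above, and only the `revision` parameter of `datasets.load_dataset` is relied on.

```python
import sys


def build_load_kwargs(revision: str) -> dict:
    """Keyword arguments for load_dataset, pinned to one repository revision."""
    return {
        "path": "allenai/ai2_arc",  # repository from the URL in this thread
        "name": "ARC-Easy",         # configuration used by the arc_easy task
        "revision": revision,       # commit SHA you choose from the history
    }


def load_arc_easy(revision: str):
    """Load ARC-Easy at the given revision (needs datasets and network access)."""
    from datasets import load_dataset  # imported lazily so the helper above stays light

    return load_dataset(**build_load_kwargs(revision))


if __name__ == "__main__" and len(sys.argv) == 2:
    # The commit SHA is passed as the only command-line argument.
    print(load_arc_easy(sys.argv[1]))
```

Pinning a revision keeps older datasets releases working, at the cost of staying on the script-based copy of the data.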