lm-evaluation-harness: Evaluating Arc Easy Got NonMatchingSplitsSizesError

Status: Open. Opened by yuanyehome.

Hi, when evaluating on ARC-Easy with the script below:

lm-eval \
    --model hf \
    --model_args trust_remote_code=True,pretrained=$ckpt \
    --tasks arc_easy \
    --num_fewshot 0 \
    --device cuda:0 \
    --output_path "./eval_scripts/arc_easy.json"

I got datasets.utils.info_utils.NonMatchingSplitsSizesError. I noticed that the dataset repo behind arc_easy on Hugging Face was updated 5 days ago. Is there a problem with it?

Dec 26 '23 15:12 yuanyehome

I can't reproduce this. Maybe try clearing the dataset from the cache and running again?
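A minimal bash sketch of the cache-clearing suggestion, assuming the default `datasets` cache layout. The lookup order (HF_DATASETS_CACHE, then $HF_HOME/datasets, then ~/.cache/huggingface/datasets) is our understanding of the `datasets` defaults, and the `*ai2_arc*` name pattern is a guess at how the cached ARC directories are named; it may not match every `datasets` version. The helper names `datasets_cache_dir` and `clear_arc_cache` are hypothetical, not part of lm-eval or `datasets`.

```shell
# Hypothetical helpers: not part of lm-eval or the datasets library.

# Resolve the datasets cache directory, assuming the default lookup order:
# HF_DATASETS_CACHE, else $HF_HOME/datasets, else ~/.cache/huggingface/datasets.
datasets_cache_dir() {
  if [ -n "${HF_DATASETS_CACHE:-}" ]; then
    printf '%s\n' "$HF_DATASETS_CACHE"
  else
    printf '%s\n' "${HF_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/huggingface}/datasets"
  fi
}

# List top-level cache entries whose names contain "ai2_arc" (an assumed
# naming pattern). Pass --delete to remove them instead of listing them.
clear_arc_cache() {
  local cache
  cache="$(datasets_cache_dir)"
  [ -d "$cache" ] || return 0
  if [ "${1:-}" = "--delete" ]; then
    find "$cache" -mindepth 1 -maxdepth 1 -name '*ai2_arc*' -exec rm -rf {} +
  else
    find "$cache" -mindepth 1 -maxdepth 1 -name '*ai2_arc*'
  fi
}
```

Running `clear_arc_cache` first shows what would be removed; `clear_arc_cache --delete` then removes it so the next lm-eval run rebuilds the dataset. Raw downloads stored under the cache's hash-named `downloads/` folder are not touched by this pattern.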

Dec 26 '23 15:12 baberabb

Upgrading my datasets package from 2.12.0 to 2.16.0 made the issue go away. Perhaps this minimum version should be added to the dependencies. Thanks anyway.
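A minimal bash sketch for checking the installed `datasets` version before running lm-eval, assuming `python` is the interpreter lm-eval runs under (override with the hypothetical `PYTHON` variable) and that `sort -V` (version sort, available in GNU coreutils) exists. The function names are hypothetical; the check only prints the upgrade command rather than running pip itself.

```shell
# Hypothetical helpers: not part of lm-eval or the datasets library.

# Succeed if version $1 >= version $2: after a version sort, the smaller
# of the two comes first, so $1 >= $2 exactly when $2 sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed datasets version, or nothing if it cannot be imported.
installed_datasets_version() {
  "${PYTHON:-python}" -c 'import datasets; print(datasets.__version__)' 2>/dev/null
}

# Report whether datasets meets the minimum version given as $1.
# Prints a suggested pip command instead of installing anything.
check_datasets() {
  local minimum="$1"
  local current
  current="$(installed_datasets_version)"
  if [ -z "$current" ]; then
    echo "datasets is not installed"
    return 1
  fi
  if version_ge "$current" "$minimum"; then
    echo "datasets $current is OK (>= $minimum)"
  else
    echo "datasets $current is older than $minimum; try: pip install -U \"datasets>=$minimum\""
    return 1
  fi
}
```

For example, `check_datasets 2.16.0 && lm-eval --model hf ...` only starts the evaluation when the installed version is at least the one reported to work in this thread.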

Dec 26 '23 15:12 yuanyehome

We may need to pin datasets to >= 2.16.0 (or >= 2.17.0) following #1135.

Other HF datasets will probably be migrated on the backend in a similar way. This shouldn't change their contents, though; HF is just phasing out dataset loading scripts.

Dec 26 '23 16:12 haileyschoelkopf

This dataset used to be defined with a dataset script, and we recently converted it to Parquet to enable the datasets security features. However, because of this change, datasets 2.14 or later is now needed to load this dataset. Sorry for the inconvenience.

Alternatively, you can load the old version of this dataset (with the dataset script) by specifying a revision from before the change; see the commit history at https://huggingface.co/datasets/allenai/ai2_arc/commits/main

Jan 18 '24 16:01 lhoestq