datasets Many things broken since the new 4.0.0 release

Describe the bug

The changes in 4.0.0 break many datasets, including those used by lm-evaluation-harness. I tried reverting to older versions such as 3.6.0 to make the eval work, but I keep getting:

File /venv/main/lib/python3.12/site-packages/datasets/features/features.py:1474, in generate_from_dict(obj)
   1471 class_type = _FEATURE_TYPES.get(_type, None) or globals().get(_type, None)
   1473 if class_type is None:
-> 1474     raise ValueError(f"Feature type '{_type}' not found. Available feature types: {list(_FEATURE_TYPES.keys())}")
   1476 if class_type == LargeList:
   1477     feature = obj.pop("feature")

ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', 'TranslationVariableLanguages', 'LargeList', 'Sequence', 'Array2D', 'Array3D', 'Array4D', 'Array5D', 'Audio', 'Image', 'Video', 'Pdf']

Steps to reproduce the bug

import lm_eval

# `model` and `tokenizer` are assumed to be an already-loaded Hugging Face model and tokenizer
model_eval = lm_eval.models.huggingface.HFLM(pretrained=model, tokenizer=tokenizer)
lm_eval.evaluator.simple_evaluate(model_eval, tasks=["winogrande"], num_fewshot=5, batch_size=1)

Expected behavior

Older datasets versions should keep working as before.

Environment info

datasets version: 3.6.0
Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.39
Python version: 3.12.11
huggingface_hub version: 0.33.1
PyArrow version: 20.0.0
Pandas version: 2.3.1
fsspec version: 2025.3.0

Jul 09 '25 18:07 mobicham

Happy to take a look. Do you have a list of impacted datasets?

Jul 09 '25 23:07 lhoestq

Thanks @lhoestq. For lm-eval, at least winogrande, mmlu and hellaswag are affected, based on my tests yesterday. Many others, such as bbh, are most probably affected too.

Jul 10 '25 09:07 mobicham

Hi @mobicham ,

I hit the same error (ValueError: Feature type 'List' not found) yesterday when loading my dataset with the load_dataset() function. After updating to 4.0.0, I no longer see it.

P.S. I used Sequence in place of list when building my dataset (see below):

from datasets import Dataset, Features, Sequence, Value

features = Features({
    ...
    "objects": Sequence({
        "id": Value("int64"),
        "bbox": Sequence(Value("float32"), length=4),
        "category": Value("string")
    }),
    ...
})
# data_dict: a dict of columns matching `features` (not shown)
dataset = Dataset.from_dict(data_dict)
dataset = dataset.cast(features)

Jul 10 '25 12:07 zhiying318

The issue comes from hails/mmlu_no_train, allenai/winogrande, lukaemon/bbh and Rowan/hellaswag, which are all unsupported in datasets 4.0 because they are based on Python scripts. Fortunately there are PRs to fix those datasets (I opened some of them a year ago but the dataset authors haven't merged them yet; I will ping people again and update here):

https://huggingface.co/datasets/hails/mmlu_no_train/discussions/2 merged ! ✅
https://huggingface.co/datasets/allenai/winogrande/discussions/6 merged ! ✅
https://huggingface.co/datasets/Rowan/hellaswag/discussions/7 merged ! ✅
https://huggingface.co/datasets/lukaemon/bbh/discussions/2 merged ! ✅
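
To check whether some other dataset repo is still script-based, one rough heuristic is to look for a loading script named after the repo at its root. This is only a sketch under that naming assumption: `has_loading_script` and `check_repo_on_hub` are hypothetical helpers, and the live check needs `huggingface_hub` and network access.

```python
def has_loading_script(repo_id: str, repo_files: list[str]) -> bool:
    """Return True if the repo root contains `<repo_name>.py`.

    Assumption: script-based datasets ship a loading script named after
    the repo, e.g. `winogrande.py` for `allenai/winogrande`.
    """
    script_name = repo_id.split("/")[-1] + ".py"
    # Root-level files are listed without a directory prefix.
    return script_name in repo_files


def check_repo_on_hub(repo_id: str) -> bool:
    """Run the same check against the live Hub (requires network access)."""
    from huggingface_hub import list_repo_files  # imported lazily

    return has_loading_script(repo_id, list_repo_files(repo_id, repo_type="dataset"))
```

For example, `has_loading_script("allenai/winogrande", ["README.md", "winogrande.py"])` is true, while a repo that only lists Parquet files is not flagged.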

Jul 10 '25 13:07 lhoestq

Thank you very much @lhoestq , I will try next week 👍

Jul 10 '25 16:07 mobicham

I get this error when using datasets 3.5.1 to load a dataset saved with datasets 4.0.0. If you are hitting this issue, make sure that both dataset saving code and the loading code are <4.0.0 or >=4.0.0.

Jul 10 '25 23:07 jsternabsci

This broke several lm-eval-harness workflows for me and reverting to older versions of datasets is not fixing the issue, does anyone have a workaround?

Jul 11 '25 01:07 rawsh

> I get this error when using datasets 3.5.1 to load a dataset saved with datasets 4.0.0. If you are hitting this issue, make sure that both dataset saving code and the loading code are <4.0.0 or >=4.0.0.

datasets 4.0 can load datasets saved using any older version. But the other way around is not always true: if you save a dataset with datasets 4.0, it may use the new List type, which requires 4.0, and loading it with an older version then raises ValueError: Feature type 'List' not found.
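
One way to tell whether saved metadata needs 4.0 is to scan the serialized features for the new type name. The sketch below assumes the features are stored as nested dicts tagged with a `_type` key, which is the key `generate_from_dict` reads in the traceback above. `feature_types` is a hypothetical helper and both sample dicts are made up for illustration.

```python
def feature_types(obj) -> set[str]:
    """Collect every `_type` tag found in a nested features structure."""
    found = set()
    if isinstance(obj, dict):
        tag = obj.get("_type")
        if isinstance(tag, str):
            found.add(tag)
        for value in obj.values():
            found |= feature_types(value)
    elif isinstance(obj, list):
        for item in obj:
            found |= feature_types(item)
    return found


# Hypothetical serialized features, for illustration only.
saved_with_v4 = {"tokens": {"_type": "List", "feature": {"_type": "Value", "dtype": "string"}}}
saved_with_v3 = {"tokens": {"_type": "Sequence", "feature": {"_type": "Value", "dtype": "string"}}}

print("List" in feature_types(saved_with_v4))  # True
print("List" in feature_types(saved_with_v3))  # False
```

If `List` shows up in the result, the metadata was most likely written by 4.0 and an older release may reject it.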

However, the problems with lm eval harness seem to have a different cause: unsupported dataset scripts (see https://github.com/huggingface/datasets/issues/7676#issuecomment-3057550659)

> This broke several lm-eval-harness workflows for me and reverting to older versions of datasets is not fixing the issue, does anyone have a workaround?

When reverting to an old datasets version, I'd encourage you to clear your cache (by default it is located at ~/.cache/huggingface/datasets); otherwise it might try to load a List type that didn't exist in old versions.
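
Clearing the cache can be scripted. This is a minimal sketch that assumes the cache sits at the default location quoted above; `clear_datasets_cache` is a hypothetical helper, and it defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path

# Default location quoted above; pass another path if your cache was moved.
DEFAULT_CACHE = Path("~/.cache/huggingface/datasets")


def clear_datasets_cache(cache_dir: Path = DEFAULT_CACHE, dry_run: bool = True) -> list[Path]:
    """List the top-level cache entries and, unless dry_run, delete them."""
    cache_dir = cache_dir.expanduser()
    if not cache_dir.is_dir():
        return []
    entries = sorted(cache_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # cached dataset directory
            else:
                entry.unlink()  # lock file or other stray file
    return entries
```

Calling `clear_datasets_cache()` only reports what would be removed; `clear_datasets_cache(dry_run=False)` actually deletes it, after which datasets are downloaded and prepared again on the next load.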

Jul 11 '25 15:07 lhoestq

All the impacted datasets in lm eval harness have been fixed thanks to the responsiveness of the dataset authors! Let me know if you encounter issues with other datasets :)

Jul 13 '25 15:07 lhoestq

Hello folks, I have found patrickvonplaten/librispeech_asr_dummy to be another dataset that is currently broken since the 4.0.0 release. Is there a PR on this as well?

Jul 17 '25 17:07 jonryuamazon

https://huggingface.co/datasets/microsoft/prototypical-hai-collaborations seems to be impacted as well.

_temp = load_dataset("microsoft/prototypical-hai-collaborations", "wildchat1m_en3u-task_anns")

leads to ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', 'TranslationVariableLanguages', 'LargeList', 'Sequence', 'Array2D', 'Array3D', 'Array4D', 'Array5D', 'Audio', 'Image', 'Video', 'Pdf']

Jul 21 '25 09:07 krumeto

microsoft/prototypical-hai-collaborations is not impacted, you can load it using both datasets 3.6 and 4.0. I also tried on colab to confirm.

One possible explanation for ValueError: Feature type 'List' not found is that you loaded and cached this dataset with datasets 4.0 and then tried to reload it from the cache using 3.6.0.

EDIT: actually I tried, and 3.6 can reload datasets cached with 4.0, so I'm not sure why you get this error. Which version of datasets are you using?
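
A quick way to answer the version question from inside the failing environment is to read the installed package metadata. This sketch uses only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "datasets"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("datasets:", installed_version("datasets"))
print("pyarrow:", installed_version("pyarrow"))
```

Running it in the same interpreter or notebook kernel that raises the error matters, since a different environment may have a different datasets release installed.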

Jul 21 '25 10:07 lhoestq

> Hello folks, I have found patrickvonplaten/librispeech_asr_dummy to be another dataset that is currently broken since the 4.0.0 release. Is there a PR on this as well?

I guess you can use hf-internal-testing/librispeech_asr_dummy instead of patrickvonplaten/librispeech_asr_dummy, or ask the dataset author to convert their dataset to Parquet.
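
The swap can be kept in one place so that callers do not hard-code the old repo. This is a sketch only: `REPLACEMENTS`, `resolve_repo` and `load_dummy_librispeech` are hypothetical names, and the "clean" config and "validation" split are assumptions about the replacement repo rather than something confirmed in this thread.

```python
# Replacement taken from this thread; extend the mapping as needed.
REPLACEMENTS = {
    "patrickvonplaten/librispeech_asr_dummy": "hf-internal-testing/librispeech_asr_dummy",
}


def resolve_repo(repo_id: str) -> str:
    """Map a script-based repo to its replacement, or return it unchanged."""
    return REPLACEMENTS.get(repo_id, repo_id)


def load_dummy_librispeech():
    """Load the replacement dataset (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily

    # Assumption: the "clean" config and "validation" split exist in this repo.
    repo = resolve_repo("patrickvonplaten/librispeech_asr_dummy")
    return load_dataset(repo, "clean", split="validation")
```

Repos that are not in the mapping pass through unchanged, so `resolve_repo` can wrap every dataset name a test suite uses.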

Jul 21 '25 10:07 lhoestq

I am having a similar issue with the evals under leaderboard: https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/leaderboard

Some datasets look pretty old (2 years), so I'm not sure the authors would fix them.

Sep 17 '25 20:09 maziyarpanahi

For datasets based on scripts, I shared a command here to update them: https://github.com/huggingface/datasets/issues/7693#issuecomment-3253005348

Otherwise if you are getting ValueError: Feature type 'List' not found. as in the original post, make sure you use datasets v4 to reload datasets that were loaded with v4.
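
The compatibility rule discussed in this thread can be written as a small guard. It is a sketch that only encodes the 4.0 boundary and treats "may fail" as unsafe; `major` and `reload_is_safe` are hypothetical helpers, not part of datasets.

```python
def major(version: str) -> int:
    """Parse the major component of a version string such as '3.6.0'."""
    return int(version.split(".")[0])


def reload_is_safe(saved_with: str, loading_with: str) -> bool:
    """False only when data written by 4.x or later is read by a pre-4.0 release."""
    return major(loading_with) >= 4 or major(saved_with) < 4


print(reload_is_safe("4.0.0", "3.5.1"))  # False
print(reload_is_safe("3.6.0", "4.0.0"))  # True
print(reload_is_safe("4.0.0", "4.0.0"))  # True
```

A `False` result does not guarantee an error, since older releases can sometimes reload 4.0 data, but it marks the combination that produced the `List` error in the original post.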

Sep 18 '25 16:09 lhoestq