load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors
Describe the bug
I have been running lm-eval-harness a lot, which has resulted in hitting an API rate limit. This seems strange, since all of the data should be cached locally. I have in fact verified that it is.
Steps to reproduce the bug
- Be Me
- Run `load_dataset("TAUR-Lab/MuSR")`
- Hit rate limit error
- Dataset is in `.cache/huggingface/datasets`
- ???
Expected behavior
We should not run into API rate limits if we have cached the dataset
Environment info
- `datasets` 2.16.0
- Python 3.10.4
I'm having the same issue: I run into rate limits during hyperparameter tuning even though the dataset is supposed to be cached. This behaviour should at the very least be documented, but honestly `load_dataset` should just not hit rate limits in the first place when the dataset is cached. It even happens when a specific revision is pinned, in which case AFAIK there should be no reason to make API requests at all for an already-cached dataset (besides maybe a quick hash check, but hitting rate limits on that with ~200 requests across 10 hours of use seems a bit ridiculous).
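As a stopgap while this is open, one workaround that has worked for me (assuming the dataset is already fully cached; the exact behaviour may vary across `datasets` versions) is to force offline mode through environment variables so everything resolves from the local cache:

```python
import os

# These must be set before datasets / huggingface_hub are imported.
# With offline mode on, the libraries skip Hub requests entirely and
# read from the local cache, failing fast if something is missing.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

# from datasets import load_dataset
# ds = load_dataset("TAUR-Lab/MuSR")  # served from cache, no API calls
```

The obvious downside is that any dataset (or revision) not already in the cache will fail to load until you unset the variables.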
I was running into the same issue and solved it by upgrading huggingface_hub from 0.33.5 to 0.34.0.
> python -m pip install huggingface_hub==0.34.0
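For anyone unsure which version their environment is actually picking up (the helper name below is mine), the standard library's `importlib.metadata` is enough to check, without importing the package itself:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_hub_version() -> str:
    """Return the installed huggingface_hub version, or a placeholder if absent."""
    try:
        return version("huggingface_hub")
    except PackageNotFoundError:
        return "not installed"

print(installed_hub_version())
```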