datasets icon indicating copy to clipboard operation
datasets copied to clipboard

load_dataset with data_dir and cache_dir set fail with not supported

Open fah opened this issue 9 months ago • 0 comments

Describe the bug

with python 3.11 I execute:

from transformers import Wav2Vec2Processor, Data2VecAudioModel
import torch
from torch import nn
from datasets import load_dataset, concatenate_datasets

# load demo audio and set processor
dataset_clean = load_dataset("librispeech_asr", "clean", split="validation", data_dir="data", cache_dir="cache")

This fails in the last line with

Found cached dataset librispeech_asr (file:///Users/as/Documents/Project/git/audio2vec/cache/librispeech_asr/clean-data_dir=data/2.1.0/cff5df6e7955c80a67f80e27e7e655de71c689e2d2364bece785b972acb37fe7)
Traceback (most recent call last):
  File "/Users/as/Documents/Project/git/audio2vec/src/music2vec-v1.py", line 7, in <module>
    dataset_clean = load_dataset("librispeech_asr", "clean", split="validation", data_dir="data", cache_dir="cache")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/as/anaconda3/lib/python3.11/site-packages/datasets/load.py", line 1810, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/as/anaconda3/lib/python3.11/site-packages/datasets/builder.py", line 1113, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

Steps to reproduce the bug

I setup an venv with requirements.txt

transformers==4.40.2
torch==2.2.2
datasets==2.16.0
fsspec==2023.9.2

pip freeze is:

aiohttp==3.9.5
aiosignal==1.3.1
attrs==23.2.0
certifi==2024.2.2
charset-normalizer==3.3.2
datasets==2.16.0
dill==0.3.7
filelock==3.14.0
frozenlist==1.4.1
fsspec==2023.9.2
huggingface-hub==0.23.0
idna==3.7
Jinja2==3.1.4
MarkupSafe==2.1.5
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
networkx==3.3
numpy==1.26.4
packaging==24.0
pandas==2.2.2
pyarrow==16.0.0
pyarrow-hotfix==0.6
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
regex==2024.4.28
requests==2.31.0
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.2.2
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
xxhash==3.4.1
yarl==1.9.4

I execute this on a M1 Mac.

Expected behavior

I don't understand the error message. Why is "local" caching not supported. Would it possible to give some additional hint with the error message how to solve this issue?

Environment info

source .... python -u example.py

fah avatar May 08 '24 19:05 fah