Dataset Viewer issue for Jean-Baptiste/wikiner_fr

Open severo opened this issue 2 years ago • 1 comment

Link

https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr

Description

Error code:   StreamingRowsError
Exception:    FileNotFoundError
Message:      [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Traceback:    Traceback (most recent call last):
               File "/src/services/worker/src/worker/responses/first_rows.py", line 337, in get_first_rows_response
                 rows = get_rows(dataset, config, split, streaming=True, rows_max_number=rows_max_number, hf_token=hf_token)
               File "/src/services/worker/src/worker/utils.py", line 123, in decorator
                 return func(*args, **kwargs)
               File "/src/services/worker/src/worker/responses/first_rows.py", line 77, in get_rows
                 rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 718, in __iter__
                 for key, example in self._iter():
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 708, in _iter
                 yield from ex_iterable
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 112, in __iter__
                 yield from self.generate_examples_fn(**self.kwargs)
               File "/tmp/modules-cache/datasets_modules/datasets/Jean-Baptiste--wikiner_fr/683a580ba6ec769d508f7dfc603a651667b0ed3817b1ae5bfd45f97cc024923f/wikiner_fr.py", line 165, in _generate_examples
                 dataset = Dataset.load_from_disk(filepath)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1210, in load_from_disk
                 with open(Path(dataset_path, config.DATASET_STATE_JSON_FILENAME).as_posix(), encoding="utf-8") as state_file:
             FileNotFoundError: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'

Is it an error with the dataset script, or the data itself, @huggingface/datasets?

https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/tree/main

Owner

No

severo, Sep 20 '22 12:09

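The script uses Dataset.load_from_disk, which, as you might expect, doesn't work in streaming mode: as the traceback shows, it opens state.json with the built-in open() on what it assumes is a local path, so it can't read from a remote zip archive.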
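
For anyone wondering why the path in the error message looks mangled ('https:/' with a single slash), here is a minimal sketch of what probably happens. The chained URL below is an assumption, reconstructed from the error message; the join itself mirrors the Path(dataset_path, ...).as_posix() call in the traceback. PurePosixPath is used so the result is the same on any OS.

```python
from pathlib import PurePosixPath

# ASSUMPTION: the chained URL that streaming mode hands to the script for the
# "data/train" directory inside data.zip (inferred from the error message).
streaming_path = (
    "zip://data/train::"
    "https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip"
)

# Same construction as in the traceback:
#   Path(dataset_path, config.DATASET_STATE_JSON_FILENAME).as_posix()
# pathlib treats the URL as an ordinary relative path and collapses every
# "//" into "/", so the protocol separators are lost.
state_file = PurePosixPath(streaming_path, "state.json").as_posix()

print(state_file)
```

The printed string has no '://' left in it, and the built-in open() would not know how to follow a 'zip://...::https://...' chain anyway, hence the FileNotFoundError.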

It would probably be more practical to load the dataset locally using Dataset.load_from_disk first, and then call push_to_hub to upload it to the Hub in Parquet format.
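
A rough sketch of that conversion, in case it helps. It assumes data.zip has been downloaded and extracted locally, that you are already authenticated with the Hub, and that the split directories and target repo id are passed in by the caller; the function name and the example values in the note below are hypothetical, not taken from the dataset script.

```python
from typing import Dict


def convert_to_parquet_on_hub(split_dirs: Dict[str, str], repo_id: str) -> None:
    """Load splits saved with save_to_disk and upload them to the Hub.

    split_dirs maps a split name to a local directory (hypothetical layout),
    repo_id is the target dataset repository (hypothetical value).
    """
    # Imported inside the function so the sketch can be read or imported
    # without the `datasets` library installed.
    from datasets import Dataset, DatasetDict

    # Read each split from local disk; this is where load_from_disk works.
    splits = {
        name: Dataset.load_from_disk(path) for name, path in split_dirs.items()
    }

    # push_to_hub uploads the data as Parquet files, which the dataset
    # viewer can stream without running a loading script.
    DatasetDict(splits).push_to_hub(repo_id)
```

It could then be called as convert_to_parquet_on_hub({"train": "data/train"}, "your-username/wikiner_fr"), adding any other split directories found in the archive.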

lhoestq, Sep 20 '22 12:09

I've transferred this issue to the Hub repo: https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/discussions/3

I'm closing this.

albertvillanova, Sep 27 '22 12:09