datasets
Dataset Viewer issue for Jean-Baptiste/wikiner_fr
Link
https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr
Description
Error code: StreamingRowsError
Exception: FileNotFoundError
Message: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Traceback: Traceback (most recent call last):
File "/src/services/worker/src/worker/responses/first_rows.py", line 337, in get_first_rows_response
rows = get_rows(dataset, config, split, streaming=True, rows_max_number=rows_max_number, hf_token=hf_token)
File "/src/services/worker/src/worker/utils.py", line 123, in decorator
return func(*args, **kwargs)
File "/src/services/worker/src/worker/responses/first_rows.py", line 77, in get_rows
rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 718, in __iter__
for key, example in self._iter():
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 708, in _iter
yield from ex_iterable
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 112, in __iter__
yield from self.generate_examples_fn(**self.kwargs)
File "/tmp/modules-cache/datasets_modules/datasets/Jean-Baptiste--wikiner_fr/683a580ba6ec769d508f7dfc603a651667b0ed3817b1ae5bfd45f97cc024923f/wikiner_fr.py", line 165, in _generate_examples
dataset = Dataset.load_from_disk(filepath)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1210, in load_from_disk
with open(Path(dataset_path, config.DATASET_STATE_JSON_FILENAME).as_posix(), encoding="utf-8") as state_file:
FileNotFoundError: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Is this an error in the dataset script or in the data itself, @huggingface/datasets?
https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/tree/main
Owner
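No. The script uses Dataset.load_from_disk, which, as you might expect, doesn't work in streaming mode: it expects a local directory, whereas in streaming mode the archive is not extracted and the script receives a remote URL instead.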
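The odd path in the error message fits this explanation. Below is a minimal sketch that assumes the streaming download manager passed an fsspec "chained" URL of the form zip://data/train::https://.../data.zip to load_from_disk. That URL is inferred from the error text and is not confirmed by the script. The sketch joins that string with pathlib and then calls the built-in open(), which is what the traceback shows load_from_disk doing. The join collapses the double slashes, and open() then looks for a relative local file, which fails with FileNotFoundError. CHAINED, state_json_path and try_open are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical reconstruction (inferred from the error message): the URL the
# script is assumed to have received in streaming mode instead of a local folder.
CHAINED = (
    "zip://data/train::"
    "https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip"
)


def state_json_path(dataset_path):
    # Mirrors the line in the traceback:
    # Path(dataset_path, "state.json").as_posix()
    # pathlib drops empty path segments, so "zip://" becomes "zip:/".
    return PurePosixPath(dataset_path, "state.json").as_posix()


def try_open(path):
    # The built-in open() only handles local paths, not zip://...::https://... URLs,
    # so the mangled string is treated as a relative file path that does not exist.
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as err:
        return err


if __name__ == "__main__":
    path = state_json_path(CHAINED)
    print(path)
    print(repr(try_open(path)))
```

Running it prints the same 'zip:/data/train::https:/huggingface.co/...' path that appears in the error message, followed by a FileNotFoundError.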
It would probably be more practical to load the dataset locally with Dataset.load_from_disk first and then use push_to_hub to upload it to the Hub in Parquet format.
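A minimal sketch of that conversion is shown below. It assumes data.zip has been extracted locally into a folder that contains one subfolder per split written with save_to_disk. The traceback only confirms a data/train folder, so any other split names are assumptions. find_saved_splits and convert_and_push are hypothetical helper names. The library calls Dataset.load_from_disk, DatasetDict and push_to_hub are part of the datasets library.

```python
from pathlib import Path


def find_saved_splits(root):
    """Return {split_name: path} for every subfolder written by save_to_disk.

    A folder saved with Dataset.save_to_disk contains a state.json file,
    which is the file load_from_disk failed to open in the traceback.
    """
    splits = {}
    for child in sorted(Path(root).iterdir()):
        if child.is_dir() and (child / "state.json").is_file():
            splits[child.name] = str(child)
    return splits


def convert_and_push(root, repo_id):
    """Load each saved split locally and push them to the Hub as one DatasetDict."""
    # Imported here so find_saved_splits works even without `datasets` installed.
    from datasets import Dataset, DatasetDict

    splits = find_saved_splits(root)
    if not splits:
        raise ValueError(f"no save_to_disk folders found under {root}")
    dataset_dict = DatasetDict(
        {name: Dataset.load_from_disk(path) for name, path in splits.items()}
    )
    # Requires a Hub token with write access to repo_id (e.g. via `huggingface-cli login`).
    dataset_dict.push_to_hub(repo_id)
    return dataset_dict
```

For example, after extracting the archive to ./data, a repo owner could call convert_and_push("data", "Jean-Baptiste/wikiner_fr"). Once the Parquet files are on the Hub, load_dataset("Jean-Baptiste/wikiner_fr", streaming=True) should no longer need to go through the script's load_from_disk call. Treat that last point as an expectation rather than a guarantee.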
I've transferred this issue to the Hub repo: https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/discussions/3
I'm closing this.