Dataset viewer issue for *P3*
Link: https://huggingface.co/datasets/bigscience/P3
Status code: 400
Exception: SplitsNotFoundError
Message: The split names could not be parsed from the dataset config.
Am I the one who added this dataset? No
The error is now:
Status code: 400
Exception: Status400Error
Message: this dataset is not supported for now.
We've disabled the dataset viewer for several big datasets like this one. We hope to be able to re-enable it soon.
The list of splits cannot be obtained. cc @huggingface/datasets
Error code: SplitsNamesError
Exception: SplitsNotFoundError
Message: The split names could not be parsed from the dataset config.
Traceback: Traceback (most recent call last):
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 354, in get_dataset_config_info
for split_generator in builder._split_generators(
File "/tmp/modules-cache/datasets_modules/datasets/bigscience--P3/12c0badfecad4564ecb8a6f81b5d0559656f269f08b13c59c93283f3a84134ba/P3.py", line 154, in _split_generators
data_dir = dl_manager.download_and_extract(_URLs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 944, in download_and_extract
return self.extract(self.download(url_or_urls))
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 907, in extract
urlpaths = map_nested(self._extract, path_or_paths, map_tuple=True)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 393, in map_nested
mapped = [
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 394, in <listcomp>
_single_map_nested((function, obj, types, None, True, None))
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in _single_map_nested
return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in <dictcomp>
return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in _single_map_nested
return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in <dictcomp>
return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 330, in _single_map_nested
return function(data_struct)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 912, in _extract
protocol = _get_extraction_protocol(urlpath, use_auth_token=self.download_config.use_auth_token)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 402, in _get_extraction_protocol
return _get_extraction_protocol_with_magic_number(f)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 367, in _get_extraction_protocol_with_magic_number
magic_number = f.read(MAGIC_NUMBER_MAX_LENGTH)
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 574, in read
return super().read(length)
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/spec.py", line 1575, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
raise return_result
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 616, in async_fetch_range
out = await r.read()
File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1036, in read
self._body = await self.content.read()
File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 375, in read
block = await self.readany()
File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 397, in readany
await self._wait("readany")
File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 304, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/src/services/worker/src/worker/responses/splits.py", line 75, in get_splits_response
split_full_names = get_dataset_split_full_names(dataset, hf_token)
File "/src/services/worker/src/worker/responses/splits.py", line 35, in get_dataset_split_full_names
return [
File "/src/services/worker/src/worker/responses/splits.py", line 38, in <listcomp>
for split in get_dataset_split_names(dataset, config, use_auth_token=hf_token)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 404, in get_dataset_split_names
info = get_dataset_config_info(
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 359, in get_dataset_config_info
raise SplitsNotFoundError("The split names could not be parsed from the dataset config.") from err
datasets.inspect.SplitsNotFoundError: The split names could not be parsed from the dataset config.
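For anyone trying to reproduce this locally: the root exception in the traceback above is an `aiohttp` `ClientPayloadError` raised while the first bytes of a remote file were being read, and `SplitsNotFoundError` is only the wrapper around it. If that failure is transient (the thread does not confirm it is), retrying the call may be enough. Below is a minimal sketch of such a retry wrapper. The helper name `call_with_retries` and its parameters are hypothetical and are not part of `datasets`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exceptions.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Possible usage (assumes the `datasets` package is installed and the
# network is reachable; the choice of the first config is arbitrary):
#
#   from datasets import get_dataset_config_names, get_dataset_split_names
#   from datasets.inspect import SplitsNotFoundError
#
#   config = get_dataset_config_names("bigscience/P3")[0]
#   splits = call_with_retries(
#       lambda: get_dataset_split_names("bigscience/P3", config),
#       retry_on=(SplitsNotFoundError,),
#   )
```

Passing the call as a zero-argument callable keeps the helper independent of `datasets`, so it can be exercised without network access. It will not help if the failure is deterministic, for example when the viewer is disabled for the dataset.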
Closing in favor of https://huggingface.co/datasets/bigscience/P3/discussions/6 and https://github.com/huggingface/datasets-server/issues/1689