
Dataset viewer issue for *P3*

Open jeffistyping opened this issue 3 years ago • 3 comments

Dataset viewer issue for 'P3'

Link: https://huggingface.co/datasets/bigscience/P3

Status code:   400
Exception:     SplitsNotFoundError
Message:       The split names could not be parsed from the dataset config.

Am I the one who added this dataset? No

jeffistyping avatar Feb 01 '22 15:02 jeffistyping

The error is now:

Status code:   400
Exception:     Status400Error
Message:       this dataset is not supported for now.

We've disabled the dataset viewer for several big datasets like this one. We hope to be able to re-enable it soon.

severo avatar Apr 12 '22 12:04 severo

The list of splits cannot be obtained. cc @huggingface/datasets

severo avatar Sep 08 '22 08:09 severo

Error code:   SplitsNamesError
Exception:    SplitsNotFoundError
Message:      The split names could not be parsed from the dataset config.
Traceback:    Traceback (most recent call last):
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 354, in get_dataset_config_info
                 for split_generator in builder._split_generators(
               File "/tmp/modules-cache/datasets_modules/datasets/bigscience--P3/12c0badfecad4564ecb8a6f81b5d0559656f269f08b13c59c93283f3a84134ba/P3.py", line 154, in _split_generators
                 data_dir = dl_manager.download_and_extract(_URLs)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 944, in download_and_extract
                 return self.extract(self.download(url_or_urls))
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 907, in extract
                 urlpaths = map_nested(self._extract, path_or_paths, map_tuple=True)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 393, in map_nested
                 mapped = [
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 394, in <listcomp>
                 _single_map_nested((function, obj, types, None, True, None))
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in _single_map_nested
                 return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in <dictcomp>
                 return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in _single_map_nested
                 return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 346, in <dictcomp>
                 return {k: _single_map_nested((function, v, types, None, True, None)) for k, v in pbar}
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 330, in _single_map_nested
                 return function(data_struct)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 912, in _extract
                 protocol = _get_extraction_protocol(urlpath, use_auth_token=self.download_config.use_auth_token)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 402, in _get_extraction_protocol
                 return _get_extraction_protocol_with_magic_number(f)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 367, in _get_extraction_protocol_with_magic_number
                 magic_number = f.read(MAGIC_NUMBER_MAX_LENGTH)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 574, in read
                 return super().read(length)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/spec.py", line 1575, in read
                 out = self.cache._fetch(self.loc, self.loc + length)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
                 self.cache = self.fetcher(start, bend)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
                 return sync(self.loop, func, *args, **kwargs)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
                 raise return_result
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
                 result[0] = await coro
               File "/src/services/worker/.venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 616, in async_fetch_range
                 out = await r.read()
               File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1036, in read
                 self._body = await self.content.read()
               File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 375, in read
                 block = await self.readany()
               File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 397, in readany
                 await self._wait("readany")
               File "/src/services/worker/.venv/lib/python3.9/site-packages/aiohttp/streams.py", line 304, in _wait
                 await waiter
             aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
             
             The above exception was the direct cause of the following exception:
             
             Traceback (most recent call last):
               File "/src/services/worker/src/worker/responses/splits.py", line 75, in get_splits_response
                 split_full_names = get_dataset_split_full_names(dataset, hf_token)
               File "/src/services/worker/src/worker/responses/splits.py", line 35, in get_dataset_split_full_names
                 return [
               File "/src/services/worker/src/worker/responses/splits.py", line 38, in <listcomp>
                 for split in get_dataset_split_names(dataset, config, use_auth_token=hf_token)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 404, in get_dataset_split_names
                 info = get_dataset_config_info(
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/inspect.py", line 359, in get_dataset_config_info
                 raise SplitsNotFoundError("The split names could not be parsed from the dataset config.") from err
             datasets.inspect.SplitsNotFoundError: The split names could not be parsed from the dataset config.

severo avatar Sep 08 '22 08:09 severo
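
For readers following the traceback above: the `SplitsNotFoundError` is raised `from err`, so the underlying network failure (`ClientPayloadError: Response payload is not completed`) is reachable through the exception's `__cause__` attribute. The sketch below shows one way to walk that chain and list each layer. It is only an illustration: the two exception classes are local stand-ins for the real `datasets` and `aiohttp` classes, and `get_split_names_stub`, `cause_chain`, and `describe_failure` are hypothetical helpers, not part of any library.

```python
from typing import List


class ClientPayloadError(Exception):
    """Local stand-in for aiohttp's ClientPayloadError (assumption, not the real class)."""


class SplitsNotFoundError(Exception):
    """Local stand-in for datasets' SplitsNotFoundError (assumption, not the real class)."""


def cause_chain(exc: BaseException) -> List[BaseException]:
    """Return exc followed by every exception linked through __cause__."""
    chain: List[BaseException] = []
    seen = set()
    current = exc
    # Guard against cycles, which are unusual but possible.
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current)
        current = current.__cause__
    return chain


def get_split_names_stub() -> None:
    """Hypothetical stub that fails the same way as the traceback above."""
    try:
        raise ClientPayloadError("Response payload is not completed")
    except ClientPayloadError as err:
        # Same chaining pattern as the `raise ... from err` line in the traceback.
        raise SplitsNotFoundError(
            "The split names could not be parsed from the dataset config."
        ) from err


def describe_failure() -> List[str]:
    """Run the stub and describe each exception in the chain, outermost first."""
    try:
        get_split_names_stub()
    except SplitsNotFoundError as exc:
        return [f"{type(e).__name__}: {e}" for e in cause_chain(exc)]
    return []


if __name__ == "__main__":
    for line in describe_failure():
        print(line)
```

The last entry in the list is the root cause, which in this issue points at an interrupted HTTP response rather than at a problem in the dataset config itself.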

Closing in favor of https://huggingface.co/datasets/bigscience/P3/discussions/6 and https://github.com/huggingface/datasets-server/issues/1689

severo avatar Sep 25 '23 12:09 severo