tensorflow_datasets v4.9.4 introduces bug that prevents loading datasets
Short description
When upgrading to the most recent tensorflow_datasets==4.9.4, I get errors when loading datasets (from the official TFDS catalogue). I have verified that the same datasets load without problems in version 4.9.3.
Environment information
- Operating System: verified on Colab -- https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing
- tensorflow-datasets/tfds-nightly version: 4.9.4
- Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? -- Yes! (see Colab)
Reproduction instructions
import tensorflow_datasets as tfds
ds = tfds.load("fractal20220817_data", data_dir="gs://gresearch/robotics")
Or, in Colab: https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing
Link to logs
FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in read_from_json(path)
   1033   try:
-> 1034     json_str = epath.Path(path).read_text()
   1035   except OSError as e:

[27 frames elided]

FileNotFoundError: [Errno 2] No such file or directory: 'fractal20220817_data/0.1.0/dataset_info.json'

The above exception was the direct cause of the following exception:

FileNotFoundError: Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/utils/py_utils.py in reraise(e, prefix, suffix)
    383   else:
    384     exception = RuntimeError(f'{type(e).__name__}: {msg}')
--> 385   raise exception from e
    386   # Otherwise, modify the exception in-place
    387   elif len(e.args) <= 1:

FileNotFoundError: Failed to construct dataset "fractal20220817_data", builder_kwargs "{'data_dir': 'gs://gresearch/robotics'}": Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json
Additional context
Interestingly, constructing a builder via builder_from_directory still works even in the most recent tfds version.
builder = tfds.builder_from_directory("gs://gresearch/robotics/fractal20220817_data/0.1.0")
Thanks for your detailed bug report!
This is caused by _GCS_BUCKET having been made empty in this commit: https://github.com/tensorflow/datasets/commit/b78fc27c4f830c590c28002b1a1d07ef14e588dc
I'll contact the people who changed it, but with the holidays I don't know how quickly they'll respond.
In the meantime you can also load it by specifying the version:
ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
A fix was submitted. Could you test with tfds-nightly whether it works now?
Hey, I am facing the same issue. I tried the recommended line in a Jupyter notebook:
ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
But it still doesn't work, and I got:
UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://gresearch/robotics/fractal20220817_data/0.1.0/features.json')
The same line raises no errors in Colab.
I also have a basic question: how can I download the dataset (for example, fractal20220817_data) to my local PC?
Thanks a lot!
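Not a definitive answer, but two hedged pointers: the UnimplementedError usually means the local TensorFlow build lacks the 'gs' filesystem plugin (the optional tensorflow-io package can register it on import), and once gs:// paths work you can copy the dataset directory to disk with tf.io.gfile and load it offline. A sketch; the copy_dir helper below is illustrative, not part of TFDS:

```python
import posixpath
import tensorflow as tf

try:
    # Assumption: on TF builds that lack the 'gs' scheme (e.g. Windows
    # wheels), importing tensorflow-io registers extra filesystems as a
    # side effect (pip install tensorflow-io).
    import tensorflow_io  # noqa: F401
except ImportError:
    pass  # tensorflow-io not installed; gs:// may stay unsupported

def copy_dir(src, dst):
    """Recursively copy a (possibly remote) directory with tf.io.gfile."""
    tf.io.gfile.makedirs(dst)
    for name in tf.io.gfile.listdir(src):
        # GCS listings may return directory names with a trailing slash.
        s = posixpath.join(src, name)
        d = posixpath.join(dst, name.rstrip("/"))
        if tf.io.gfile.isdir(s):
            copy_dir(s, d)
        else:
            tf.io.gfile.copy(s, d, overwrite=True)

# Example (needs GCS access):
# copy_dir("gs://gresearch/robotics/fractal20220817_data/0.1.0",
#          "/tmp/fractal20220817_data/0.1.0")
# builder = tfds.builder_from_directory("/tmp/fractal20220817_data/0.1.0")
```

(If you have the Google Cloud SDK installed, the gsutil CLI can do the same recursive copy.)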
@tomvdw Or can we continue using the tfds.builder_from_directory workaround to load datasets from a specified directory...?