tensorflow_datasets v4.9.4 introduces bug that prevents loading datasets

Open kpertsch opened this issue 1 year ago • 4 comments

Short description

After upgrading to the most recent tensorflow_datasets==4.9.4, I get errors when loading datasets (from the official TFDS catalogue). I have verified that the same datasets load in version 4.9.3 without problems.

Environment information

  • Operating System: verified on Colab -- https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing

  • tensorflow-datasets/tfds-nightly version: 4.9.4

  • Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly)? -- Yes! (see Colab)

Reproduction instructions

import tensorflow_datasets as tfds
ds = tfds.load("fractal20220817_data", data_dir="gs://gresearch/robotics")

OR colab: https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing

Link to logs

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in read_from_json(path)
   1033   try:
-> 1034     json_str = epath.Path(path).read_text()
   1035   except OSError as e:

27 frames
FileNotFoundError: [Errno 2] No such file or directory: 'fractal20220817_data/0.1.0/dataset_info.json'

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
FileNotFoundError: Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/utils/py_utils.py in reraise(e, prefix, suffix)
    383     else:
    384       exception = RuntimeError(f'{type(e).__name__}: {msg}')
--> 385     raise exception from e
    386   # Otherwise, modify the exception in-place
    387   elif len(e.args) <= 1:

FileNotFoundError: Failed to construct dataset "fractal20220817_data", builder_kwargs "{'data_dir': 'gs://gresearch/robotics'}": Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

Additional context

Interestingly, constructing a builder with builder_from_directory still seems to work, even in the most recent tfds version:

builder = tfds.builder_from_directory("gs://gresearch/robotics/fractal20220817_data/0.1.0")
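For anyone who wants to reuse the builder_from_directory route, here is a minimal sketch. The helper names `version_dir` and `load_from_directory` are hypothetical, and the "train" split is an assumption about this dataset. Only `tfds.builder_from_directory` and `builder.as_dataset` are TFDS calls.

```python
def version_dir(data_dir: str, name: str, version: str) -> str:
    """Build '<data_dir>/<name>/<version>' for a prepared dataset.

    Hypothetical helper: uses plain string joining so that a gs:// prefix
    is kept intact on every operating system.
    """
    return "/".join([data_dir.rstrip("/"), name, version])


def load_from_directory(data_dir: str, name: str, version: str, split: str = "train"):
    """Load one split via tfds.builder_from_directory instead of tfds.load."""
    # Imported lazily so that version_dir can be used without TFDS installed.
    import tensorflow_datasets as tfds

    builder = tfds.builder_from_directory(version_dir(data_dir, name, version))
    # The split name is an assumption; check builder.info.splits for the real ones.
    return builder.as_dataset(split=split)
```

Usage would look like `ds = load_from_directory("gs://gresearch/robotics", "fractal20220817_data", "0.1.0")`, which needs read access to the bucket and a TensorFlow build with GCS file system support.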

kpertsch avatar Dec 21 '23 16:12 kpertsch

Thanks for your detailed bug report!

This is caused by _GCS_BUCKET being made empty in this commit: https://github.com/tensorflow/datasets/commit/b78fc27c4f830c590c28002b1a1d07ef14e588dc

I'll contact the people who changed it, but with the holidays I don't know how quickly they'll respond.

In the meantime you can also load it by specifying the version:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
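If you load several datasets this way, the version pinning can be wrapped in a small helper. This is only a sketch: `pinned_name` and `load_pinned` are hypothetical names, not part of TFDS, and the only TFDS call used is `tfds.load` with the "name:version" form shown above.

```python
def pinned_name(name: str, version: str) -> str:
    """Return the 'name:version' string form used to request an exact version."""
    if ":" in name:
        # Guard against pinning twice, e.g. 'fractal20220817_data:0.1.0:0.1.0'.
        raise ValueError(f"{name!r} already contains a version suffix")
    return f"{name}:{version}"


def load_pinned(name: str, version: str, data_dir: str, **load_kwargs):
    """Call tfds.load with an explicit version instead of version discovery."""
    # Imported lazily so that pinned_name can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(pinned_name(name, version), data_dir=data_dir, **load_kwargs)
```

For example, `load_pinned("fractal20220817_data", "0.1.0", "gs://gresearch/robotics")` is equivalent to the line above.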

tomvdw avatar Dec 22 '23 13:12 tomvdw

A fix was submitted. Could you test with tfds nightly if it now works?

tomvdw avatar Jan 12 '24 09:01 tomvdw

Hey, I am facing the same issue. I tried the recommended line in a Jupyter Notebook:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")

It still does not work, and I get:

UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://gresearch/robotics/fractal20220817_data/0.1.0/features.json')

The same line in Colab does not raise any errors.

I also have a basic question: how can I download a dataset (for example, fractal20220817_data) to my local PC?

Thanks a lot!

Ericodencoder avatar Jan 16 '24 09:01 Ericodencoder

@tomvdw Alternatively, can we keep using the tfds.builder_from_directory workaround to load datasets from a specified directory?

Rahulraj0308 avatar Feb 01 '24 16:02 Rahulraj0308