
The "Writing custom datasets" example doesn't work in Colab

Status: Open • brainshawn opened this issue 3 years ago • 2 comments

To reproduce:

  1. Open https://github.com/tensorflow/datasets/issues.
  2. Copy the "TL;DR" example:

     ```shell
     cd path/to/my/project/datasets/
     tfds new my_dataset  # Create `my_dataset/my_dataset.py` template files
     # [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.
     cd my_dataset/
     tfds build  # Download and prepare the dataset to `~/tensorflow_datasets/`
     ```

  3. Open a new Colab notebook (either CPU or GPU environment), then paste and run the example above.
  4. See an error like:
```text
Dataset generated at /content/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.
Traceback (most recent call last):
  File "/usr/local/bin/tfds", line 8, in <module>
    sys.exit(launch_cli())
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 61, in launch_cli
    app.run(main, flags_parser=_parse_flags)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 56, in main
    args.subparser_fn(args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 37, in _build_datasets
    raise NotImplementedError('tfds build not supported yet (#2447).')
NotImplementedError: tfds build not supported yet (#2447).
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-1-689a0012549b> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', '\ntfds new my_dataset  # Create `my_dataset/my_dataset.py` template files\n# [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.\ncd my_dataset/\ntfds build  # Download and prepare the dataset to `~/tensorflow_datasets/`')

2 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    137     if self.returncode:
    138       raise subprocess.CalledProcessError(
--> 139           returncode=self.returncode, cmd=self.args, output=self.output)
    140 
    141   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

CalledProcessError: Command '
tfds new my_dataset  # Create `my_dataset/my_dataset.py` template files
# [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.
cd my_dataset/
tfds build  # Download and prepare the dataset to `~/tensorflow_datasets/`' returned non-zero exit status 1.
```

brainshawn • Jun 21 '22 12:06

Hello @brainshawn, thank you for raising this issue.

Could you please check which version of tensorflow_datasets or tfds-nightly you are currently using?

If you are not using the latest version, could you please run `pip uninstall tensorflow_datasets`, reinstall it, and retry?
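The version can also be checked from Python inside the notebook. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the package metadata is registered under `tensorflow_datasets` or `tfds_nightly`; the helper names are hypothetical, and the 4.6.0 threshold is simply the version reported working in this thread, not a documented minimum for `tfds build`:

```python
from importlib import metadata  # Python 3.8+
from typing import Optional, Tuple

# Assumed metadata names for the stable and nightly TFDS packages.
_CANDIDATES = ("tensorflow_datasets", "tfds_nightly")


def installed_tfds_version() -> Optional[str]:
    """Return the first installed TFDS version found, or None if neither is installed."""
    for name in _CANDIDATES:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '4.6.0.dev202206260044' -> (4, 6, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece, such as 'dev202206260044'
        parts.append(int(piece))
    return tuple(parts)


def at_least(version: str, minimum: str = "4.6.0") -> bool:
    """Compare numerically, so '4.10.0' counts as newer than '4.6.0'."""
    return release_tuple(version) >= release_tuple(minimum)


if __name__ == "__main__":
    found = installed_tfds_version()
    if found is None:
        print("TFDS is not installed")
    elif at_least(found):
        print(f"TFDS {found}: at least 4.6.0")
    else:
        print(f"TFDS {found}: older than 4.6.0, consider reinstalling")
```

On a runtime with TFDS 4.0.1, `at_least` returns `False`. The check ignores dev and pre-release suffixes, so it is only a rough test; the `packaging` library's `Version` class implements full PEP 440 ordering.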

Using the latest TFDS version (4.6.0), the following works for me in Colab:

```
!pip install tensorflow_datasets
!tfds --version
# The above should print `TensorFlow Datasets: 4.6.0`
!tfds build mnist
```

and the following also works, with a fake dataset copied and pasted into `example_dataset.py` between the two commands:

```
!tfds new example_dataset
!cd example_dataset && tfds build example_dataset
```
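The last command chains `cd` and `tfds build` with `&&` on a single `!` line. In Colab, as in IPython generally, each `!` line runs in its own subshell, so a directory change on one `!` line does not carry over to the next. The sketch below imitates that with `subprocess`; `run_line` and `demo_cd_scope` are hypothetical helpers, the empty `example_dataset` folder only stands in for what `tfds new` creates, and a POSIX shell is assumed.

```python
import os
import subprocess
import tempfile
from typing import Tuple


def run_line(cmd: str, cwd: str) -> str:
    """Run one shell line in a fresh subshell, roughly like a single `!` line."""
    result = subprocess.run(
        cmd, shell=True, cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def demo_cd_scope() -> Tuple[str, str]:
    """Return the basenames of the directories seen by separate vs. chained lines."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the folder that `tfds new example_dataset` would create.
        os.mkdir(os.path.join(root, "example_dataset"))
        run_line("cd example_dataset", root)  # the cd ends with this subshell
        separate = run_line("pwd", root)  # a new line starts back in `root`
        chained = run_line("cd example_dataset && pwd", root)  # same subshell
        return os.path.basename(separate), os.path.basename(chained)


print(demo_cd_scope())
```

The chained line reports `example_dataset`, while the separate `pwd` still reports the temporary root. Within a single `%%shell` cell, as in the original report, the commands share one shell, so a plain `cd` does persist there.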

ccl-core • Jun 26 '22 17:06

Thanks for the investigation! As you said, the problem appears to be the TFDS version.

Mine is TensorFlow Datasets: 4.0.1 (instead of 4.6.0 above).

My Colab notebook uses the default runtime (not Colab Pro).

brainshawn • Jun 28 '22 15:06