The "Writing custom datasets" example doesn't work in colab
To reproduce:
- Open https://github.com/tensorflow/datasets/issues.
- Copy the "TL;DR" example, i.e.:
cd path/to/my/project/datasets/
tfds new my_dataset # Create `my_dataset/my_dataset.py` template files
# [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.
cd my_dataset/
tfds build # Download and prepare the dataset to `~/tensorflow_datasets/`
- Open a new colab notebook (either CPU or GPU environment), then paste and run the example above.
- See an error like:
Dataset generated at /content/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.
Traceback (most recent call last):
File "/usr/local/bin/tfds", line 8, in <module>
sys.exit(launch_cli())
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 61, in launch_cli
app.run(main, flags_parser=_parse_flags)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 56, in main
args.subparser_fn(args)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 37, in _build_datasets
raise NotImplementedError('tfds build not supported yet (#2447).')
NotImplementedError: tfds build not supported yet (#2447).
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-689a0012549b> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', '\ntfds new my_dataset # Create `my_dataset/my_dataset.py` template files\n# [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.\ncd my_dataset/\ntfds build # Download and prepare the dataset to `~/tensorflow_datasets/`')
2 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py in check_returncode(self)
137 if self.returncode:
138 raise subprocess.CalledProcessError(
--> 139 returncode=self.returncode, cmd=self.args, output=self.output)
140
141 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument
CalledProcessError: Command '
tfds new my_dataset # Create `my_dataset/my_dataset.py` template files
# [...] Manually modify `my_dataset/my_dataset.py` to implement your dataset.
cd my_dataset/
tfds build # Download and prepare the dataset to `~/tensorflow_datasets/`' returned non-zero exit status 1.
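Of the two tracebacks above, the first is the informative one: the tfds CLI itself raised NotImplementedError, and the CalledProcessError is only colab reporting the non-zero exit status of the shell cell. A minimal sketch of how captured output could be checked for that case follows. The helper name `explain_failure` and the hint strings are hypothetical and not part of TFDS; only the marker text is taken from the log above.

```python
# Marker text copied from the NotImplementedError line in the log above.
UNSUPPORTED_MARKER = "tfds build not supported yet"


def explain_failure(output: str) -> str:
    """Return a short hint for captured `tfds build` output.

    Hypothetical helper for illustration; it only looks for the one
    message seen in this issue.
    """
    if UNSUPPORTED_MARKER in output:
        # The CLI ran, but this installed version has no working `tfds build`.
        return "installed tfds CLI rejects `tfds build`; check `tfds --version`"
    return "unrecognized failure; read the first traceback in the output"


sample = "NotImplementedError: tfds build not supported yet (#2447)."
print(explain_failure(sample))
```

This only recognizes the one message from this report; any other failure falls through to the generic hint.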
Hello @brainshawn, thank you for raising this issue.
Could you please check which tensorflow_datasets or tfds-nightly version you are currently using?
If you are not using the latest version, could you please try pip uninstall tensorflow_datasets and then retry?
Using the latest TFDS version (4.6.0), the following works for me in a colab:
!pip install tensorflow_datasets
!tfds --version
# The above should return `TensorFlow Datasets: 4.6.0`
!tfds build mnist
and (by copying-and-pasting a fake dataset into example_dataset.py):
!tfds new example_dataset
!cd example_dataset && tfds build example_dataset
Thanks for the investigation! As you said, the problem looks to be the TFDS version.
Mine is TensorFlow Datasets: 4.0.1 (instead of 4.6.0 above).
My colab uses the default kernel (note: not Colab Pro).
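For anyone hitting the same error, here is a minimal sketch of a version check that can run in a notebook cell before calling tfds build. It assumes 4.6.0 as the reference only because that is the version reported as working in this thread; it may not be the first release that supports tfds build. The names `KNOWN_GOOD`, `parse_version` and `at_least_known_good` are hypothetical helpers, not part of TFDS.

```python
import re

# Assumption: 4.6.0 is the version reported as working in this thread,
# not necessarily the minimum version that supports `tfds build`.
KNOWN_GOOD = (4, 6, 0)


def parse_version(text: str) -> tuple:
    """Parse the leading 'major.minor.patch' of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def at_least_known_good(installed: str, known_good: tuple = KNOWN_GOOD) -> bool:
    """Return True if `installed` is at or above the known-good version."""
    return parse_version(installed) >= known_good


# In a notebook, the installed version can be read with:
#   import tensorflow_datasets as tfds
#   at_least_known_good(tfds.__version__)
print(at_least_known_good("4.0.1"))  # False
print(at_least_known_good("4.6.0"))  # True
```

Only the leading three numeric components are compared, so a suffix such as a nightly build tag is ignored.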