tfds.load() does not load datasets with a capital letter

Open BoguesUser opened this issue 1 year ago • 2 comments

Short description

Running tfds build Mk0_datasets_builder.py saves the dataset to ~/tensorflow_datasets/Mk0. Importing it with tfds.load('Mk0', split='train', shuffle_files=True) then gives the following error:

No registered data_dirs were found in:
        - /home/user/tensorflow_datasets

Renaming the file from Mk0 to mk0 allows it to load, however.
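To check whether other locally built datasets are affected, something like the following could be used. It is only a sketch: the helper name find_uppercase_dataset_dirs is hypothetical, and it assumes datasets sit as immediate subdirectories of the data directory (~/tensorflow_datasets in this report). It lists candidates and does not rename anything.

```python
from pathlib import Path


def find_uppercase_dataset_dirs(data_dir: Path) -> list[str]:
    """Return names of immediate subdirectories that contain uppercase letters."""
    if not data_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_dir() and p.name != p.name.lower()
    )


if __name__ == "__main__":
    # Default location taken from the report above; adjust if a custom
    # data_dir is in use.
    root = Path.home() / "tensorflow_datasets"
    for name in find_uppercase_dataset_dirs(root):
        print(f"{name}: consider using the lowercase name {name.lower()}")
```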

Environment information

  • Operating System: Arch Linux

  • Python version: 3.11.5

  • tensorflow-datasets version: 4.9.4

  • tensorflow version: 2.14.0

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?

  • Yes

Reproduction instructions

Build a dataset with a capital letter in its name, then attempt to load it with tfds:

tfds.load('Mk0', split='train', shuffle_files=True)

Expected behavior

Either tfds build should automatically make the name lowercase, or tfds.load() should be able to handle uppercase letters.

BoguesUser · Feb 05 '24 14:02

Thanks for reporting this issue!

This is indeed a real problem. We'll need to think about whether supporting uppercase names in tfds.load is possible. In the meantime, you should stick with lowercase for your dataset names. Sorry for the inconvenience.
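A minimal sketch of the lowercase workaround, not an official fix: the helpers lowercase_dataset_name and load_lowercase are hypothetical, and the code assumes the dataset has already been built (or renamed) under its lowercase name and that the name has the form name[/config][:version]. Only the dataset part is lowercased; any config or version suffix is left untouched.

```python
def lowercase_dataset_name(name: str) -> str:
    """Lowercase only the dataset part of 'name[/config][:version]'."""
    # The dataset part ends at the first '/' or ':' if either is present.
    cut = len(name)
    for sep in ("/", ":"):
        idx = name.find(sep)
        if idx != -1:
            cut = min(cut, idx)
    return name[:cut].lower() + name[cut:]


def load_lowercase(name: str, **kwargs):
    """Call tfds.load with the dataset name lowercased (hypothetical helper)."""
    # Imported lazily so the name helper works without tensorflow-datasets.
    import tensorflow_datasets as tfds

    return tfds.load(lowercase_dataset_name(name), **kwargs)
```

With the dataset from this report, usage would look like load_lowercase('Mk0', split='train', shuffle_files=True), which forwards the name 'mk0' to tfds.load.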

fineguy · Feb 06 '24 10:02

Awesome. Thank you for looking into this.

BoguesUser · Feb 07 '24 05:02