
Add `tfds list` CLI command

Open vijayphoenix opened this issue 3 years ago • 5 comments

Adding community datasets will sharply increase the number of available datasets, so browsing through the repos or the catalog will become difficult.

So, it would be great if we could add a new CLI command to list all the datasets present in TFDS. This would allow users to easily search for a dataset using tools like grep.

Possible Usage

tfds list # List all datasets
tfds list --type audio # List all audio datasets in TFDS
tfds list --namespace huggingface # List all huggingface community datasets
tfds list --search sun # Lists datasets like sun397, lsun (alternative is to use grep)

Implementation details

• Use/modify the `tfds.list_builders` API.
• Add a new file `list.py` in the `scripts/cli` folder.
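A minimal sketch of how the `--search` and `--namespace` flags could sit on top of `tfds.list_builders()` is shown here. It is not the actual TFDS implementation: the helper names `filter_names` and `main` are hypothetical, and it assumes community datasets are reported as `namespace:dataset_name` (for example `huggingface:squad`). The `--type` flag is left out because it would need per-dataset metadata.

```python
import argparse
from typing import Iterable, List, Optional


def filter_names(
    names: Iterable[str],
    search: Optional[str] = None,
    namespace: Optional[str] = None,
) -> List[str]:
    """Return the sorted dataset names matching the optional filters.

    Assumption: community datasets look like "namespace:dataset_name",
    while core datasets have no "namespace:" prefix.
    """
    selected = []
    for name in names:
        # "huggingface:squad" -> ("huggingface", "squad"); "mnist" -> ("", "mnist")
        ns, _, short_name = name.rpartition(":")
        if namespace is not None and ns != namespace:
            continue
        if search is not None and search.lower() not in short_name.lower():
            continue
        selected.append(name)
    return sorted(selected)


def main(argv: Optional[List[str]] = None) -> None:
    """Hypothetical entry point for a `tfds list` style command."""
    parser = argparse.ArgumentParser(prog="tfds list")
    parser.add_argument("--search", help="substring to look for in dataset names")
    parser.add_argument("--namespace", help="only list datasets from this namespace")
    args = parser.parse_args(argv)

    # Imported lazily so the filtering helper works without TFDS installed.
    import tensorflow_datasets as tfds

    for name in filter_names(tfds.list_builders(), args.search, args.namespace):
        print(name)
```

With TFDS installed, calling `main(["--search", "sun"])` would print the matching names one per line, which also keeps the output easy to pipe into grep.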

vijayphoenix Mar 18 '21 16:03

Hi, I have started work on this

jatin-code777 Mar 18 '21 17:03

Since @jatin-code777 has already started working on the --type and --exclude_community parts, I will send a PR for the --search and --namespace flags soon.

NikhilBartwal Mar 18 '21 19:03

I am working on this issue. Will you please assign it to me?

Srikeshram Mar 26 '21 04:03

Hey @Srikeshram, @jatin-code777 is already working on this issue, and taking it up would only lead to duplicated work. You can take up other bugs and issues that are still unresolved in TFDS. Thank you for your efforts!

NikhilBartwal Mar 26 '21 05:03

Leaving here my cheers on this feature request!

It'd be a game-changer to be able to do "dataset version control" dynamically and more easily.

pnmartinez Mar 24 '22 12:03