datasets Allow `list_datasets` to include private datasets

Open ola13 opened this issue 2 years ago • 3 comments

I am working with a large collection of private datasets, and it would be convenient to be able to list them.

I would envision extending the convention of the use_auth_token keyword argument to the list_datasets function, so that calling:

list_datasets(use_auth_token="my_token")

would return the list of all datasets I have permission to view, including private ones. The only alternative I currently see is to collect the dataset names manually from the Hub website. This is in the context of BigScience, where the respective private spaces contain hundreds of datasets, so listing them by hand is not very convenient.

Jul 26 '22 10:07 ola13

Thanks for opening this issue :)

If it can help, I think you can already use huggingface_hub to achieve this:

>>> from huggingface_hub import HfApi
>>> [ds_info.id for ds_info in HfApi().list_datasets(use_auth_token=token) if ds_info.private]
['bigscience/xxxx', 'bigscience-catalogue-data/xxxxxxx', ... ]

Note, though, that the latest versions of huggingface_hub that contain this feature are not available on Python 3.6. So maybe we should first drop support for Python 3.6 (see #4460) and then update list_datasets in datasets as well (otherwise we would have to copy/paste some huggingface_hub code).

Jul 26 '22 10:07 lhoestq

Great, thanks @lhoestq, the workaround works! I think it would be intuitive to have this supported directly in datasets, but it makes sense to wait given that the workaround exists :)

Jul 26 '22 10:07 ola13

I also think that going forward we should replace more and more implementations inside datasets with the corresponding ones from huggingface_hub (same as we're doing in transformers).

Jul 26 '22 11:07 julien-c

datasets.list_datasets is now deprecated in favor of huggingface_hub.list_datasets (returns private datasets when token is present), so I'm closing this issue.

Jul 25 '23 15:07 mariosasko
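
For readers landing here later, the resolution above can be sketched roughly as follows. This is a minimal sketch, not the maintainers' code: it assumes a recent huggingface_hub release in which list_datasets accepts author and token keyword arguments and yields objects with id and private attributes. The helper names private_dataset_ids and list_private_datasets are hypothetical, introduced only for illustration.

```python
from types import SimpleNamespace
from typing import Iterable, List


def private_dataset_ids(infos: Iterable) -> List[str]:
    """Return the ids of entries whose `private` attribute is truthy."""
    return [info.id for info in infos if getattr(info, "private", False)]


def list_private_datasets(author: str, token: str) -> List[str]:
    """Hypothetical helper: ids of private datasets under `author` visible to `token`.

    Assumes huggingface_hub.list_datasets accepts `author` and `token`
    (true for recent releases; older ones used `use_auth_token`).
    """
    # Imported lazily so private_dataset_ids stays usable without the package.
    from huggingface_hub import list_datasets

    return private_dataset_ids(list_datasets(author=author, token=token))


# Offline illustration with stand-in objects (no network access needed).
# These ids are made up for the example and are not real datasets.
fake_infos = [
    SimpleNamespace(id="my-org/public-set", private=False),
    SimpleNamespace(id="my-org/private-set", private=True),
]
demo_ids = private_dataset_ids(fake_infos)
```

Calling list_private_datasets("bigscience", token) with a token that has access to the organization should then return only the private dataset ids, mirroring the list comprehension in the workaround earlier in the thread.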
