Allow `list_datasets` to include private datasets
I am working with a large collection of private datasets, and it would be convenient for me to be able to list them.
I would envision extending the convention of using the `use_auth_token` keyword argument to the `list_datasets` function, so that calling `list_datasets(use_auth_token="my_token")` would return the list of all datasets I have permission to view, including private ones. The only current alternative I see is to use the Hub website to manually obtain the list of dataset names. This is in the context of BigScience, where the respective private spaces contain hundreds of datasets, so listing them manually is not very convenient.
Thanks for opening this issue :)
If it can help, I think you can already use `huggingface_hub` to achieve this:
>>> from huggingface_hub import HfApi
>>> [ds_info.id for ds_info in HfApi().list_datasets(use_auth_token=token) if ds_info.private]
['bigscience/xxxx', 'bigscience-catalogue-data/xxxxxxx', ... ]
Though the latest versions of `huggingface_hub` that contain this feature are not available on Python 3.6, so maybe we should first drop support for Python 3.6 (see #4460) before updating `list_datasets` in `datasets` as well (otherwise we would have to copy/paste some `huggingface_hub` code).
Great, thanks @lhoestq, the workaround works! I think it would be intuitive to have the support directly in `datasets`, but it makes sense to wait given that the workaround exists :)
I also think that going forward we should replace more and more implementations inside `datasets` with the corresponding ones from `huggingface_hub` (same as we're doing in `transformers`).
`datasets.list_datasets` is now deprecated in favor of `huggingface_hub.list_datasets` (which returns private datasets when a `token` is present), so I'm closing this issue.
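For anyone landing here later, below is a minimal sketch of how the `huggingface_hub` route might look in a script. The helper names `private_dataset_ids` and `list_private_datasets` are hypothetical (they are not part of either library), and the optional `author` filter is an assumption about how you might want to narrow results to one organization. It assumes a recent `huggingface_hub` where `HfApi` accepts `token` and the returned info objects expose `id` and `private`.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def private_dataset_ids(infos: Iterable) -> List[str]:
    """Keep only the ids of entries flagged as private (hypothetical helper)."""
    return [info.id for info in infos if getattr(info, "private", False)]


def list_private_datasets(token: str, author: Optional[str] = None) -> List[str]:
    """List private dataset ids visible to `token` (hypothetical helper).

    Assumes a recent huggingface_hub; imported lazily so the filtering
    logic above can be used without the library installed.
    """
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # `author` narrows the listing to one user or organization when given.
    return private_dataset_ids(api.list_datasets(author=author))


if __name__ == "__main__":
    # Offline demonstration with stand-in objects instead of a real Hub call.
    fake_infos = [
        SimpleNamespace(id="my-org/private-a", private=True),
        SimpleNamespace(id="my-org/public-b", private=False),
    ]
    print(private_dataset_ids(fake_infos))
```

With a real token, something like `list_private_datasets(token, author="bigscience")` should return the same kind of list as the workaround above, though the exact results depend on what the token is allowed to see.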