Fetch items from a 'restricted' Zenodo record using an API token
Description of the desired feature:
Zenodo offers a feature to create “restricted” records. For such records:
- The list of files is not available through the website or API.
- It is possible to use what they call “link sharing” to generate a “secret link” that allows access to the record and its files.
- In practice, this link is the record URL (for instance https://zenodo.org/records/12341234) with a query parameter "?token=a1b2c3…" appended; the token seems to be base64-encoded and is 240 characters long.
- The same token can be used with API queries to get a response that includes the file URLs.
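The secret-link format described above can be taken apart with the standard library. Below is a hypothetical helper (the name is made up) that splits such a link into the record ID and the token. It assumes the link looks exactly like `https://zenodo.org/records/<id>?token=<token>`. The token could then be handed to `HTTPDownloader` as `params={"token": token}`.

```python
from urllib.parse import parse_qs, urlparse


def parse_secret_link(link):
    """Return (record_id, token) from a Zenodo secret link.

    Hypothetical helper; assumes the form https://zenodo.org/records/<id>?token=<token>.
    """
    parsed = urlparse(link)
    # The record ID is the last path segment, e.g. "/records/12341234" -> "12341234"
    record_id = parsed.path.rstrip("/").split("/")[-1]
    # parse_qs returns a list of values per key; a KeyError means the link has no token
    token = parse_qs(parsed.query)["token"][0]
    return record_id, token


record_id, token = parse_secret_link("https://zenodo.org/records/12341234?token=a1b2c3")
print(record_id, token)  # prints: 12341234 a1b2c3
```

One caveat: `parse_qs` decodes a literal `+` as a space, so if a token ever contained `+`, reading the raw query string would be safer.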
AFAICT, it is not possible to use pooch.DOIDownloader / pooch.ZenodoRepository to access files stored in such records. This is the desired feature.
The limitation seems to come from this code:
https://github.com/fatiando/pooch/blob/8b59c6e2ef87bf86b7fc9b794d9298ef309ac4c8/pooch/downloaders.py#L616-L623
…where the data_repository variable is created without using the self.kwargs that are later passed on to the HTTPDownloader:
https://github.com/fatiando/pooch/blob/8b59c6e2ef87bf86b7fc9b794d9298ef309ac4c8/pooch/downloaders.py#L626-L628
So while those kwargs can include params={"token": "a1b2c3…"}, which are passed on to requests, they are not used by ZenodoRepository.api_response. That query therefore gets an empty list of files, and data_repository.download_url() always raises an error.
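The suggested change amounts to sending the downloader's `params` with the metadata request as well. Here is a hypothetical, stdlib-only sketch of that idea. It is not Pooch's code, and the function names are invented. It also assumes that each entry in the API response's `files` list carries a `key` field; that is an assumption about the current Zenodo API, not something stated in this issue.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def metadata_url(record_id, downloader_kwargs):
    """Build the record API URL, reusing any query params meant for the downloader."""
    url = f"https://zenodo.org/api/records/{record_id}"
    params = downloader_kwargs.get("params") or {}
    # Only append a query string when there is something (e.g. a token) to send
    return f"{url}?{urlencode(params)}" if params else url


def list_files(record_id, downloader_kwargs):
    """Fetch the record metadata and return the file names it lists (network call)."""
    with urlopen(metadata_url(record_id, downloader_kwargs), timeout=30) as response:
        record = json.load(response)
    # Without the token, a restricted record would be expected to yield an empty list
    return [item["key"] for item in record.get("files", [])]
```

For example, `metadata_url("12341234", {"params": {"token": "a1b2c3"}})` gives `https://zenodo.org/api/records/12341234?token=a1b2c3`, while kwargs without `params` leave the URL unchanged.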
Workaround
I currently use code roughly like this:
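```python
from pooch import HTTPDownloader, Pooch, os_cache

TOKEN = "a1b2c3…"  # token from the Zenodo secret link

p = Pooch(
    path=os_cache("my-project"),  # hypothetical cache directory; Pooch requires a path
    base_url="https://zenodo.org/api/records/12341234/files/",
    registry={...},  # filenames and hashes, written out by hand
)

def zenodo_token_download(url, output_file, pooch):
    # Request ".../files/{filename}/content" with the token as a query parameter
    HTTPDownloader(params={"token": TOKEN})(f"{url}/content", output_file, pooch)

p.fetch(filename, downloader=zenodo_token_download)  # filename must be in the registry
```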
This effectively constructs the API URL https://zenodo.org/api/records/12341234/files/{filename}/content?token=a1b2c3….
Although this works, it sacrifices the nice feature that the registry (filenames and hashes) is automatically populated from the Zenodo API, so this info must be duplicated in the calling code.
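One way to get the automatic registry back while keeping the workaround is to query the record API with the token once and build the registry from the response. The sketch below rests on an assumption about the response shape: a `files` list whose entries have `key` and an `md5:`-prefixed `checksum`. That shape and the helper names are assumptions, not documented guarantees.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def registry_from_record(record):
    """Map each file name to its checksum (assumed to be in Pooch's "alg:hash" form)."""
    return {item["key"]: item["checksum"] for item in record.get("files", [])}


def fetch_registry(record_id, token):
    """Query the record API with the secret-link token (network call)."""
    query = urlencode({"token": token})
    url = f"https://zenodo.org/api/records/{record_id}?{query}"
    with urlopen(url, timeout=30) as response:
        return registry_from_record(json.load(response))
```

The returned dictionary could then be passed as `registry=` to the `Pooch(...)` call in the workaround, replacing the hand-written one. Pooch accepts known hashes in `md5:...` form.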
Are you willing to help implement and maintain this feature?
Sure!
@khaeru thanks for opening the issue! I didn't know about private files in Zenodo. This is not a major priority, since I don't expect it's a main use case for Pooch. The main issue would be testing this reliably and maintaining it over time for what seems like a niche use case. If your workaround works well enough, I'd rather keep it as a workaround than add this to Pooch. The downloader mechanism was added to make this kind of workaround possible in the first place, so I'm glad it's serving its purpose.
It would be good to have this in the docs, though, either in the DOI tutorial or in the DOIDownloader docstring. Is this something you'd be willing to submit?
Please go ahead. I answered "Sure!" in the sense that time invested in a PR adding the feature would pay off through reduced code size and maintenance burden on our side. Without that benefit, I can't really justify spending time on it. Sorry!