[reproducibility] Automate downloading of resources

Open cthoyt opened this issue 4 years ago • 0 comments

One step of the reproduction requires users to manually download data and place it at a location relative to the code itself. Manual steps like this hurt reproducibility, so after packaging the code as suggested in #9, it would be good to automate downloading these resources. pystow is a tool for exactly this (disclaimer: I wrote it, precisely to enable this kind of thing in a simple, approachable way that works well for scientists).

Example:

import pystow

url = 'https://drive.google.com/file/d/1D-iHmdRncTOImh68B54mEHkUvo5CHJVk/view'
module_name = 'pidginv3'
path = pystow.ensure(module_name, url=url, name='no_ortho.zip')
# path is now ~/.data/pidginv3/no_ortho.zip

Now you can use the path to open the file, etc., without having to think about how the user gets it or where it's stored. This can be used anywhere in the code: the file is downloaded the first time it's needed and reused from disk after that. However, I'm not sure how well this works with Google Drive links, so I've suggested moving the resources to Zenodo/Figshare/equivalent in #11.
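For anyone curious what "download once, then reuse" amounts to, here is a minimal standard-library sketch of the idea. It is not pystow's actual implementation; the function name `ensure_resource` and its `base` parameter are made up for illustration, and it assumes the URL points directly at the file (which a Google Drive `/view` link may not).

```python
import urllib.request
from pathlib import Path


def ensure_resource(module_name, url, name, base=None):
    """Return a local path for ``name``, downloading from ``url`` only if it is missing.

    Hypothetical helper for illustration; pystow does considerably more than this.
    """
    # Assumption: default to ~/.data, matching the path shown in the example above.
    base = Path(base) if base is not None else Path.home() / ".data"
    directory = base / module_name
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    if not path.exists():
        # First call: fetch the resource. Later calls skip this and reuse the file.
        urllib.request.urlretrieve(url, str(path))
    return path
```

A call like `ensure_resource('pidginv3', url, 'no_ortho.zip')` would then return the same path on every run and only touch the network on the first one. Note that this sketch does not clean up after an interrupted download, so a real tool should handle that case.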

cthoyt, Feb 08 '21 13:02