osfr
FR: Add feature that caches downloads
It'd be great if there were a `use_cache` parameter in `osfr::osf_download()` to make downloads easier. I have several workflows built with drake that download data files from the OSF, and those files rarely change. Because of that, I wrote a utility that caches the downloads based on the `date_modified` field.
Since I have something written up already, I'd be more than happy to open a PR if you think this feature is worth including!
Hey @psanker, I totally agree this would be a useful addition.
The OSF API actually provides checksums for remote files, so my plan is to add an argument that enables checksum comparisons to determine whether a file should be downloaded or not.
Dates could work too, and would certainly be faster than calculating checksums locally, but I'm concerned about false positives/negatives if the client and server system times aren't in sync.
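To make the checksum idea concrete, here is a rough, language-agnostic sketch in Python. It is not osfr code and doesn't call the OSF API. It assumes the remote file's MD5 hex digest has already been fetched from the file metadata, and `local_md5()` and `needs_download()` are hypothetical helper names used only for illustration.

```python
import hashlib
from pathlib import Path


def local_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, remote_md5: str) -> bool:
    """Decide whether a download is needed.

    `remote_md5` is assumed to be the hex digest reported by the server.
    Returns True when the local copy is missing or its checksum differs.
    """
    if not local_path.is_file():
        return True
    return local_md5(local_path) != remote_md5.strip().lower()
```

Because the comparison uses file contents only, it doesn't depend on the client and server clocks agreeing. The cost is one full read of each local file per check.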
I'll also note that `osf_download()`'s current implementation is a little janky and doesn't fully recurse the way `osf_upload()` does. This is only an issue when download targets include directories, because the entire directory is always downloaded to a temp folder and then matching files are selectively copied over to the specified destination.
So there are a few issues but I'm open to discussing further.