
FR: Add feature that caches downloads

Open • psanker opened this issue 3 years ago • 1 comment

It'd be great if osfr::osf_download() had a use_cache parameter so that files that haven't changed aren't downloaded again. I have several workflows built with drake that download data files from the OSF, and those files rarely change. Because the changes are so infrequent, I wrote a utility that caches the downloads based on the date_modified field.
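
To make the idea concrete, here is a rough, language-neutral sketch in Python of what date-based caching could look like. It is not the utility mentioned above and it does not call osfr. The function names, the JSON cache file, and the `file_id` key are all hypothetical. It only assumes that the remote `date_modified` value is available as a string.

```python
import json
from pathlib import Path


def needs_download(cache_file: Path, local_path: Path,
                   file_id: str, remote_date_modified: str) -> bool:
    """Return True when the file should be fetched again.

    Hypothetical helper: compares the remote date_modified string with the
    value recorded at the time of the previous download.
    """
    # No local copy or no cache record means we must download.
    if not local_path.exists() or not cache_file.exists():
        return True
    cache = json.loads(cache_file.read_text())
    return cache.get(file_id) != remote_date_modified


def record_download(cache_file: Path, file_id: str,
                    remote_date_modified: str) -> None:
    """Remember the remote date_modified value after a successful download."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    cache[file_id] = remote_date_modified
    cache_file.write_text(json.dumps(cache, indent=2))
```

A caller would check `needs_download()` first, download only when it returns True, and then call `record_download()`. The sketch compares the stored string with the remote string for equality, so it never consults the local clock or the local file's modification time.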

Since I have something written up already, I'd be more than happy to open a PR if you think this feature is worth including!

psanker • Jan 31 '21 18:01

Hey @psanker, I totally agree this would be a useful addition.

The OSF API actually provides checksums for remote files, so my plan is to add an argument that enables checksum comparisons to determine whether a file should be downloaded or not.
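
As a rough illustration of that comparison (a sketch only, not osfr code and not the planned implementation), the check could look something like this in Python. The helper names are hypothetical, and the use of MD5 is an assumption. The sketch only needs the remote checksum to be available as a hex string.

```python
import hashlib
from pathlib import Path


def local_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_download(path: Path, remote_md5: str) -> bool:
    """Return True when the local file is missing or its checksum differs.

    Hypothetical helper: remote_md5 is assumed to be the hex digest
    reported by the server for the remote file.
    """
    if not path.exists():
        return True
    return local_md5(path) != remote_md5.lower()
```

Reading in chunks keeps memory use flat for large data files, at the cost of reading the whole local file once per check.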

Dates could work too, and comparing them would certainly be faster than calculating checksums locally, but I'm concerned about false positives/negatives if the client and server system times aren't in sync.

I'll also note that osf_download()'s current implementation is a little janky and doesn't fully recurse the way osf_upload() does. This is only an issue when download targets include directories, because the entire directory is always downloaded to a temp folder and then the matching files are selectively copied over to the specified destination.

So there are a few issues but I'm open to discussing further.

aaronwolen • Feb 04 '21 15:02