
Large file download support

Open philippconzett opened this issue 4 years ago • 5 comments

I just noticed the following desired characteristics in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:

2.5 The repository provides a mechanism to make very large files available to users outside of the normal user-interface (in cases where the size of the file becomes unwieldy for the user).

There are already a number of open issues related to upload support for large files, but I couldn't find any issue dealing with download support for large files.

philippconzett avatar Jan 22 '21 06:01 philippconzett

This is S3-specific, but from the guides ( https://guides.dataverse.org/en/5.3/installation/config.html#second-configure-dataverse-to-use-s3-storage ):

"Optionally, you can have users download files from S3 directly rather than having files pass from S3 through Payara to your users. To accomplish this, set dataverse.files..download-redirect to true like this"

We use this feature in Harvard Dataverse and it works well via both GUI and API. The idea is that the file is streamed directly from S3 to the user's computer.
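For anyone who wants to try the API route from a script, here is a minimal sketch. It assumes a public (unrestricted) file and uses the Data Access API path `/api/access/datafile/{id}`. The host name and file id in the usage comment are placeholders, and `build_datafile_url` and `download` are hypothetical helper names, not part of Dataverse.

```python
import shutil
import urllib.request
from pathlib import Path


def build_datafile_url(base_url: str, file_id: int) -> str:
    """Return the Data Access API URL for a file's numeric database id."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def download(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> int:
    """Stream url to dest in chunks and return the number of bytes written.

    urllib follows HTTP redirects by default, so if the server answers with
    a redirect to the storage service, the bytes come from there and no
    extra handling is needed on the client side.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj reads chunk_size bytes at a time, so the whole file
        # is never held in memory.
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size


# Example usage (placeholder host and file id, both assumptions):
# url = build_datafile_url("https://dataverse.example.edu", 42)
# download(url, Path("datafile_42.bin"))
```

Restricted files would additionally need an API token, which is left out here to keep the sketch short.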

pdurbin avatar Jan 22 '21 16:01 pdurbin

Sounds good! How does the GUI alternative work from the end user side: Is there a button called "Download from S3 directly" or similar?

philippconzett avatar Jan 23 '21 06:01 philippconzett

Once an admin has enabled direct downloads for a given store, a download from the UI works like this: the browser just follows a redirect and the file downloads from S3. The user sees no difference (except, hopefully, speed). (Note that direct uploads and downloads use pre-signed URLs. These enable the user's browser to upload to or download from S3 without having access to the overall bucket and without making the file URLs public.)
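To illustrate why a pre-signed URL does not amount to bucket access: the signature and its lifetime travel in the query string, so the link only works for that one object and only for a limited time. The sketch below assumes the standard S3 Signature Version 4 query parameters (`X-Amz-Date`, `X-Amz-Expires`); the lifetime Dataverse actually configures is not stated in this thread, and `presigned_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL stops working."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in ISO 8601 basic format, always UTC.
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity period in seconds from the signing time.
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at + lifetime
```

A URL signed at 15:00 UTC with `X-Amz-Expires=3600` would therefore be rejected by S3 after 16:00 UTC.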

qqmyers avatar Jan 23 '21 15:01 qqmyers

Note: we do have some logic for rsync-uploaded package files: instead of following the redirect, we show the link and ask the user to use their favorite web downloader (one that could, for example, allow pausing and resuming). The user could, of course, still use their browser with the link.

We've discussed wanting to use this popup more generally for large files.
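As a rough idea of what such a downloader does when it resumes, here is a minimal sketch based on the HTTP `Range` header. It assumes the server behind the link honours range requests and that the link is still valid when the download is resumed. `resume_headers` and `resume_download` are hypothetical names, and error handling (for example, a 416 response when the file is already complete) is left out.

```python
import urllib.request
from pathlib import Path


def resume_headers(dest: Path) -> dict:
    """Build a Range header asking for the bytes that dest does not have yet."""
    size = dest.stat().st_size if dest.exists() else 0
    return {"Range": f"bytes={size}-"} if size else {}


def resume_download(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> None:
    """Download url to dest, continuing from a partial file if one exists."""
    request = urllib.request.Request(url, headers=resume_headers(dest))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header, so append.
        # Any other success status (typically 200) means the whole file is
        # being sent again, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling `resume_download` again after an interrupted transfer picks up where the partial file ends instead of starting from byte zero.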

scolapasta avatar Jan 25 '21 15:01 scolapasta

@philippconzett hi! I'm looking at the Globus pull request (#8891), which also promises to support large file download. Globus works outside the normal user interface (per the characteristic you quoted).

What would you consider the "definition of done" for this issue? Are two options, S3 and Globus, enough? Or did you have something else in mind? Thanks!

pdurbin avatar Aug 23 '22 13:08 pdurbin

Hi Phil! I don't really know what exactly COAR means by "outside of the normal user-interface", but I think options like S3 and Globus (or even just API calls through the command line) should qualify for this. So, as for me, we can close this issue once large file download via S3 and Globus works. Thanks!

philippconzett avatar Aug 25 '22 16:08 philippconzett

@philippconzett sounds good to me! I just marked this issue to be automatically closed when that pull request I mentioned is merged:

  • #8891

At some point in the future we should move the "Big Data Support" content from the Dev Guide to the Admin Guide (or maybe the Installation Guide) to indicate that it's more official, less of a dev thing to play with. But we can create a separate issue for that some day. 😄

pdurbin avatar Aug 25 '22 16:08 pdurbin

OK, I just merged #8891 (Globus support), so this issue is closed. Please open follow-up issues as needed. Thanks.

pdurbin avatar Sep 19 '22 17:09 pdurbin