Image downloads: Using Dropbox to manage very large downloads

Status: Open. StephenChan opened this issue 6 years ago (0 comments).

The save_url endpoint of the Dropbox API is probably what we're after. It lets Dropbox itself download a file from an arbitrary URL into a folder in the user's Dropbox, so the image data never has to pass through the user's browser.
https://blogs.dropbox.com/developers/2015/06/programmatically-saving-a-url-to-dropbox/
https://dropbox.github.io/dropbox-api-v2-explorer/#files_save_url
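To show what a save_url call looks like on the wire, here is a minimal sketch using only the Python standard library. It assumes the Dropbox API v2 RPC conventions (a POST to `https://api.dropboxapi.com/2/<endpoint>` with a JSON body and a Bearer token). The helper names and the sample token, path, and URL are hypothetical. The requests are only built here, not sent; `urllib.request.urlopen(req)` would send one.

```python
import json
from urllib.request import Request

API_BASE = "https://api.dropboxapi.com/2"


def _rpc_request(access_token, endpoint, args):
    """Build (but don't send) a Dropbox API v2 RPC-style POST request."""
    return Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(args).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_url_request(access_token, dropbox_path, url):
    # Ask Dropbox to fetch `url` itself and store it at `dropbox_path`.
    return _rpc_request(access_token, "files/save_url",
                        {"path": dropbox_path, "url": url})


def check_job_status_request(access_token, async_job_id):
    # Poll a save_url job that did not complete immediately.
    return _rpc_request(access_token, "files/save_url/check_job_status",
                        {"async_job_id": async_job_id})
```

The access token would come from the OAuth2 step described next.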

This requires getting access to the user's Dropbox account via OAuth2 so that we can make the API calls on their behalf.
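A minimal sketch of the OAuth2 authorization code flow against Dropbox's documented endpoints (`https://www.dropbox.com/oauth2/authorize` and `https://api.dropboxapi.com/oauth2/token`), using only the standard library. The app key, secret, and redirect URI are placeholders, and the function names are hypothetical. In a Django view, the returned `state` would be stored in the session and checked against the `state` parameter Dropbox sends back.

```python
import secrets
from urllib.parse import urlencode

AUTHORIZE_URL = "https://www.dropbox.com/oauth2/authorize"
TOKEN_URL = "https://api.dropboxapi.com/oauth2/token"


def build_authorize_url(app_key, redirect_uri):
    """Return (url, state) for step 1 of the authorization code flow.

    The user is redirected to `url`; Dropbox later redirects back to
    `redirect_uri` with `code` and `state` query parameters. Comparing
    `state` to the stored value protects against CSRF.
    """
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": app_key,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def token_request_body(code, app_key, app_secret, redirect_uri):
    """Form fields to POST to TOKEN_URL to exchange `code` for a token."""
    return {
        "code": code,
        "grant_type": "authorization_code",
        "client_id": app_key,
        "client_secret": app_secret,
        "redirect_uri": redirect_uri,
    }
```

The official Dropbox Python SDK also ships OAuth2 flow helpers, which may be preferable to hand-building these requests.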

We have to specify a path in the user's Dropbox to which the image will be saved.

  • Assuming an image name of IMG_2958.jpg, the save path could just be /IMG_2958.jpg, or perhaps /coralnet_downloads/IMG_2958.jpg. (Dropbox API v2 paths start with a slash.)
  • Assuming an image name of LTER1/IMG_2958.jpg, the save path could be /coralnet_downloads/LTER1/IMG_2958.jpg. This way, uploading an entire folder tree (#119) and then downloading it should give back the same folder tree.
  • If a file already exists at the save path, don't overwrite it; instead, let the user know. Other download errors are possible too, such as the Dropbox account running out of space, so be prepared to catch and report errors in general.
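The path rules above could be sketched as follows. This assumes the `coralnet_downloads` root folder suggested in this issue. `plan_downloads` is a hypothetical helper that splits images into "save" and "skip, already exists" using a set of existing Dropbox paths, which in practice might come from listing the target folder. Dropbox paths are case-insensitive, so the comparison is done on lower-cased paths.

```python
import posixpath

DOWNLOAD_ROOT = "coralnet_downloads"  # folder name suggested in this issue


def dropbox_save_path(image_name, root=DOWNLOAD_ROOT):
    """Map a CoralNet image name to an absolute Dropbox path.

    Subfolders in the name (e.g. "LTER1/IMG_2958.jpg") are kept, so an
    uploaded folder tree comes back with the same structure. Pass
    root="" to save at the top level of the user's Dropbox.
    """
    relative = image_name.lstrip("/")
    return "/" + posixpath.join(root, relative)


def plan_downloads(image_names, existing_paths):
    """Split images into (to_save, skipped) lists of (name, path) pairs.

    `existing_paths` is any iterable of Dropbox path strings. Dropbox
    treats paths case-insensitively, so compare lower-cased versions.
    """
    taken = {p.lower() for p in existing_paths}
    to_save, skipped = [], []
    for name in image_names:
        path = dropbox_save_path(name)
        target = skipped if path.lower() in taken else to_save
        target.append((name, path))
    return to_save, skipped
```

A check like this is only advisory, since a file could appear between the check and the save, so errors from the save itself still need to be caught and reported.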

NOTE: save_url takes only one URL per call, and as far as I can tell, the API offers no way to queue multiple downloads. Unfortunately, this means the page itself has to drive the downloads one at a time, so the user must leave their browser open on this page while the downloads proceed.

The API calls would go something like this:

  • save_url for image 1
  • save_url/check_job_status for image 1, repeatedly (every 0.5 s, maybe?), until the job is complete
  • save_url for image 2
  • ...
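The call sequence above can be sketched with the official Dropbox Python SDK, whose `dropbox.Dropbox` client provides `files_save_url(path, url)` and `files_save_url_check_job_status(async_job_id)`. The loop below only assumes an object with those two methods, so it can be exercised with a stand-in client. The 0.5 s interval comes from this issue; `max_polls` and the outcome strings are hypothetical choices. In the Ajax design described here, each step would be triggered from the browser instead of running in one server-side loop, but the ordering is the same.

```python
import time


def _save_one(dbx, url, path, poll_interval, max_polls, sleep):
    """Start one save_url job and poll until it finishes."""
    result = dbx.files_save_url(path, url)
    if result.is_complete():
        # Dropbox finished the download without needing a job.
        return "complete"
    job_id = result.get_async_job_id()
    for _ in range(max_polls):
        status = dbx.files_save_url_check_job_status(job_id)
        if status.is_complete():
            return "complete"
        if status.is_failed():
            # e.g. insufficient space, or Dropbox could not fetch the URL.
            return f"error: {status.get_failed()}"
        sleep(poll_interval)  # still in progress
    return "error: timed out waiting for Dropbox"


def save_urls_sequentially(dbx, jobs, poll_interval=0.5, max_polls=1200,
                           sleep=time.sleep):
    """Save each (url, dropbox_path) pair in order, one job at a time.

    Returns a dict mapping each Dropbox path to "complete" or an error
    description, so failures can be reported to the user afterwards.
    """
    outcomes = {}
    for url, path in jobs:
        try:
            outcomes[path] = _save_one(dbx, url, path, poll_interval,
                                       max_polls, sleep)
        except Exception as exc:  # e.g. dropbox.exceptions.ApiError
            outcomes[path] = f"error: {exc}"
    return outcomes
```

Collecting per-path outcomes, instead of stopping at the first error, matches the goal of catching and reporting errors in general.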

To track which images have yet to be downloaded, the initial Ajax call that starts the batch download could perhaps save a session variable containing the IDs of the images pending download.
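A minimal sketch of that session bookkeeping, assuming a dict-like session such as Django's `request.session`. The key name and the helper functions are hypothetical. The list is reassigned rather than mutated in place because Django only marks a session as modified when a top-level key is assigned.

```python
PENDING_KEY = "dropbox_pending_image_ids"  # hypothetical session key


def start_batch(session, image_ids):
    """Called by the Ajax request that starts the batch download."""
    session[PENDING_KEY] = list(image_ids)


def next_pending(session):
    """Return the next image ID to download, or None when finished."""
    pending = session.get(PENDING_KEY, [])
    return pending[0] if pending else None


def mark_done(session, image_id):
    """Remove an image ID once its save_url job has finished."""
    # Reassign instead of mutating the stored list in place, so Django
    # notices the change and saves the session.
    session[PENDING_KEY] = [
        i for i in session.get(PENDING_KEY, []) if i != image_id
    ]
```

Each follow-up Ajax call could then ask for `next_pending`, run the save_url steps for that image, and call `mark_done`.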

Worth noting that the django-storages app has Dropbox API support, so maybe that could be useful?

StephenChan, Jun 03 '18 20:06