anonlink-entity-service

resumable CLK file upload

Open · wilko77 opened this issue 6 years ago · 3 comments

Users experience problems with the current file upload when their internet connection is poor. Eventually the request times out and all upload progress is lost.

Related problem: the CLK uploads are collected entirely in memory before being written to MinIO storage. With large uploads this will eventually lead to memory shortages.
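
The memory problem goes away if the upload is copied to storage in fixed-size chunks instead of being buffered whole. The sketch below is a minimal illustration, assuming the request body and the storage writer can both be treated as file-like objects; `stream_to_storage` and `CHUNK_SIZE` are hypothetical names, not part of the entity service. With the MinIO Python client, a stream of unknown length can, as far as I know, be passed to `put_object` with `length=-1` and an explicit `part_size` (check the minio-py docs for your version).

```python
import io
from typing import BinaryIO

# Hypothetical chunk size; any bounded value keeps memory use bounded.
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def stream_to_storage(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy `source` to `sink` one chunk at a time; return the number of bytes copied.

    Only one chunk is held in memory at once, so memory use depends on
    `chunk_size` rather than on the total size of the upload.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of the request body
            break
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    upload = io.BytesIO(b"\x00\x01" * 5000)  # stand-in for the HTTP request body
    stored = io.BytesIO()                      # stand-in for the object-store writer
    print(stream_to_storage(upload, stored, chunk_size=4096))  # 10000
```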

Proposal 1:

As we need something quick-ish, why not use MinIO's features directly? MinIO provides both the server functionality and a client library that performs resumable uploads. Basically, something along these lines: https://docs.minio.io/docs/upload-files-from-browser-using-pre-signed-urls Calling POST on the clk endpoint would return a pre-signed URL for the bucket to upload to, and clkhash would then use the MinIO Python client to perform the upload.

Towards proposal 2:

Google Drive has a nice REST API for file uploads that also supports resuming: https://developers.google.com/drive/api/v3/resumable-upload I don't know whether they open-source any of the components involved, though. Maybe someone else has implemented something similar?
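
Roughly, the Drive protocol works like this: after a session is opened (a POST that returns a session URL in the `Location` header, omitted here), the client PUTs chunks with a `Content-Range: bytes start-end/total` header. While the upload is incomplete the server answers `308` with a `Range: bytes=0-N` header saying how much it has committed, and `200`/`201` once the upload is complete. An empty PUT with `Content-Range: bytes */total` asks the server for its current progress. Below is a minimal in-memory sketch of what the entity-service side of such a session might look like. `UploadSession` is a hypothetical class; a real service would persist the bytes (e.g. to MinIO) rather than keep them in a `bytearray`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")  # a chunk of data
_STATUS_RE = re.compile(r"bytes \*/(\d+|\*)")          # a progress query


@dataclass
class UploadSession:
    """In-memory state for one resumable upload (hypothetical sketch)."""

    data: bytearray = field(default_factory=bytearray)
    total: Optional[int] = None

    def handle_put(self, content_range: str, body: bytes) -> Tuple[int, Dict[str, str]]:
        """Apply one PUT request and return (status_code, response_headers)."""
        status_match = _STATUS_RE.fullmatch(content_range)
        if status_match:
            # Progress query: report how much has been committed so far.
            if status_match.group(1) != "*":
                self.total = int(status_match.group(1))
            return self._progress()

        m = _RANGE_RE.fullmatch(content_range)
        if m is None:
            return 400, {}
        start, end = int(m.group(1)), int(m.group(2))
        if m.group(3) != "*":
            self.total = int(m.group(3))
        if self.total is not None and end >= self.total:
            return 400, {}
        if start != len(self.data) or end - start + 1 != len(body):
            # Chunk does not continue where we left off: simplification, we
            # ignore it and tell the client where to resume from.
            return self._progress()
        self.data.extend(body)
        return self._progress()

    def _progress(self) -> Tuple[int, Dict[str, str]]:
        if self.total is not None and len(self.data) == self.total:
            return 201, {}
        headers = {}
        if self.data:  # no Range header when nothing has been received yet
            headers["Range"] = f"bytes=0-{len(self.data) - 1}"
        return 308, headers


if __name__ == "__main__":
    session = UploadSession()
    print(session.handle_put("bytes 0-3/10", b"abcd"))   # (308, {'Range': 'bytes=0-3'})
    print(session.handle_put("bytes */10", b""))          # (308, {'Range': 'bytes=0-3'})
    print(session.handle_put("bytes 4-9/10", b"efghij"))  # (201, {})
```

Because progress lives on the server, a dropped connection only costs the chunk in flight: the client asks for the committed range and continues from there.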

wilko77, Jun 05 '18 00:06

Proposal 1 is #20. I agree it would be the easier approach.

hardbyte, Jun 05 '18 01:06

Okay, although MinIO does provide a client for resumable uploads, unfortunately it does not work with pre-signed upload URLs. :( For this to work, the user would have to instantiate a full MinIO client with credentials and everything.

Another problem is that we do not want to expose the MinIO root. Consequently, MinIO is addressed differently from inside the cluster than from outside. However, MinIO signs the download URL with the internal host and port, with no option to provide the external address. Since the host is part of what gets signed, the URL cannot simply be rewritten to the external address afterwards.
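
To illustrate why the internal host breaks things, here is a deliberately simplified HMAC scheme standing in for MinIO's real presigning (AWS Signature V4, where `host` is one of the signed headers). `sign_url`, `verify_url` and the secret are hypothetical; the point is only that a signature computed over the internal host fails verification once the client has to use the external address.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit

SECRET = b"hypothetical-secret-key"  # stand-in for the storage credentials


def _signature(netloc: str, path: str, query: str, secret: bytes) -> str:
    # The host (netloc) is part of the signed string, as in SigV4 presigning.
    to_sign = f"{netloc}{path}?{query}".encode()
    return hmac.new(secret, to_sign, hashlib.sha256).hexdigest()


def sign_url(url: str, secret: bytes = SECRET) -> str:
    """Append a `sig` parameter covering host, path and query (simplified)."""
    parts = urlsplit(url)
    sig = _signature(parts.netloc, parts.path, parts.query, secret)
    query = f"{parts.query}&sig={sig}" if parts.query else f"sig={sig}"
    return urlunsplit(parts._replace(query=query))


def verify_url(url: str, secret: bytes = SECRET) -> bool:
    """Check the signature against the host the request actually arrived at."""
    parts = urlsplit(url)
    unsigned_query, sep, sig = parts.query.rpartition("sig=")
    if not sep:
        return False
    expected = _signature(parts.netloc, parts.path, unsigned_query.rstrip("&"), secret)
    return hmac.compare_digest(sig, expected)


if __name__ == "__main__":
    internal = sign_url("http://minio:9000/uploads/clks.bin?expires=3600")
    external = urlunsplit(urlsplit(internal)._replace(netloc="es.example.com"))
    print(verify_url(internal))  # True
    print(verify_url(external))  # False: the signed host no longer matches
```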

This seems like too much of a security nightmare; we should probably look at something else.

wilko77, Jun 06 '18 04:06

For proposal 2: the Google API Python client library is available under the Apache license here: https://github.com/google/google-api-python-client/blob/master/googleapiclient/http.py. The client code for this proposal could therefore easily be adapted from the Google library.
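
For a feel of what the clkhash side might look like, here is a minimal client loop in the spirit of `googleapiclient`'s resumable uploads (`MediaIoBaseUpload` plus repeated `next_chunk()` calls), written from scratch rather than copied from it. It assumes a hypothetical `send(content_range, body)` transport standing in for an HTTP PUT to the session URL, and a hypothetical `FlakyServer` test double that drops one request. On a dropped connection the client asks the server how much it has committed and resumes from there instead of starting over.

```python
import io
from typing import BinaryIO, Callable, Dict, Tuple

# Hypothetical transport: send(content_range, body) -> (status, headers).
Send = Callable[[str, bytes], Tuple[int, Dict[str, str]]]

CHUNK = 256 * 1024  # Drive requires non-final chunks to be multiples of 256 KiB


def committed_offset(headers: Dict[str, str]) -> int:
    """Turn 'Range: bytes=0-N' into the next offset to send (N + 1)."""
    rng = headers.get("Range")
    return int(rng.rsplit("-", 1)[1]) + 1 if rng else 0


def resumable_upload(stream: BinaryIO, total: int, send: Send,
                     chunk_size: int = CHUNK, max_retries: int = 5) -> int:
    """Upload `stream` in chunks, resuming from whatever the server already has."""
    retries = 0
    status, headers = send(f"bytes */{total}", b"")  # ask for current progress
    while status == 308:
        offset = committed_offset(headers)
        stream.seek(offset)
        chunk = stream.read(chunk_size)
        if not chunk:
            raise RuntimeError("server expects more data than the stream contains")
        try:
            status, headers = send(f"bytes {offset}-{offset + len(chunk) - 1}/{total}", chunk)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            # Connection dropped: re-query the committed range, then resume.
            status, headers = send(f"bytes */{total}", b"")
    return status


class FlakyServer:
    """In-memory stand-in for the upload endpoint that fails one request."""

    def __init__(self, fail_on_call: int):
        self.data = bytearray()
        self.calls = 0
        self.fail_on_call = fail_on_call

    def __call__(self, content_range: str, body: bytes) -> Tuple[int, Dict[str, str]]:
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("simulated dropped connection")
        spec, total = content_range[len("bytes "):].split("/")
        if spec != "*" and int(spec.split("-")[0]) == len(self.data):
            self.data.extend(body)
        if len(self.data) == int(total):
            return 201, {}
        return 308, ({"Range": f"bytes=0-{len(self.data) - 1}"} if self.data else {})


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    server = FlakyServer(fail_on_call=3)
    print(resumable_upload(io.BytesIO(payload), len(payload), server, chunk_size=300))  # 201
```

The small chunk size in the usage example is only to exercise several round trips; the test double does not enforce Drive's 256 KiB rule.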

wilko77, Jun 06 '18 04:06