s3fs

Use multipart upload instead of a temporary file

Open gilbsgilbs opened this issue 6 years ago • 4 comments

Hi,

If I understand the current implementation correctly, it creates a temporary file on disk, writes everything to it, and only uploads it to S3 when the file is closed. This approach is not practical for very large files and is probably slightly suboptimal in some circumstances. For these reasons, I think it would be beneficial if s3fs could take advantage of multipart uploads.
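To make the request concrete, here is a minimal sketch of the S3 multipart call sequence as exposed by a boto3 S3 client. It is not s3fs code: the function name `upload_in_parts` and its parameters are made up for illustration, and it assumes the caller already yields chunks of a valid size (S3 requires at least 5 MiB for every part except the last).

```python
def upload_in_parts(client, bucket, key, chunks):
    """Upload an iterable of byte chunks as one S3 object, one part per chunk.

    Hypothetical helper for illustration. `client` is assumed to behave like
    boto3.client("s3"); chunk sizing is left to the caller.
    """
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        # Part numbers start at 1 and must be reported back on completion.
        for number, chunk in enumerate(chunks, start=1):
            response = client.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=number,
                UploadId=upload_id,
                Body=chunk,
            )
            parts.append({"ETag": response["ETag"], "PartNumber": number})
        client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so the already-uploaded parts are discarded rather than left behind.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

Nothing touches the local disk here, which is the main difference from the temporary-file approach described above.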

There are a lot of gotchas and boilerplate associated with multipart uploads, though. One possibly simpler implementation would be to delegate the hassle to a third-party library that already does the proxying, such as smart_open (great library, by the way). Needless to say, all of this could be opt-in.
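For reference, this is roughly what delegating to smart_open looks like from the caller's side. It is a sketch only: the wrapper name `write_via_smart_open` is hypothetical, and it assumes smart_open is installed with S3 support and that AWS credentials are available in the environment.

```python
def write_via_smart_open(uri, lines):
    """Stream an iterable of byte strings to an S3 URI through smart_open.

    Hypothetical wrapper for illustration; `uri` is expected to look like
    "s3://some-bucket/some/key".
    """
    # Imported lazily so this sketch can be loaded without the dependency.
    from smart_open import open as s3_open

    # In binary write mode, smart_open buffers the data and uploads it to S3
    # in parts, so no temporary file is needed on the caller's side.
    with s3_open(uri, "wb") as fout:
        for line in lines:
            fout.write(line)
```

How this would be wired into s3fs's own file objects is a separate question that this sketch does not try to answer.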

Thank you so much for all the great work on pyfilesystem2 and s3fs :+1:

gilbsgilbs, Dec 20 '18 10:12

It is a good idea. I'll keep this issue up as a reminder, but I'm not sure when I'll have time to look at it just yet.

willmcgugan, Dec 20 '18 11:12

Thanks for your prompt response @willmcgugan, I really appreciate it. If somebody (or I) wanted to pick this up and submit a PR, would you:

  • accept that it relies on third party libraries?
  • prefer it to be opt-in only? / accept that it completely replaces the tempfile implementation?

gilbsgilbs, Dec 20 '18 12:12

I would be delighted to accept a PR...

> accept that it relies on third party libraries?

Third party libs would be okay, as long as they are of good quality and don't have too many dependencies themselves. But I'd prefer to avoid C extensions if possible.

> prefer it to be opt-in only? / accept that it completely replaces the tempfile implementation?

It's rare for one method to be better than another in every way, so I'd need to consider the pros and cons. For instance, if you write a line at a time, that will be quite efficient with the current tempfile approach, but I imagine it would be very slow to do a request for every line. But if there is some buffering on top of that, it might not be an issue.
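As a rough illustration of the buffering idea, the sketch below collects small writes in memory and only issues a request once a full part has accumulated. It is an assumption-laden example, not a proposed implementation: the class name `BufferedPartWriter` is invented, `client` is assumed to behave like a boto3 S3 client, and error handling (aborting on failure) is left out for brevity.

```python
class BufferedPartWriter:
    """Hypothetical writer: many small write() calls, one request per full part."""

    # S3 requires at least 5 MiB for every part except the last one.
    MIN_PART_SIZE = 5 * 1024 * 1024

    def __init__(self, client, bucket, key, part_size=MIN_PART_SIZE):
        self._client = client
        self._target = {"Bucket": bucket, "Key": key}
        self._part_size = part_size
        self._buffer = bytearray()
        self._parts = []
        response = client.create_multipart_upload(**self._target)
        self._upload_id = response["UploadId"]

    def write(self, data):
        # Appending to the buffer is cheap; a request is only made
        # once at least one full part is available.
        self._buffer += data
        while len(self._buffer) >= self._part_size:
            self._flush_part(self._part_size)
        return len(data)

    def _flush_part(self, size):
        body = bytes(self._buffer[:size])
        del self._buffer[:size]
        number = len(self._parts) + 1
        response = self._client.upload_part(
            PartNumber=number, UploadId=self._upload_id, Body=body, **self._target
        )
        self._parts.append({"ETag": response["ETag"], "PartNumber": number})

    def close(self):
        if self._buffer:
            # The final part is allowed to be smaller than the minimum.
            self._flush_part(len(self._buffer))
        if self._parts:
            self._client.complete_multipart_upload(
                UploadId=self._upload_id,
                MultipartUpload={"Parts": self._parts},
                **self._target
            )
        else:
            # Nothing was written: drop the multipart upload and
            # create an empty object with a plain PUT instead.
            self._client.abort_multipart_upload(
                UploadId=self._upload_id, **self._target
            )
            self._client.put_object(Body=b"", **self._target)
```

With the default part size, writing a few thousand short lines never reaches the threshold, so the whole file goes out as a single part on close. The trade-off is memory: up to one part's worth of data is held in RAM instead of on disk.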

If you need it for something commercial, I am available for contract work.

willmcgugan, Dec 20 '18 12:12

> If you need it for something commercial, I am available for contract work.

Thanks for the offer. Actually, I would have needed it for commercial stuff, but it's too late already. I did things differently (and it was not big enough to spend more time on anyway). I may contribute in my spare time for fun though, but I'm not sure if or when, just like you :wink: . Cheers.

gilbsgilbs, Dec 20 '18 14:12