aws-sdk-go-v2 icon indicating copy to clipboard operation
aws-sdk-go-v2 copied to clipboard

Draft: Parallel chunked downloader for regular `io.Writer`

Open pete-woods opened this issue 3 years ago • 2 comments

What this does:

  • Adds a modified manager.downloader type that uses an ordered ring of buffers (similar to manager.uploader) to parallelize downloads.
  • It can write to a plain io.Writer, removing the limitation of having to provide an io.WriterAt, which means you can stream the output somewhere (e.g. into a tar decompressor), without writing to a temporary file.
  • We've found this can give a ~33% throughput improvement for EC2 instances in the same region as the S3 bucket.

What's not done:

  • Giving things proper names.
  • There's duplication of code in both the downloader2 type and Download2 function.
  • More of the tests need adapting to the Download2 function.

See related discussion: https://github.com/aws/aws-sdk-go-v2/discussions/1735

pete-woods avatar Jun 28 '22 14:06 pete-woods

@jasdel would you have any time to look over this draft? I've tried to explain its purpose in the description.

It obviously needs some further work (at very least naming), but I would really like to get something like this into the SDK, rather than having to carry a fork.

I found an ~20% speed improvement using this PR, versus the standard single stream.

pete-woods avatar Aug 25 '22 13:08 pete-woods

Thanks taking the time to create this draft and ping @pete-woods. We'll review your draft and post feedback and suggestions on the proposed change.

jasdel avatar Aug 25 '22 16:08 jasdel

@jasdel anything I can do to help you folks with reviewing this?

pete-woods avatar Oct 17 '22 08:10 pete-woods

@pete-woods

Thanks for the PR and apologies for the delayed response.

I've taken a cursory look at this and overall this would be a nice addition. As you mentioned this is still a draft PR. What is left to do other than figuring out naming and maybe additional tests? I think to get this reviewed we would want to get it out of draft status first.

From my cursory review it could probably use some refactoring to remove duplication as well as some additional comments on the ring logic. It's not obviously correct at a glance, I think additional comments might help clear this up (and/or additional tests).

aajtodd avatar Feb 01 '23 16:02 aajtodd

Hi @pete-woods ,

Thanks for your contribution. We recently got new cross-SDK guidelines to re-implement Downloader. We will likely use some pieces of code from your PR, but as it stands we cannot accept this PR as-is.

Because this is not actionable at the moment we are going to close this. Thanks again! Ran~

RanVaknin avatar Aug 22 '23 16:08 RanVaknin

No worries. I hope some of my code is useful during your work

pete-woods avatar Aug 22 '23 16:08 pete-woods