aws-sdk-go-v2
Feature: manager.DownloadStream, a download manager that works with io.Writer
One use case is laid out in this playground: https://play.golang.org/p/C2oj9Dyen4j
The purpose of this PR is to supplement the existing Download workflow, which takes an io.WriterAt, with a method that takes an io.Writer.
This one is still parallel and can fetch multiple parts in any order. The parts are then written to the writer in order, and they are never held in memory beyond the goroutines that are actually fetching the chunks.
This is enforced by a sliding window, implemented in the files in feature/s3/manager/internal/window/
Tests for DownloadStream have been duplicated from Download with small modifications, plus comments on how the behavior varies because of the sliding window.
The naming and the documentation around DownloadStream can be changed. I was trying to find a name that distinguishes the two methods and to note why they are named differently, since they have different use cases: writing to files versus in-memory operations where you need the bytes in order but not the entire file.
Thanks for taking the time to create this PR proposal @BrandonRoehl. We'll review this and post feedback.
In the interim, did you consider using manager.WriteAtBuffer as the input value to the Download method?
It looks like the documentation for the download manager is still referring to the old v1 location, aws.WriteAtBuffer. This utility has moved to the manager package.
Yeah, we were having issues with manager.WriteAtBuffer: with large files, 20 GiB and larger, when some parts took a long time the buffer quickly grew quite large, because the 5 MiB chunks being written were 20 or more parts ahead. This way, with go-back-N, we limit that so the buffer doesn't grow forever and the WriteAt will not fail.
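The growth described above can be reproduced with a simplified in-memory io.WriterAt. This is a sketch under assumptions: `growBuffer` is a hypothetical stand-in, not the SDK's manager.WriteAtBuffer, and it only shows why a chunk that arrives far ahead of the others forces the buffer to cover every byte before it.

```go
package main

import "fmt"

// growBuffer is a hypothetical, simplified in-memory io.WriterAt.
// It grows to cover the highest offset written so far.
type growBuffer struct {
	buf []byte
}

// WriteAt copies p into the buffer at offset off, growing the buffer
// when the write extends past its current length.
func (b *growBuffer) WriteAt(p []byte, off int64) (int, error) {
	end := int(off) + len(p)
	if end > len(b.buf) {
		grown := make([]byte, end)
		copy(grown, b.buf)
		b.buf = grown
	}
	copy(b.buf[off:], p)
	return len(p), nil
}

func main() {
	const chunk = 5 << 20 // 5 MiB parts, as in the comment above
	b := &growBuffer{}
	// Part 20 finishes first: the buffer must already span parts 0 through 20.
	b.WriteAt(make([]byte, chunk), 20*chunk)
	fmt.Printf("buffer holds %d MiB after a single 5 MiB write\n", len(b.buf)>>20)
}
```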
I wanted to ask a question about the workflows. It says I need a maintainer to run them, and I cannot see why they are red; after running go test ./... locally, everything seems fine.
Very interested in seeing this functionality; at this time it's very non-obvious how to download a file concurrently while streaming the data as it comes in so as to keep the in-memory utilization low.
@BrandonRoehl
Thanks for the PR and apologies for the delayed response.
I think this overlaps with https://github.com/aws/aws-sdk-go-v2/pull/1742. I think we want to land one of these but not both. Right now I'm leaning towards favoring the other PR implementation. If it meets your needs then we can work towards landing it (happy to have you review and provide feedback on that PR if that's the direction we go).
Could you take a look and first verify if the proposed changes in the other PR there would satisfy your use case?
Hi there,
Since this has not received any responses since February, and we have new internal guidelines for a new implementation of the Downloader, I will be closing this PR.
Thanks again for your contribution. Ran