blob/s3: High memory usage during multi-part uploads
Describe the bug
High memory usage when doing many parallel multi-part uploads.
To Reproduce
Spin up a bunch of goroutines doing multi-part uploads with a large part size, e.g.:
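Something like this (a minimal sketch, assuming an S3 bucket opened through gocloud.dev/blob; the bucket URL, object count, and sizes are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"sync"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

func main() {
	ctx := context.Background()
	// Placeholder bucket URL; substitute a real bucket and region.
	bucket, err := blob.OpenBucket(ctx, "s3://my-bucket?region=us-east-1")
	if err != nil {
		panic(err)
	}
	defer bucket.Close()

	payload := make([]byte, 64<<20) // 64 MiB per object, forces multi-part
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// A large BufferSize becomes a large part size in the
			// underlying S3 multi-part uploader.
			w, err := bucket.NewWriter(ctx, fmt.Sprintf("obj-%d", i), &blob.WriterOptions{
				BufferSize: 16 << 20, // 16 MiB parts
			})
			if err != nil {
				return
			}
			w.Write(payload)
			w.Close()
		}(i)
	}
	wg.Wait()
}
```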
Expected behavior
Low memory usage.
Version
Latest
Additional context
The issue seems to be that the S3 implementation creates a new multi-part uploader for each writer: https://github.com/google/go-cloud/blob/master/blob/s3blob/s3blob.go#L626
The multipart uploader maintains a pool of byte slices, so when there are a lot of parallel uploads you end up with a separate pool of parts sitting in memory for each upload until the GC has time to clean them up.
I think this issue could be solved by caching uploaders based on the uploader options and then reusing them (since they're already thread-safe). I can make a PR for this change if it's acceptable, roughly along the lines of the sketch below.
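A hypothetical sketch of the caching idea (not the actual PR; it keys a shared cache on the options s3blob would set, here just PartSize and Concurrency, and relies on s3manager.Uploader being documented as safe for concurrent use):

```go
import (
	"sync"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploaderKey identifies an Uploader configuration. Hypothetical: a
// real version would need to cover every option s3blob sets.
type uploaderKey struct {
	partSize    int64
	concurrency int
}

var (
	mu    sync.Mutex
	cache = map[uploaderKey]*s3manager.Uploader{}
)

// getUploader returns a cached Uploader for the given options,
// creating it on first use.
func getUploader(sess *session.Session, partSize int64, concurrency int) *s3manager.Uploader {
	key := uploaderKey{partSize, concurrency}
	mu.Lock()
	defer mu.Unlock()
	if u, ok := cache[key]; ok {
		return u
	}
	u := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = partSize
		u.Concurrency = concurrency
	})
	cache[key] = u
	return u
}
```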
> until the GC has time to clean them up
Is this a real problem? The GC will reclaim them when the memory is needed.
If you're doing a bunch of parallel uploads, and make the change to re-use the same Uploader, it seems like it will either:
- Expand its pool of byte slices to be the superset of all the uploads; this will likely use the same amount of memory as currently.
OR
- Have contention for the byte slices, slowing down uploads.
Maybe you could experiment with your proposed change to see if it makes a difference.
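e.g. something crude like this (a sketch using the Go runtime's allocation counters; runWorkload is a placeholder for the parallel-upload loop, run once against each build):

```go
import (
	"fmt"
	"runtime"
)

// measure runs a workload and reports how much it allocated in total
// and how much heap remains in use after a forced GC.
func measure(label string, runWorkload func()) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	runWorkload()
	runtime.GC()
	runtime.ReadMemStats(&after)
	fmt.Printf("%s: allocated %d MiB, heap in use after GC %d MiB\n",
		label,
		(after.TotalAlloc-before.TotalAlloc)>>20,
		after.HeapInuse>>20)
}
```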
I saw the PR. Can you answer the questions above? It's adding a fair amount of complexity, and I want to be convinced that it's necessary.
@vangent yeah, that's why I left it in draft mode. I'm working on testing its impact soon and will post results here when I have them (it's a bit difficult for me to upgrade this dependency in our monorepo, so it'll take me a bit of time).
Actually, it might be possible to use the same uploader but override options per upload, since the Upload methods take func(*Uploader) options that allow for it.
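e.g. (a sketch against aws-sdk-go v1's s3manager; as far as I can tell, UploadWithContext applies per-call options to a copy of the uploader's config, so the shared Uploader isn't mutated):

```go
import (
	"bytes"
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadWithPartSize uses a shared Uploader but overrides PartSize for
// this one call via the functional option. Bucket and key are placeholders.
func uploadWithPartSize(ctx context.Context, shared *s3manager.Uploader, data []byte) error {
	_, err := shared.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("my-key"),
		Body:   bytes.NewReader(data),
	}, func(u *s3manager.Uploader) {
		u.PartSize = 16 << 20 // applies to this upload only
	})
	return err
}
```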
I do think that the allocation volume is a problem, especially at scale. I don't think that reusing Uploaders is the solution, however. The buffer pool is sized for the number of concurrent uploads for that uploader, and I don't think that it will accept additional slices in its pool beyond its capacity.
Instead, I think that the solution is to hand Upload input that may conform to the readerAtSeeker interface, which the uploader checks for. If its input is a readerAtSeeker, UploadWithContext will not allocate any temporary buffers.
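For example (a sketch; *os.File satisfies io.ReaderAt and io.Seeker, which is what the readerAtSeeker check looks for, so parts are read straight from the file rather than copied into pooled buffers):

```go
import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadFile hands the uploader an *os.File directly; bucket and key
// are placeholders.
func uploadFile(ctx context.Context, uploader *s3manager.Uploader, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = uploader.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("my-key"),
		Body:   f, // io.ReaderAt + io.Seeker => no temporary buffers
	})
	return err
}
```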
I've opened https://github.com/google/go-cloud/issues/3245 to track an API-level solution here.