blob/s3: High memory usage during multi-part uploads

Open · richardartoul opened this issue on Jun 01, 2020 · 4 comments

Describe the bug

High memory usage when doing many parallel multi-part uploads.

To Reproduce

Spin up a bunch of goroutines doing multi-part uploads with a large part size.

Expected behavior

Low memory usage.

Version

Latest

Additional context

The issue seems to be caused by the fact that the S3 implementation creates a new multi-part uploader for each writer: https://github.com/google/go-cloud/blob/master/blob/s3blob/s3blob.go#L626

The multipart uploader maintains a pool of byte slices, so when there are many parallel uploads you end up with a pool of parts sitting in memory for each upload until the GC has time to clean them up.
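
For context, here is a minimal sketch of that per-writer pattern using the aws-sdk-go `s3manager` directly (the region, part size, and writer count are placeholders, not values taken from s3blob):

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"), // hypothetical region
	}))

	// Each blob writer constructs its own Uploader; per the issue, each
	// one then holds on to its own pool of part-sized byte slices, so N
	// parallel writers mean N independent pools.
	for i := 0; i < 100; i++ {
		_ = s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
			u.PartSize = 64 * 1024 * 1024 // large parts amplify the effect
		})
	}
}
```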

I think this issue could be solved by caching uploaders based on the uploader options and then reusing them (since they're already thread-safe). I can make a PR for this change if it's acceptable.
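
A sketch of what that caching might look like (hypothetical; the key fields, names, and locking are illustrative, not the actual PR):

```go
package main

import (
	"fmt"
	"sync"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploaderKey captures the options that distinguish one uploader
// configuration from another (the real change might key on more fields).
type uploaderKey struct {
	partSize    int64
	concurrency int
}

var (
	mu        sync.Mutex
	uploaders = map[uploaderKey]*s3manager.Uploader{}
)

// getUploader returns a cached Uploader for the given options, creating
// one on first use. Uploader is safe for concurrent use, so sharing is OK.
func getUploader(sess *session.Session, key uploaderKey) *s3manager.Uploader {
	mu.Lock()
	defer mu.Unlock()
	if u, ok := uploaders[key]; ok {
		return u
	}
	u := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = key.partSize
		u.Concurrency = key.concurrency
	})
	uploaders[key] = u
	return u
}

func main() {
	sess := session.Must(session.NewSession())
	u1 := getUploader(sess, uploaderKey{partSize: 64 << 20, concurrency: 5})
	u2 := getUploader(sess, uploaderKey{partSize: 64 << 20, concurrency: 5})
	fmt.Println(u1 == u2) // true: same options, same cached Uploader
}
```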

richardartoul avatar Jun 01 '20 19:06 richardartoul

until the GC has time to clean them up

Is this a real problem? The GC will clean them up when the memory is needed.

If you're doing a bunch of parallel uploads and change the code to reuse the same Uploader, it seems like it will either:

  • expand its pool of byte slices to cover all of the uploads, which will likely use about the same amount of memory as the current code,

OR

  • have contention for the byte slices, slowing down uploads.

Maybe you could experiment with your proposed change to see if it makes a difference.

vangent avatar Jun 01 '20 20:06 vangent

I saw the PR. Can you answer the questions above? It adds a fair amount of complexity, and I'd like to be convinced that it's necessary.

vangent avatar Sep 30 '20 16:09 vangent

@vangent yeah, that's why I left it in draft mode. I'm working on testing its impact soon and will post the results here when I have them (it's a bit difficult for me to upgrade this dependency in our monorepo, so it'll take me a bit of time).

richardartoul avatar Sep 30 '20 16:09 richardartoul

Actually, it might be possible to use the same uploader but override options per upload, since the Upload methods take func(*Uploader) arguments that allow exactly that.
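
A minimal sketch of that approach (bucket, key, body, and part size are placeholders). Upload/UploadWithContext have value receivers, so per-upload option functions apply to a copy of the config and don't mutate the shared Uploader:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	shared := s3manager.NewUploader(sess) // one Uploader for all uploads

	// The options func is applied to a copy of the Uploader's config,
	// so this override is scoped to this one upload.
	_, err := shared.UploadWithContext(context.Background(), &s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // hypothetical bucket/key
		Key:    aws.String("my-key"),
		Body:   strings.NewReader("hello"),
	}, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // override for this upload only
	})
	if err != nil {
		log.Fatal(err)
	}
}
```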

segevfiner avatar Nov 18 '20 10:11 segevfiner

I do think that the allocation volume is a problem, especially at scale. I don't think that reusing Uploaders is the solution, however. The buffer pool is sized for the number of concurrent uploads for that uploader, and I don't think that it will accept additional slices in its pool beyond its capacity.

Instead, I think that the solution is to hand Upload input that may conform to the readerAtSeeker interface, which the uploader checks for. If its input is a readerAtSeeker, UploadWithContext will not allocate any temporary buffers.
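
For illustration, a minimal sketch of that fast path using `s3manager` directly (file, bucket, and key names are placeholders). An *os.File satisfies io.ReaderAt and io.Seeker, so the uploader can slice it into section readers per part instead of copying each part into a pooled buffer:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	uploader := s3manager.NewUploader(sess)

	f, err := os.Open("large-file.bin") // hypothetical file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Because Body implements io.ReaderAt and io.Seeker, the uploader
	// takes the no-copy path and skips the temporary part buffers.
	_, err = uploader.UploadWithContext(context.Background(), &s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // hypothetical bucket/key
		Key:    aws.String("large-file.bin"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```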

I've opened https://github.com/google/go-cloud/issues/3245 to track an API-level solution here.

pgavlin avatar May 05 '23 17:05 pgavlin