
Efficiently writing s3 object to file

Open jacquescrocker opened this issue 14 years ago • 6 comments

s3 really needs a way to stream content to a file somehow. Loading .content on a large file puts everything in memory and destroys any Heroku worker that it touches (they cap workers at a 300 MB memory limit).

Something like this would be awesome:

s3obj = s3_bucket.objects.find("my_huge_object.mov")
s3obj.write_to_file("/tmp/my_huge_object.mov")

Especially if it could avoid loading the entire content into memory at once.

jacquescrocker avatar Nov 25 '10 18:11 jacquescrocker
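For reference, a minimal sketch of how such a write_to_file method might be implemented, assuming the object can yield its body in chunks as they arrive; each_chunk here is a hypothetical helper, not part of the gem:

    # Hypothetical sketch: stream an S3 object to disk chunk by chunk,
    # so the full body never sits in memory at once.
    class S3Object
      def write_to_file(path)
        File.open(path, "wb") do |file|
          # each_chunk is a hypothetical helper that yields the HTTP
          # response body in pieces as it is read from the socket.
          each_chunk { |chunk| file.write(chunk) }
        end
      end
    end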

This old s3 wrapper has an S3Object.stream method: http://amazon.rubyforge.org/

Maybe we could take some of the code from there?

jacquescrocker avatar Nov 25 '10 20:11 jacquescrocker
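For reference, a streaming download with that older aws-s3 library looks roughly like this; the bucket name and paths below are placeholders:

    # Streaming download with the old aws-s3 gem: the block passed to
    # S3Object.stream receives the body piece by piece, so the file is
    # written to disk without being buffered in memory.
    require 'aws/s3'
    include AWS::S3

    AWS::S3::Base.establish_connection!(
      :access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
      :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
    )

    File.open("/tmp/my_huge_object.mov", "wb") do |file|
      S3Object.stream("my_huge_object.mov", "my-bucket") do |chunk|
        file.write(chunk)
      end
    end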

Yeah, there should indeed be such a possibility when downloading objects. I've recently added upload streaming; I'll try to take a look at downloads as well. If you have an idea of how to solve it, you can write an appropriate patch ;-).

Thanks for the suggestion.

qoobaa avatar Nov 25 '10 20:11 qoobaa

Sounds good. I'm not too good at this type of stuff, but I can try to hack together some code taken from Marcel's library. It might make more sense to let you handle it, though ;-)

jacquescrocker avatar Nov 25 '10 20:11 jacquescrocker

OK, after digging around a bit, it looks like it's pretty easy. You just have to call read_body on the HTTPResponse and give it a block.

So the change would be in parse_headers. It should avoid explicitly calling response.body, since that belongs in the content accessor.

Then we can add a stream_content accessor that takes a block, which is then passed along to the HTTPResponse.

Piece of cake; I'll submit a patch.

jacquescrocker avatar Nov 25 '10 21:11 jacquescrocker
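For context, this is the Net::HTTP mechanism being described: passing a block to read_body yields the response body in segments instead of buffering it, which is what a stream_content accessor could forward its block to. The URL and path below are placeholders:

    # Passing a block to Net::HTTPResponse#read_body streams the body
    # in segments as it is read from the socket, instead of buffering
    # the whole thing in memory.
    require 'net/http'

    uri = URI("https://my-bucket.s3.amazonaws.com/my_huge_object.mov")
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      request = Net::HTTP::Get.new(uri)
      # The block form of #request must be used here: the body can only
      # be streamed while the connection is still open.
      http.request(request) do |response|
        File.open("/tmp/my_huge_object.mov", "wb") do |file|
          response.read_body { |segment| file.write(segment) }
        end
      end
    end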

Sounds great! :-)

qoobaa avatar Nov 25 '10 21:11 qoobaa

Up! :)

abrisse avatar Oct 07 '11 16:10 abrisse