aws-sdk-ruby
S3 put_object should accept a block to facilitate chunked writes
Describe the feature
After using get_object's chunked read, I assumed put_object similarly supported chunked writing:
client.put_object(bucket: blob.bucket, key: blob.key) do |buffer|
  while chunk = blob.read(16 * 256)
    buffer << chunk
  end
end
For reference, get_object supports this:
client.get_object(bucket: blob.bucket, key: blob.key) do |chunk|
  buffer << chunk
end
But this isn't currently supported and results in an empty object, since the block is ignored.
Use Case
I want to write an IO to S3 while maintaining a low memory footprint, and I want to be explicit about how much I read for each chunk. I do not want to rely on S3 internals to choose how large my chunks should be.
Proposed Solution
As with get_object, allow put_object to accept a block that yields the internal request body.
Other Information
No response
Acknowledgements
- [X] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
SDK version used
1.113.0
Environment details (OS name and version, etc.)
Linux 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Hi, have you looked at https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-s3/lib/aws-sdk-s3/customizations/object.rb#L385
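For reference, a minimal sketch of how the linked Aws::S3::Object#upload_stream customization can be used, assuming `blob` is the same chunked source as in the original example; bucket and key names are placeholders:

require 'aws-sdk-s3'

# upload_stream yields a writable stream and performs a multipart upload
# under the hood, so only one chunk needs to be held in memory at a time.
object = Aws::S3::Object.new(bucket_name: 'my-bucket', key: 'my-key')
object.upload_stream do |write_stream|
  while chunk = blob.read(16 * 256)
    write_stream << chunk
  end
end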
This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.
Thanks for this. I wasn't aware of that method. I'm curious if we could make put_object with a block delegate to upload_stream?
I'm not sure if that's possible without a breaking change within the major version. The block is already reserved as a response target here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/seahorse/client/request.rb#L70. There would be no way to differentiate whether a block is meant for reading or writing, and the behavior would be inconsistent.
I believe you can also pass an IO as the body for put_object, and it will be read. I'll leave this as an open feature request, but I think the interface would have to be different.
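A minimal sketch of that IO-body approach, assuming a local file as the source; the bucket, key, and path are placeholders:

require 'aws-sdk-s3'

client = Aws::S3::Client.new
# put_object accepts any IO-like object as :body; the SDK streams from it
# rather than requiring the whole payload to be built as a String first.
File.open('/tmp/large-file.bin', 'rb') do |io|
  client.put_object(bucket: 'my-bucket', key: 'my-key', body: io)
end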
The put_object method does not currently take a block or pass it along to send_request, so I don't think introducing a block that is used for streaming writes would be a breaking change. I do understand that the internals of put_object would need to be refactored, but I don't see any apparent breaking changes for the public put_object API.
I am currently passing an IO body that streams the data as required (well, as much as I can control from the outside); I just thought the block interface would be a nicer and clearer DX, since it would align with assumptions carried over from get_object.
This could be done by checking the streaming input modeling on the operation. However, this could result in an inconsistent API, where some operations use block streaming for requests and others for responses. Additionally, writing from the block would be very complex: Net::HTTP body writing would have to yield to the block, and I believe that would be inefficient. Our current request-building code would also need to differentiate block types. Currently the IO body is passed to Net::HTTP's body stream and uses IO.copy_stream (written in C), and the stream is already read in chunks. I can leave this open as a feature request to consider.
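For illustration only (this is not SDK code), a rough sketch of the mechanism described above, where Net::HTTP streams an IO body via body_stream; the URL and file path are placeholders and the request is unsigned:

require 'net/http'
require 'uri'

uri = URI('https://example-bucket.s3.amazonaws.com/my-key')
File.open('/tmp/large-file.bin', 'rb') do |io|
  req = Net::HTTP::Put.new(uri)
  req['Content-Length'] = io.size.to_s   # required when using body_stream
  req.body_stream = io                   # Net::HTTP copies the stream in chunks
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(req)
  end
end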