aws-sdk-java-v2
aws-sdk-java-v2 copied to clipboard
Support for zero-copy file upload/download with `java.nio`
Describe the feature
Currently it seems that the code specifically deals in on heap ByteBuffers either converting direct to non-direct Buffers or writing to non-direct Buffers when there are more performant and memory efficient zero-copy routes.
Most of the heavy lifting on supporting these off heap/zero copy file transfers are done by the HTTP client and the OS with native support in Java via FileChannel#transferTo
and FileChannel#transferFrom
to socket channels and similar for SocketChannel#write for DirectByteBuffers.
The specific implementation of this is up to the HTTP client in use (e.g. https://netty.io/4.0/api/io/netty/channel/FileRegion.html) but the option is removed at too high a level for the client specific implementations.
I believe it was done this way to guard against concurrent modification during write ops which I've discussed on https://github.com/aws/aws-sdk-java-v2/pull/3925
Use Case
High frequency efficient read/write ops to S3
Proposed Solution
Don't do any conversion from direct to non-direct ByteBuffers, pass along the File/FileChannel object instead of converting to a ByteBuffer publisher and leave it up to the http client lib to deal with as required.
For Path/File/Filechannel AsyncRequestBody
is unsuitable since it's an implementation of SdkPublisher<ByteBuffer>
, I think in this case the AsyncRequestBody
is just unnecessary and the plain Path/File/Filechannel should be passed on to the http client.
The Netty docs state
If your operating system (or JDK / JRE) does not support zero-copy file transfer, sending a file with FileRegion might fail or yield worse performance. For example, sending a large file doesn't work well in Windows.
In which case this should be possible to disable with a flag passed to the http client when building the S3Client
Other Information
No response
Acknowledgements
- [X] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
AWS Java SDK version used
macOs 13.2.1 (22D68)
JDK version used
openjdk 11.0.16.1 & openjdk 19.0.1
Operating System and version
macOs 13.2.1 (22D68)
Thank you @StephenFlavin, this makes sense. We'll review your related PR.
The related PR would be a precursor to supporting these for put/post with off heap ByteBuffers. As far as I can tell
if (!buffer.hasArray()) {
buffer = ByteBuffer.wrap(BinaryUtils.copyBytesFrom(buffer));
}
could be removed and it would just work (tested with the netty client) though I didn't want to do it as part of this PR and I haven't had a chance to delve too deep into the netty implementation to make sure it's doing zero copy, I did see a significant performance bump vs inputstream.
The other piece there will be making use of Filechannel.transferTo
for put/post but the changes required for that would be more significant since everything currently deals in ByteBuffers.
I've not yet looked at the changes required to the get requests.
with the pre-cursor PR merged, I've raised the first PR to enable zero-copy transfers via direct byte buffers with AsyncRequestBody
comments/thoughts appreciated.
I've kept this change small as a minimal effort to support zero-copy in the async S3 client.
@StephenFlavin / @debora-ito arrived here looking for this feature in the sdk; if you need a user to build and test let me know (i may anyway)
Hi @sgammon, I've not had any time to work on this as I'm no longer doing such performance sensitive work with S3 though I would like to see the work continued, I may loop back to this in future but feel free to take on a chunk if you fancy it.
- From an upload point of view there's passing FileChannels down the the http client. (would love to see this compared to mapped buffers)
- I hadn't really looked at the download code yet.
- Once these changes were in place I was going to see if IOuring could be used to further improve things (https://github.com/netty/netty-incubator-transport-io_uring)