aws-sdk-java-v2
aws-sdk-java-v2 copied to clipboard
Add `InputStreamSubscriber.transferTo(OutputStreamPublisher)` optimization
Describe the feature
TL;DR: Making InputStreamSubscriber
implement a specialized InputStream#transferTo(OutputStream out)
method that detects whether out
is an OutputStreamPublisher
and directly passes the ByteBuffer
s to it if so (instead of copying them).
Currently, InputStreamSubscriber
inherits the transferTo(OutputStream out)
method from java.io.InputStream
, which uses an internal buffer to copy data. It works with usual synchronous output streams, such as System.out
or ByteArrayOutputStream
, but not with asynchronous ones such as OutputStreamPublisher
, which is used by BlockingOutputStreamAsyncRequestBody
. The reason is that OutputStreamPublisher#write(byte[])
does not copy the buffer, but just passes it to the subscriber via ByteBuffer#wrap(byte[])
. Therefore, the buffer may be modified by subsequent InputStream#read(byte[] buffer)
before the subscriber reads it, which leads to unexpected behavior.
To avoid this issue, the simplest solution is to do something like out.write(in.readAllBytes())
. This is memory-consuming for a long input stream.
Use Case
I want to open an S3 object as an InputStream in
, read a header part of it, modify something, and then write the modified header plus the remaining unchanged content to a new S3 object using the putObject(BlockingOutputStreamAsyncRequestBody)
interface. It is straightforward to use in.transferTo(out)
to copy the unchanged content. But currently, I have to use out.write(in.readAllBytes())
instead for correctness.
That is to say, one has to be very careful when using BlockingOutputStreamAsyncRequestBody
. I encountered some cases that the data written to S3 was malformed due to this issue. Unfortunately, such behavior is not well-documented.
Implementing this feature will make such use cases safer and more performant.
Proposed Solution
No response
Other Information
No response
Acknowledgements
- [ ] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
AWS Java SDK version used
2.20.19
JDK version used
OpenJDK 64-Bit Server VM Corretto-17.0.3.6.1
Operating System and version
Darwin Kernel Version 22.5.0
@fanyang01 thank you for the report. We are considering this a bug.
@fanyang01 I recently ran into the corruption issue you mentioned while trying to use BlockingOutputStreamAsyncRequestBody (which I'm now calling BOSARB). I have a bit of code that reads data from a JDBC ResultSet, and writes it directly to an OutputStream in various different formats (csv, json, etc). Trying to do that with BOSARB was an exercise in frustration. I ultimately pivoted to using BISARB and a PipedInputStream/PipedOutputStream pair with a separate thread writing to the output stream.