aws-sdk-kotlin
InputStream adapter for ByteStream
Describe the feature
A method on ByteStream to convert it to an InputStream, or a utility function to do so.
Is your Feature Request related to a problem?
Sometimes you need to connect S3 calls to code that accepts a blocking java.io.InputStream. It would be helpful if the SDK included a way to convert a ByteStream to an InputStream, which would avoid a lot of in-memory buffering.
Proposed Solution
An input adapter similar to
https://github.com/ktorio/ktor/blob/c68d889b7088fca0e9a75f299b58ce7f55572a56/ktor-io/jvm/src/io/ktor/utils/io/jvm/javaio/Blocking.kt#L28
or an adapter that converts SdkByteReadChannel back to a Ktor ByteReadChannel, so that ByteReadChannel.toInputStream could be used.
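To make the proposed adapter more concrete, here is a minimal sketch of the pull-based approach the linked Ktor code takes. The InputStream's blocking `read()` drives a suspending read directly, so there is no intermediate channel and no extra copy. `SuspendingReader`, `awaitBlocking`, `BlockingInputStream` and `ChunkedReader` are hypothetical names, not SDK APIs. `SuspendingReader` only assumes a readAvailable-style contract (bytes read, or -1 at end of stream). To stay dependency-free, `awaitBlocking` uses the standard library's `startCoroutine`; real code would more likely use kotlinx.coroutines' `runBlocking`.

```kotlin
import java.io.InputStream
import java.util.concurrent.CountDownLatch
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical suspending source, loosely modeled on SdkByteReadChannel.readAvailable:
// suspends until data is available, then returns the number of bytes copied into
// sink, or -1 once the source is exhausted.
interface SuspendingReader {
    suspend fun read(sink: ByteArray, offset: Int, length: Int): Int
}

// Dependency-free stand-in for kotlinx.coroutines.runBlocking: starts the block and
// parks the calling thread until it completes. It deadlocks if the block needs this
// same thread to resume, so real code should prefer runBlocking.
fun <T> awaitBlocking(block: suspend () -> T): T {
    val done = CountDownLatch(1)
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation<T>(EmptyCoroutineContext) { result ->
        outcome = result
        done.countDown()
    })
    done.await()
    return outcome!!.getOrThrow()
}

// Blocking InputStream view over a SuspendingReader: each read() performs one
// suspending read on the calling thread, with no intermediate buffer or channel.
class BlockingInputStream(private val reader: SuspendingReader) : InputStream() {
    private val single = ByteArray(1)

    override fun read(): Int {
        val n = read(single, 0, 1)
        return if (n == -1) -1 else single[0].toInt() and 0xFF
    }

    override fun read(b: ByteArray, off: Int, len: Int): Int {
        if (off < 0 || len < 0 || len > b.size - off) throw IndexOutOfBoundsException()
        if (len == 0) return 0
        var n: Int
        do {
            n = awaitBlocking { reader.read(b, off, len) }
        } while (n == 0) // zero bytes is not end of stream; only -1 is
        return n
    }
}

// Test double: serves an in-memory array in pieces of at most `chunk` bytes.
class ChunkedReader(private val data: ByteArray, private val chunk: Int) : SuspendingReader {
    private var pos = 0

    override suspend fun read(sink: ByteArray, offset: Int, length: Int): Int {
        if (pos >= data.size) return -1
        val n = minOf(length, chunk, data.size - pos)
        data.copyInto(sink, offset, pos, pos + n)
        pos += n
        return n
    }
}
```

`ChunkedReader` only simulates a source that delivers data in small pieces. An SDK-backed adapter would delegate to the SDK's channel instead, and would need to keep the response body open for as long as the stream is being read.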
Describe alternative solutions or features you've considered
As a workaround, I could try something like the following (untested, with minimal error handling). It is less efficient, however, because it copies data unnecessarily.
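```kotlin
import io.ktor.utils.io.ByteChannel
import io.ktor.utils.io.jvm.javaio.toInputStream
import java.io.InputStream
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
// plus the smithy-kotlin imports for ByteStream, SdkByteReadChannel and InternalApi
// (their packages depend on the SDK version)

@OptIn(InternalApi::class, DelicateCoroutinesApi::class)
suspend fun ByteStream.toInputStream(): InputStream {
    // autoFlush = true makes each written chunk visible to the reading side immediately
    val ktorChan = ByteChannel(autoFlush = true)
    val chan = when (this) {
        is ByteStream.OneShotStream -> readFrom()
        is ByteStream.Buffer -> SdkByteReadChannel(this.bytes())
        is ByteStream.ReplayableStream -> newReader()
        else -> error("unexpected ByteStream body")
    }
    // launch rather than async: nothing awaits a result, so failures are forwarded
    // by closing ktorChan with the cause instead of being silently dropped
    GlobalScope.launch {
        try {
            while (!chan.isClosedForRead) {
                chan.awaitContent()
                val chunk = ByteArray(chan.availableForRead)
                val n = chan.readAvailable(chunk)
                if (n > 0) ktorChan.writeFully(chunk, 0, n) // write only the bytes actually read
            }
            ktorChan.close(null)
        } catch (e: Throwable) {
            ktorChan.close(e)
        }
    }
    return ktorChan.toInputStream()
}
```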
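The workaround above is a producer/consumer bridge: a coroutine copies chunks into a Ktor channel while the caller reads the other end as an InputStream. The same shape can be sketched with only the JDK's `PipedInputStream`/`PipedOutputStream`, assuming the producer is a plain blocking callback. `pipedInputStream` and its `produce` parameter are hypothetical; in the SDK case `produce` would have to perform the suspending channel reads itself (for example inside `runBlocking`). The sketch also records a producer failure and rethrows it when the reader reaches the end of the pipe, so a failed copy surfaces as an IOException rather than as a silently truncated stream.

```kotlin
import java.io.FilterInputStream
import java.io.IOException
import java.io.InputStream
import java.io.OutputStream
import java.io.PipedInputStream
import java.io.PipedOutputStream
import java.util.concurrent.atomic.AtomicReference
import kotlin.concurrent.thread

// Hypothetical helper: runs `produce` on a background thread, writing into a pipe, and
// returns the reading end. The pipe buffer bounds memory use: the producer blocks
// while the buffer is full.
fun pipedInputStream(bufferSize: Int = 64 * 1024, produce: (OutputStream) -> Unit): InputStream {
    val input = PipedInputStream(bufferSize)
    val output = PipedOutputStream(input)
    val failure = AtomicReference<Throwable?>(null)

    thread(isDaemon = true, name = "pipe-producer") {
        try {
            produce(output)
        } catch (t: Throwable) {
            // Record the failure before closing, so the reader sees it at end of stream.
            failure.set(t)
        } finally {
            output.close()
        }
    }

    // Turn "end of stream after a producer failure" into an IOException
    // instead of a silent, truncated EOF.
    return object : FilterInputStream(input) {
        private fun checkEof(n: Int): Int {
            if (n == -1) failure.get()?.let { throw IOException("producer failed", it) }
            return n
        }

        override fun read(): Int = checkEof(super.read())

        override fun read(b: ByteArray, off: Int, len: Int): Int = checkEof(super.read(b, off, len))
    }
}
```

Compared with the pull-based adapter, this design needs an extra thread and one extra copy per chunk, but it lets any push-style producer feed a blocking consumer with bounded memory.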
Acknowledge
- [X] I may be able to implement this feature request
AWS Kotlin SDK version used
0.16.0
Platform (JVM/JS/Native)
jvm
Operating System and version
macOS Monterey 12.3.1
Thanks for the feature request. Indeed this would be useful. The code you have provided is one possible workaround in the meantime.
Just a heads up: this doesn't actually work in practice. For example, if you take a resource of roughly 36 KB of JSON and run it through a Jackson mapper to convert it to a data class, the conversion fails because the stream is considered done after only about 4,100 bytes. It seems to work fine for anything under 4,000 bytes but fails consistently for anything over 12 KB (I didn't have any sample data between 4 KB and 12 KB on hand).
This is probably not as good, but it at least lets me hack together a solution that retrieves all of the data. A good, clean, built-in solution would still be preferable.
```kotlin
suspend fun ByteStream.toInputStream(): InputStream = this.toHttpBody()
    .readAll()
    ?.inputStream()
    ?: "".byteInputStream() // TODO This is probably worse than breaking???
```
The downside of this approach, as far as I can tell, is that the entire contents of the ByteStream have to be loaded into memory. If you have a large resource such as a movie, it will fail hard.
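By contrast, a consumer that reads through an InputStream in fixed-size chunks holds only one buffer at a time, however large the object is. That is the property a true streaming adapter would preserve and a read-everything approach gives up. The sketch below (`sha256Streaming` is a hypothetical helper) hashes a stream with an 8 KiB buffer, using only the JDK's `MessageDigest`.

```kotlin
import java.io.InputStream
import java.security.MessageDigest

// Hashes a stream while holding only one fixed-size buffer in memory,
// regardless of how large the underlying object is.
fun sha256Streaming(input: InputStream, bufferSize: Int = 8 * 1024): ByteArray {
    val digest = MessageDigest.getInstance("SHA-256")
    val buffer = ByteArray(bufferSize)
    input.use { stream ->
        while (true) {
            val n = stream.read(buffer)
            if (n == -1) break
            digest.update(buffer, 0, n)
        }
    }
    return digest.digest()
}
```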