kotlinx.serialization

Are there any plans to add chunk-based (stream-based) decoding for JS?

Open andrew-k-21-12 opened this issue 1 year ago • 5 comments

What is your use-case and why do you need this feature?

Just a regular case of parsing JSON responses in JS. I would like to have such a feature to avoid holding the entire response in memory while parsing it.

Describe the solution you'd like

The closest conceptual alternative from the JVM world is Json.decodeFromStream (but I didn't check its sources to see whether it decodes sequentially, token by token, or just converts the entire InputStream into one String under the hood).

A brief sketch of what I mean:

external interface Chunk {
    val done: Boolean
    val value: Uint8Array
}

suspend fun example() {
    val textDecoder = TextDecoder()
    val response = window
        .fetch("https://api.spacexdata.com/v4/launches", RequestInit())
        .await()
    val reader = response.body.getReader()
    while (true) {
        val chunk = reader.read().unsafeCast<Promise<Chunk>>().await()
        // When done is true, value is undefined, so check before decoding.
        if (chunk.done) break
        console.warn("to do - parsing a chunk by tokens: ${textDecoder.decode(chunk.value)}")
    }
}

Is this something to be expected for JS, or is there not much need for it?

andrew-k-21-12, Jan 25 '24 15:01

The challenge here is how to determine a "chunk". I see two ways:

  • Some sort of path expressions where each match is a chunk.
  • Sequences/dynamic lists at the top level (this is also supported by a path expression).

Both would result in some sort of (lazy) sequence of serializable elements. The specifics would need to be format-specific, although it might be possible to use the serial descriptor of the container(s) to "guide" the path expression or the selection of items.

There is currently no support or API for such a thing, but it is clearly something that could have value. Again, it would need a per-format implementation.
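As a rough illustration of the second option (a lazy sequence over the elements of a top-level list), here is a minimal stdlib-only sketch. The function name topLevelElements is hypothetical and not part of kotlinx.serialization. It assumes the input is a well-formed JSON array and only splits it into the raw text of each element; decoding each element would still be a separate, format-specific step.

```kotlin
// Hypothetical helper, not a kotlinx.serialization API.
// Lazily yields the raw JSON text of each element of a top-level array.
// Assumes well-formed input whose top-level value is an array.
fun topLevelElements(chars: Iterator<Char>): Sequence<String> = sequence {
    val element = StringBuilder()
    var depth = 0          // 1 means "directly inside the top-level array"
    var inString = false   // true while scanning inside a JSON string
    var escaped = false    // true right after a backslash inside a string
    for (c in chars) {
        if (inString) {
            element.append(c)
            when {
                escaped -> escaped = false
                c == '\\' -> escaped = true
                c == '"' -> inString = false
            }
            continue
        }
        when (c) {
            '"' -> {
                inString = true
                element.append(c)
            }
            '[', '{' -> {
                // The opening bracket of the top-level array itself is not kept.
                if (depth > 0) element.append(c)
                depth++
            }
            ']', '}' -> {
                depth--
                if (depth > 0) {
                    element.append(c)
                } else if (element.isNotBlank()) {
                    // End of the top-level array: emit the last element.
                    yield(element.toString().trim())
                    element.clear()
                }
            }
            ',' -> {
                if (depth == 1) {
                    // A comma at depth 1 separates two top-level elements.
                    yield(element.toString().trim())
                    element.clear()
                } else {
                    element.append(c)
                }
            }
            else -> if (depth > 0) element.append(c)
        }
    }
}
```

Because the result is a Sequence, characters are pulled from the iterator only as elements are requested, so taking the first element does not consume the rest of the input.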

pdvrieze, Jan 25 '24 17:01

We don't see much demand for it and are mostly focused on common solutions (i.e. okio integration) that cover a similar use case on all platforms, rather than on a platform-specific API.

qwwdfsad, Jan 25 '24 17:01

@pdvrieze, I'm not sure I understood your explanation of chunks. I haven't thought about it much myself, but the first thing that comes to mind is the JsonReader class from Android. By chunks I originally meant just the default parts that window.fetch(...) provides through its response body reader. So the idea (how I would start if the goal were not to support a big library with lots of users) is to read these response parts sequentially with an additional dynamic buffer (for cases when one JSON token "crosses" two chunks). Then, once there is a way to read JSON as a stream of tokens, parsing it into a data class becomes more straightforward. But again, I haven't thought about the overall design.
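The "dynamic buffer" idea can be sketched roughly as follows, using only the Kotlin standard library. ChunkTokenizer and its methods are hypothetical names, not an existing API. The sketch assumes chunks have already been decoded to text, does not validate JSON, and treats any run of characters that is neither a string, punctuation, nor whitespace as a literal (number, true, false, null).

```kotlin
// Hypothetical sketch, not a kotlinx.serialization or okio API.
// Splits JSON text arriving in arbitrary chunks into tokens, carrying an
// unfinished token over to the next chunk.
class ChunkTokenizer {
    private var pending = ""  // text of a token that is not complete yet

    /** Feeds the next chunk and returns every token completed so far. */
    fun feed(chunk: String): List<String> {
        val text = pending + chunk
        val tokens = mutableListOf<String>()
        var i = 0
        scan@ while (i < text.length) {
            val c = text[i]
            when {
                c.isWhitespace() -> i++
                c in "{}[]:," -> {
                    tokens += c.toString()
                    i++
                }
                c == '"' -> {
                    val end = stringEnd(text, i)
                    if (end < 0) break@scan      // string continues in the next chunk
                    tokens += text.substring(i, end + 1)
                    i = end + 1
                }
                else -> {
                    val end = literalEnd(text, i)
                    if (end < 0) break@scan      // literal may continue in the next chunk
                    tokens += text.substring(i, end)
                    i = end
                }
            }
        }
        pending = text.substring(i)
        return tokens
    }

    /** Call after the last chunk; returns a trailing literal, if any. */
    fun finish(): List<String> {
        val rest = pending.trim()
        pending = ""
        require(!rest.startsWith('"')) { "Unterminated string at end of input" }
        return if (rest.isEmpty()) emptyList() else listOf(rest)
    }

    // Index of the closing quote of the string starting at `start`, or -1.
    private fun stringEnd(text: String, start: Int): Int {
        var j = start + 1
        while (j < text.length) {
            when (text[j]) {
                '\\' -> j++          // skip the escaped character
                '"' -> return j
            }
            j++
        }
        return -1
    }

    // Exclusive end index of the literal starting at `start`, or -1 if the
    // chunk ends before a delimiter is seen.
    private fun literalEnd(text: String, start: Int): Int {
        var j = start
        while (j < text.length && !text[j].isWhitespace() && text[j] !in "{}[]:,\"") j++
        return if (j < text.length) j else -1
    }
}
```

In the fetch scenario, each chunk.value would first be decoded to text and then passed to feed(), with finish() called once done is true. A token split across two chunks, such as a string cut in the middle, is only emitted once its closing part arrives.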

@qwwdfsad, if okio integration with its stream equivalents (I mean Source and Sink) is on the current agenda, that's perfectly fine. I would be especially happy if such okio integration didn't increase the app's overall minified size much. It would possibly still be nice to have some interfaces for providing sequential data sources to decode, for those who for some reason want to use something other than okio.

@qwwdfsad, if there is some public card / task / issue or anything else to track the state and progress of the okio integration, please share it 🙏 I think there won't be anything else for this GitHub issue here, so it might be closed.

andrew-k-21-12, Jan 26 '24 13:01

It's already been in the library since 1.4.0: https://kotlinlang.org/api/kotlinx.serialization/kotlinx-serialization-json-okio/ 😄

qwwdfsad, Jan 26 '24 14:01

Thank you, this is what I need.

At first glance, it seems that I only need to provide proper implementations of okio.BufferedSource's exhausted() and readUtf8CodePoint() methods to use Json.decodeFromBufferedSource<...>(...) with fetch's response body reader (as shown in the thread-starting post). It doesn't look complicated, but I can't provide my own implementation of the okio.BufferedSource interface because it's sealed (yes, I haven't worked with okio or kotlinx.serialization before). Digging further; it feels like this is almost it.

UPD: Figured it out (I need to use okio.Source first), please never mind.

UPD2: Not as easy as I expected. okio.Source doesn't provide suspend methods to override (here and below I mostly mean read(sink: Buffer, byteCount: Long): Long). At the same time, fetch provides response data only as promises, so calling await() on them is not possible inside okio.Source, and using callbacks (then { ... }) is impossible as well, because okio.Source's methods must return their values immediately.
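One possible workaround, sketched here under assumptions and not proposed anywhere in this thread, is to await all chunks in suspending code first, write them into an okio.Buffer, and only then run the synchronous decoder. The function name decodeCollectedChunks is hypothetical, and the chunks are assumed to have been converted to ByteArray already. This still holds the whole body in memory, so it does not deliver the streaming benefit the issue asks for. It only avoids the blocking-read problem and the intermediate String.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.okio.decodeFromBufferedSource
import okio.Buffer

// Hypothetical helper: `chunks` are the response parts, already awaited and
// converted to ByteArray by the (suspending) caller.
@OptIn(ExperimentalSerializationApi::class)
fun decodeCollectedChunks(chunks: List<ByteArray>): JsonElement {
    val buffer = Buffer()
    // okio.Buffer is an in-memory BufferedSource, so no blocking read is needed.
    for (chunk in chunks) buffer.write(chunk)
    return Json.decodeFromBufferedSource(JsonElement.serializer(), buffer)
}
```

JsonElement is used here only to keep the sketch free of custom serializable classes; any DeserializationStrategy could be passed instead.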

andrew-k-21-12, Jan 26 '24 15:01