
Add methods to BinaryFormat to encode/decode directly to/from ByteArray/ByteBuffer or OutputStream/InputStream to avoid copying.

Open · Delsart opened this issue 2 months ago · 1 comment

Current Situation:

The BinaryFormat interface, implemented by formats like ProtoBuf and Cbor, only provides:

fun <T> encodeToByteArray(serializer: SerializationStrategy<T>, value: T): ByteArray
fun <T> decodeFromByteArray(deserializer: DeserializationStrategy<T>, bytes: ByteArray): T

The encodeToByteArray method always allocates a new ByteArray to hold the serialized result.
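
For illustration, here is a minimal round trip with the current API (Packet and roundTrip are made-up names for this sketch; ProtoBuf is still experimental, hence the opt-in):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromByteArray
import kotlinx.serialization.encodeToByteArray
import kotlinx.serialization.protobuf.ProtoBuf

@Serializable
data class Packet(val id: Long, val payload: String)

@OptIn(ExperimentalSerializationApi::class)
fun roundTrip(packet: Packet): Packet {
    // Each call allocates a fresh ByteArray sized to the serialized payload.
    val bytes: ByteArray = ProtoBuf.encodeToByteArray(packet)
    return ProtoBuf.decodeFromByteArray(bytes)
}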

Issue:

In performance-critical scenarios (e.g., network packet construction, processing large objects), this leads to unnecessary memory allocation and data copying. We cannot serialize directly into a pre-allocated buffer (such as a ByteBuffer, or an existing ByteArray that is part of a larger buffer) or directly into a stream.

The Proposed Solution / Feature

We request new overloads or extension functions on BinaryFormat (or perhaps a new interface extension for advanced binary IO) that allow the user to specify the output target:

Proposal 1: Writing to an OutputStream

fun <T> encodeToStream(
    serializer: SerializationStrategy<T>, 
    value: T, 
    stream: OutputStream // Or a platform-specific equivalent in common code
)
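
For comparison, this is roughly what callers must do today to reach an OutputStream; the intermediate ByteArray is exactly the copy this overload would eliminate (encodeToStreamToday is an illustrative helper, not a library API):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.SerializationStrategy
import kotlinx.serialization.protobuf.ProtoBuf
import java.io.OutputStream

@OptIn(ExperimentalSerializationApi::class)
fun <T> encodeToStreamToday(
    serializer: SerializationStrategy<T>,
    value: T,
    stream: OutputStream
) {
    // Serialize into a temporary array, then copy it into the stream.
    val bytes = ProtoBuf.encodeToByteArray(serializer, value)
    stream.write(bytes)
}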

Proposal 2: Writing to a ByteArray starting at an offset

fun <T> encodeToByteArray(
    serializer: SerializationStrategy<T>, 
    value: T, 
    output: ByteArray, 
    offset: Int = 0
): Int // Returns the number of bytes written
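
Today's equivalent requires an explicit copy into the caller's buffer (encodeIntoArrayToday is an illustrative helper, not a library API):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.SerializationStrategy
import kotlinx.serialization.protobuf.ProtoBuf

@OptIn(ExperimentalSerializationApi::class)
fun <T> encodeIntoArrayToday(
    serializer: SerializationStrategy<T>,
    value: T,
    output: ByteArray,
    offset: Int = 0
): Int {
    // Serialize into a temporary array, then copy it to the requested offset.
    val bytes = ProtoBuf.encodeToByteArray(serializer, value)
    bytes.copyInto(output, destinationOffset = offset)
    return bytes.size
}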

Proposal 3: Writing to a ByteBuffer (JVM/Native focus)

fun <T> encodeToByteBuffer(
    serializer: SerializationStrategy<T>, 
    value: T, 
    output: ByteBuffer,
): Int // Returns the number of bytes written
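
Likewise for ByteBuffer: today we serialize to a temporary array and then put() it into the buffer (encodeIntoBufferToday is an illustrative helper, not a library API):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.SerializationStrategy
import kotlinx.serialization.protobuf.ProtoBuf
import java.nio.ByteBuffer

@OptIn(ExperimentalSerializationApi::class)
fun <T> encodeIntoBufferToday(
    serializer: SerializationStrategy<T>,
    value: T,
    output: ByteBuffer
): Int {
    // Serialize into a temporary array, then copy it into the ByteBuffer.
    val bytes = ProtoBuf.encodeToByteArray(serializer, value)
    output.put(bytes)
    return bytes.size
}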

Justification/Motivation

  1. Zero-Copy Serialization: Essential for high-throughput applications to avoid copying data from the internal serialization buffer to a final destination buffer.

  2. Reduced GC Pressure: By reusing pre-allocated buffers (e.g., a ByteBuffer for a network channel or an OutputStream that wraps a pooled buffer), we significantly reduce the allocation rate and Garbage Collector overhead.

  3. Consistency: The Json format already provides encodeToStream/decodeFromStream extensions on the JVM (currently experimental), and binary formats should offer an equivalent to support efficient IO; see the sketch below.
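
For reference, a sketch of the existing Json precedent on the JVM (Ping is a made-up type; encodeToStream is marked @ExperimentalSerializationApi, so the exact signatures should be checked against the current docs):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.encodeToStream
import java.io.ByteArrayOutputStream

@Serializable
data class Ping(val id: Long)

@OptIn(ExperimentalSerializationApi::class)
fun jsonPrecedent() {
    val out = ByteArrayOutputStream()
    // Json already writes straight into an OutputStream on the JVM;
    // this issue asks for the analogous capability on BinaryFormat.
    Json.encodeToStream(Ping(42), out)
}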

Delsart · Oct 18 '25 22:10

There are already some streaming requests in #2075 and #2618, and probably others.

I think the general consensus is to finish kotlinx.io first, although it's been a while since the maintainers chimed in on that point.

JakeWharton · Oct 19 '25 02:10