Consider flipping the default value of `alwaysUseByteString` and removing `@ByteString` in Cbor format

Open whyoleg opened this issue 4 months ago • 0 comments

RFC 8949 Concise Binary Object Representation (CBOR) defines three types of sequences:

  • a sequence of zero or more bytes ("byte string" / Major type 2)
  • a sequence of zero or more Unicode code points ("text string" / Major type 3)
  • a sequence of zero or more data items ("array" / Major type 4)
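To make the difference between the three major types concrete, here is a minimal sketch of how each one looks on the wire. It is a hand-written illustration of the RFC 8949 short forms using definite lengths only, not the encoder used by kotlinx.serialization, and all function names (`head`, `encodeAsByteString`, `encodeAsTextString`, `encodeAsArray`, `toHex`) are made up for this example:

```kotlin
// Minimal CBOR encoders for illustration; only lengths and values up to 255
// are supported, and only definite-length encoding is produced.

/** Encodes a CBOR head: major type in the top 3 bits, argument in the rest. */
fun head(majorType: Int, argument: Int): ByteArray {
    require(majorType in 0..7 && argument in 0..255)
    return if (argument < 24) {
        byteArrayOf(((majorType shl 5) or argument).toByte())
    } else {
        // Additional information 24 means "the argument follows in one byte".
        byteArrayOf(((majorType shl 5) or 24).toByte(), argument.toByte())
    }
}

/** Major type 2: the head carries the byte count, then the raw bytes follow. */
fun encodeAsByteString(bytes: ByteArray): ByteArray = head(2, bytes.size) + bytes

/** Major type 3: the head carries the UTF-8 byte count, then the UTF-8 bytes. */
fun encodeAsTextString(text: String): ByteArray {
    val utf8 = text.encodeToByteArray()
    return head(3, utf8.size) + utf8
}

/** Major type 4: the head carries the item count, then each byte is its own integer item. */
fun encodeAsArray(bytes: ByteArray): ByteArray {
    var out = head(4, bytes.size)
    for (b in bytes) {
        val v = b.toInt()
        // Non-negative values use major type 0; a negative n is stored as -1 - n in major type 1.
        out += if (v >= 0) head(0, v) else head(1, -1 - v)
    }
    return out
}

fun ByteArray.toHex(): String =
    joinToString(" ") { (it.toInt() and 0xff).toString(16).padStart(2, '0') }

fun main() {
    val bytes = byteArrayOf(1, 2, 100, -1)
    println(encodeAsByteString(bytes).toHex()) // 44 01 02 64 ff
    println(encodeAsArray(bytes).toHex())      // 84 01 02 18 64 20
    println(encodeAsTextString("hi").toHex())  // 62 68 69
}
```

In the "array" form every element is a separate data item, so values of 24 and above take two bytes each, while the "byte string" form always stores one byte per element.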

Currently, kotlinx.serialization supports all of them:

  • "byte string" via a ByteArray property annotated with @ByteString, or via the alwaysUseByteString=true setting in the Cbor configuration
  • "text string" via String
  • "array" via List<*>/Array<*>/ByteArray (and other primitive arrays)
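For reference, the three ways a ByteArray can currently be written might look roughly like this. This is a sketch that assumes kotlinx-serialization-cbor and the serialization compiler plugin are on the classpath; the classes `PlainPayload` and `AnnotatedPayload` and the `toHex` helper are hypothetical names used only for illustration:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.cbor.ByteString
import kotlinx.serialization.cbor.Cbor
import kotlinx.serialization.decodeFromByteArray
import kotlinx.serialization.encodeToByteArray

// Hypothetical model classes, named for illustration only.
@Serializable
class PlainPayload(val data: ByteArray)

@Serializable
class AnnotatedPayload(@ByteString val data: ByteArray)

fun ByteArray.toHex(): String =
    joinToString(" ") { (it.toInt() and 0xff).toString(16).padStart(2, '0') }

fun main() {
    val bytes = byteArrayOf(1, 2, 3)

    // Default configuration, no annotation: `data` is written as an "array".
    println(Cbor.encodeToByteArray(PlainPayload(bytes)).toHex())

    // Per-property opt-in: `data` is written as a "byte string".
    println(Cbor.encodeToByteArray(AnnotatedPayload(bytes)).toHex())

    // Format-wide opt-in: every ByteArray is written as a "byte string".
    val byteStringCbor = Cbor { alwaysUseByteString = true }
    println(byteStringCbor.encodeToByteArray(PlainPayload(bytes)).toHex())
}
```

The exact bytes printed depend on the library version and the rest of the Cbor configuration, so no expected output is shown here.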

Encoding and decoding ByteArray as an "array" by default is inconsistent in two ways:

  1. Both String and ByteArray have native representations in Cbor, but only String uses that representation by default
  2. ProtoBuf supports encoding/decoding ByteArray as its native bytes type out of the box

In addition, the @ByteString annotation has an unfortunate name clash with kotlinx.io.bytestring.ByteString.

Proposal

While the Cbor format is still experimental, it might be wise to make alwaysUseByteString=true the default and remove @ByteString. This change cannot be made fully safe and backward compatible, but the migration can be made as smooth as possible. Here is an initial proposal for how to achieve it (the steps should probably be spaced 1-2 major releases apart):

  1. Annotate alwaysUseByteString and @ByteString with an opt-in annotation that warns that the default will change. Maybe even annotate Cbor.Default and Cbor {} with it. KT-54106 might have helped here. Alternatively, an IDE inspection could prompt users to set alwaysUseByteString explicitly.
  2. Make alwaysUseByteString=true by default while still allowing users to set alwaysUseByteString=false. Optionally, add a system property for the JVM target to bring back the old behavior.
  3. Deprecate alwaysUseByteString, @ByteString, and the opt-in annotation.
  4. Remove alwaysUseByteString, @ByteString, and the opt-in annotation.

Probably, steps 2 and 3 could be done at the same time.
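As a rough sketch of what step 1 could look like, the snippet below declares an opt-in marker and applies it to a stand-in builder. Everything here is hypothetical: the marker name `CborByteStringDefaultMigration`, its message, and the `SketchCborBuilder`/`SketchCborConfig`/`sketchCbor` stand-ins do not exist in kotlinx.serialization and only mimic the shape of the real Cbor builder:

```kotlin
// Hypothetical marker; neither the name nor the message exists in kotlinx.serialization.
@RequiresOptIn(
    message = "The default of alwaysUseByteString is planned to change to true; set it explicitly.",
    level = RequiresOptIn.Level.WARNING
)
@Retention(AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.PROPERTY)
annotation class CborByteStringDefaultMigration

// Stand-in for the real Cbor builder, reduced to the one flag under discussion.
class SketchCborBuilder {
    @CborByteStringDefaultMigration
    var alwaysUseByteString: Boolean = false // step 2 would flip this default to true
}

// Stand-in for the resulting immutable configuration.
class SketchCborConfig(val alwaysUseByteString: Boolean)

@OptIn(CborByteStringDefaultMigration::class)
fun sketchCbor(configure: SketchCborBuilder.() -> Unit): SketchCborConfig {
    val builder = SketchCborBuilder().apply(configure)
    return SketchCborConfig(builder.alwaysUseByteString)
}

// Callers that set the flag must opt in, which is where they learn about the change.
@OptIn(CborByteStringDefaultMigration::class)
fun main() {
    val config = sketchCbor { alwaysUseByteString = true }
    println(config.alwaysUseByteString) // true
}
```

A marker like this only reaches users who already touch alwaysUseByteString or @ByteString, which is presumably why the proposal also considers annotating Cbor.Default and Cbor {} or adding an IDE inspection.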

After those changes, if for some reason a ByteArray still needs to be serialized as an "array" of signed integers, it can be replaced with Array<Byte> or List<Byte>, or a simple custom serializer can be used:

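import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.builtins.ArraySerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

// Delegates to Array<Byte>, so the value is written as an "array" of integers.
@OptIn(ExperimentalSerializationApi::class) // ArraySerializer is experimental
object ByteArraySerializer : KSerializer<ByteArray> {
    private val delegate = ArraySerializer(Byte.serializer())
    override val descriptor: SerialDescriptor get() = delegate.descriptor

    override fun serialize(encoder: Encoder, value: ByteArray) {
        encoder.encodeSerializableValue(delegate, value.toTypedArray())
    }

    override fun deserialize(decoder: Decoder): ByteArray {
        return decoder.decodeSerializableValue(delegate).toByteArray()
    }
}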
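The other option mentioned above, switching the property type to List<Byte>, needs no custom serializer. A minimal sketch follows, assuming kotlinx-serialization-cbor and the compiler plugin are available; `LegacyPayload` is a hypothetical class name, and the claim that List<Byte> keeps the "array" encoding follows the reasoning in this issue rather than a documented guarantee:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.cbor.Cbor
import kotlinx.serialization.decodeFromByteArray
import kotlinx.serialization.encodeToByteArray

// Hypothetical model class: List<Byte> instead of ByteArray, so the
// byte-string setting is not expected to apply to this property.
@Serializable
class LegacyPayload(val data: List<Byte>)

fun main() {
    // Simulates the proposed default by enabling the flag explicitly.
    val cbor = Cbor { alwaysUseByteString = true }

    val encoded = cbor.encodeToByteArray(LegacyPayload(listOf<Byte>(1, 2, 3)))
    val decoded = cbor.decodeFromByteArray<LegacyPayload>(encoded)
    println(decoded.data) // [1, 2, 3]
}
```

The trade-off is that List<Byte> boxes every element, so the custom serializer may be preferable when the in-memory type has to stay ByteArray.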

whyoleg, Aug 08 '25 18:08