kotlinx.serialization icon indicating copy to clipboard operation
kotlinx.serialization copied to clipboard

Can't encode a nullable `ByteArray` without a default argument in Protobuf (it currently maps to Protobuf `bytes`, so should not be treated as a Protobuf collection type and should be allowed to be optional)

Open ShreckYe opened this issue 4 years ago • 10 comments

Describe the bug With the ProtoBuf format, it seems a ByteArray is encoded as a collection of numbers. How do I encode the raw bytes directly as the bytes type in The Scalar Value Types section in Language Guide? I see docs and issues doing this for CBOR using the @ByteString annotation but couldn't find anything for ProtoBuf. Is this currently supported?

To Reproduce Attach a code snippet or test data if possible.

"EncodeProtobufBytes.kt":

import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToByteArray
import kotlinx.serialization.protobuf.ProtoBuf
import kotlinx.serialization.protobuf.ProtoNumber

@Serializable
class OptionalBytesWrapper(
    @ProtoNumber(1)
    val bytes: ByteArray?
)

fun main() {
    println(ProtoBuf.encodeToByteArray(OptionalBytesWrapper(null)))
}

Output:

Exception in thread "main" kotlinx.serialization.SerializationException: 'null' is not supported for collection types in ProtoBuf
	at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNull(ProtobufTaggedEncoder.kt:44)
	at kotlinx.serialization.encoding.Encoder$DefaultImpls.encodeNullableSerializableValue(Encoding.kt:299)
	at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNullableSerializableValue(ProtobufTaggedEncoder.kt:11)
	at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNullableSerializableElement(ProtobufTaggedEncoder.kt:154)
	at OptionalBytesWrapper.write$Self(EncodeProtobufBytes.kt:6)
	at OptionalBytesWrapper$$serializer.serialize(EncodeProtobufBytes.kt:6)
	at OptionalBytesWrapper$$serializer.serialize(EncodeProtobufBytes.kt:6)
	at kotlinx.serialization.protobuf.internal.ProtobufEncoder.encodeSerializableValue(ProtobufEncoding.kt:128)
	at kotlinx.serialization.protobuf.ProtoBuf.encodeToByteArray(ProtoBuf.kt:130)
	at EncodeProtobufBytesKt.main(EncodeProtobufBytes.kt:15)
	at EncodeProtobufBytesKt.main(EncodeProtobufBytes.kt)

Expected behavior The ByteArray should be encoded directly as bytes and a null ByteArray should be supported.

Environment

  • Kotlin version: [1.5.21]
  • Library version: [1.2.2]
  • Kotlin platforms: [JVM]
  • Gradle version: [e.g. 4.10]
  • IDE version (if bug is related to the IDE) [e.g. IntellijIDEA 2019.1, Android Studio 3.4]
  • Other relevant context [e.g. OS version, JRE version, ... ]

ShreckYe avatar Aug 28 '21 10:08 ShreckYe

Hi! According to Encoding guide bytes encoded as Length-delimited - byte array with its length at the beginning. Since protobuf does not support the special value null, you can use the absence of a value to encode it - which is perceived by the library as a default value.

@Serializable
class OptionalBytesWrapper(
    @ProtoNumber(1)
    val bytes: ByteArray? = null
)

fun main() {
    println(ProtoBuf.encodeToByteArray(OptionalBytesWrapper(null)))
}

shanshin avatar Aug 30 '21 12:08 shanshin

@shanshin Thanks. It works! However, I would still wish that there be an annotation to distinguish them explicitly.

ShreckYe avatar Aug 31 '21 09:08 ShreckYe

@ShreckYe Bytes, strings, and embedded messages are all length-delimeted data. Proto format usually don't distinguish them (in the serialized form)

sandwwraith avatar Aug 31 '21 11:08 sandwwraith

@sandwwraith Yeah, I am aware of this. I mean to distinguish such length-delimited data from a collection of data (repeated fields).

ShreckYe avatar Sep 01 '21 02:09 ShreckYe

Hi, I just reviewed this issue again. And now I realize that it's not necessary to support encoding a repeated field of Bytes as it just wastes space. In the packed way, and if every byte is between 0 and 127, it is the same as using a bytes field; otherwise a byte takes up to 4 bytes due to the variant encoding. In the unpacked way, it just wastes more space. And if I really need to do this in a rare case, I might just use Array<Byte> or List<Byte>.

ShreckYe avatar Mar 05 '24 14:03 ShreckYe

Sorry I made a mistake. What's in the issue description is still not resolved.

ShreckYe avatar Mar 05 '24 14:03 ShreckYe

@ShreckYe Protobuf packed vs repeated are both valid encodings for the same data (the defaults change between v2 and v3). But due to this it is not possible to distinguish an empty array and a null value. If you want to encode a null value you will need a (nullable) wrapper object that then contains the array elements (or nothing) inside it.

pdvrieze avatar Mar 05 '24 16:03 pdvrieze

@pdvrieze Thanks. The solution given by @shanshin above also works without a wrapper.

I think the question here is: does the current implementation map ByteArray to repeated int32s or bytes?

@Serializable
class Wrapper<T>(val value : T)
@Serializable
class PackedWrapper<T>(@ProtoPacked val value : T)
println(ProtoBuf.encodeToHexString(Wrapper(byteArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(PackedWrapper(byteArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(Wrapper(shortArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(PackedWrapper(shortArrayOf(1, -1))))

By running the code above, I get the following results:

0a0201ff
0a0201ff
080108ffffffffffffffffff01
0a0b01ffffffffffffffffff01

So one can see that the current implementation maps ByteArray to bytes, but ShortArray to possibly packed repeated int32s in varint encoding, which is a bit inconsistent.

I would say this inconsistency is fine because it provides the most efficient choice for every data type. So the remaining problem is, if a ByteArray gets serialized to a bytes field, the field is allowed to be optional and so the property type is no longer a collection type and should be allowed to be nullable.

ShreckYe avatar Mar 06 '24 01:03 ShreckYe

So, to sum up, the solution is that we should allow ByteArray to be nullable?

sandwwraith avatar Mar 12 '24 18:03 sandwwraith

My opinion is yes.

ShreckYe avatar Mar 13 '24 00:03 ShreckYe