Can't encode a nullable `ByteArray` without a default argument in Protobuf (it currently maps to Protobuf `bytes`, so should not be treated as a Protobuf collection type and should be allowed to be optional)
Describe the bug
With the ProtoBuf format, it seems a ByteArray is encoded as a collection of numbers. How do I encode the raw bytes directly as the bytes type in The Scalar Value Types section in Language Guide? I see docs and issues doing this for CBOR using the @ByteString annotation but couldn't find anything for ProtoBuf. Is this currently supported?
To Reproduce Attach a code snippet or test data if possible.
"EncodeProtobufBytes.kt":
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToByteArray
import kotlinx.serialization.protobuf.ProtoBuf
import kotlinx.serialization.protobuf.ProtoNumber
@Serializable
class OptionalBytesWrapper(
@ProtoNumber(1)
val bytes: ByteArray?
)
fun main() {
println(ProtoBuf.encodeToByteArray(OptionalBytesWrapper(null)))
}
Output:
Exception in thread "main" kotlinx.serialization.SerializationException: 'null' is not supported for collection types in ProtoBuf
at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNull(ProtobufTaggedEncoder.kt:44)
at kotlinx.serialization.encoding.Encoder$DefaultImpls.encodeNullableSerializableValue(Encoding.kt:299)
at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNullableSerializableValue(ProtobufTaggedEncoder.kt:11)
at kotlinx.serialization.protobuf.internal.ProtobufTaggedEncoder.encodeNullableSerializableElement(ProtobufTaggedEncoder.kt:154)
at OptionalBytesWrapper.write$Self(EncodeProtobufBytes.kt:6)
at OptionalBytesWrapper$$serializer.serialize(EncodeProtobufBytes.kt:6)
at OptionalBytesWrapper$$serializer.serialize(EncodeProtobufBytes.kt:6)
at kotlinx.serialization.protobuf.internal.ProtobufEncoder.encodeSerializableValue(ProtobufEncoding.kt:128)
at kotlinx.serialization.protobuf.ProtoBuf.encodeToByteArray(ProtoBuf.kt:130)
at EncodeProtobufBytesKt.main(EncodeProtobufBytes.kt:15)
at EncodeProtobufBytesKt.main(EncodeProtobufBytes.kt)
Expected behavior
The ByteArray should be encoded directly as bytes and a null ByteArray should be supported.
Environment
- Kotlin version: [1.5.21]
- Library version: [1.2.2]
- Kotlin platforms: [JVM]
- Gradle version: [e.g. 4.10]
- IDE version (if bug is related to the IDE) [e.g. IntellijIDEA 2019.1, Android Studio 3.4]
- Other relevant context [e.g. OS version, JRE version, ... ]
Hi!
According to Encoding guide bytes encoded as Length-delimited - byte array with its length at the beginning.
Since protobuf does not support the special value null, you can use the absence of a value to encode it - which is perceived by the library as a default value.
@Serializable
class OptionalBytesWrapper(
@ProtoNumber(1)
val bytes: ByteArray? = null
)
fun main() {
println(ProtoBuf.encodeToByteArray(OptionalBytesWrapper(null)))
}
@shanshin Thanks. It works! However, I would still wish that there be an annotation to distinguish them explicitly.
@ShreckYe Bytes, strings, and embedded messages are all length-delimeted data. Proto format usually don't distinguish them (in the serialized form)
@sandwwraith Yeah, I am aware of this. I mean to distinguish such length-delimited data from a collection of data (repeated fields).
Hi, I just reviewed this issue again. And now I realize that it's not necessary to support encoding a repeated field of Bytes as it just wastes space. In the packed way, and if every byte is between 0 and 127, it is the same as using a bytes field; otherwise a byte takes up to 4 bytes due to the variant encoding. In the unpacked way, it just wastes more space. And if I really need to do this in a rare case, I might just use Array<Byte> or List<Byte>.
Sorry I made a mistake. What's in the issue description is still not resolved.
@ShreckYe Protobuf packed vs repeated are both valid encodings for the same data (the defaults change between v2 and v3). But due to this it is not possible to distinguish an empty array and a null value. If you want to encode a null value you will need a (nullable) wrapper object that then contains the array elements (or nothing) inside it.
@pdvrieze Thanks. The solution given by @shanshin above also works without a wrapper.
I think the question here is: does the current implementation map ByteArray to repeated int32s or bytes?
@Serializable
class Wrapper<T>(val value : T)
@Serializable
class PackedWrapper<T>(@ProtoPacked val value : T)
println(ProtoBuf.encodeToHexString(Wrapper(byteArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(PackedWrapper(byteArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(Wrapper(shortArrayOf(1, -1))))
println(ProtoBuf.encodeToHexString(PackedWrapper(shortArrayOf(1, -1))))
By running the code above, I get the following results:
0a0201ff
0a0201ff
080108ffffffffffffffffff01
0a0b01ffffffffffffffffff01
So one can see that the current implementation maps ByteArray to bytes, but ShortArray to possibly packed repeated int32s in varint encoding, which is a bit inconsistent.
I would say this inconsistency is fine because it provides the most efficient choice for every data type. So the remaining problem is, if a ByteArray gets serialized to a bytes field, the field is allowed to be optional and so the property type is no longer a collection type and should be allowed to be nullable.
So, to sum up, the solution is that we should allow ByteArray to be nullable?
My opinion is yes.