Support oneof declaration in protobuf
This is an implementation of #2538, adding support for the oneof declaration in protobuf.
Some major changes:
- Introduce a new annotation `ProtoOneOf` to mark that a property of a class should be decoded and encoded in the oneof way.
- `ProtoNumber` can now be attached to a class, to mark the proto id when such a class is an implementation of a oneof field.
- `SealedClassSerializer.subclassSerializers` is now a public property, because when decoding a message with the oneof mark, I need to find the related serializer of the concrete class via its `ProtoNumber` annotation, assisted by change 2.
For change 3, there may be other ways to achieve it. For example, binding the proto id to a specific serializer inside the `ProtoOneOf` annotation, like:
```kotlin
public annotation class ProtoOneOf(public vararg val numbers: ProtoOneOfPair)

public annotation class ProtoOneOfPair(public val number: Int, public val serializer: KClass<KSerializer<*>>)
```
But it may be tedious for users to write such a heavy annotation for a property in a data class, and it is hard to see that the proto number annotated on the oneof field takes priority over the number on the inner property. So the current implementation became my choice.
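For illustration, here is a minimal usage sketch of the chosen design, assuming the class-level `@ProtoNumber` placement described in the change list above (this form only exists on this PR's branch; against the released library `@ProtoNumber` targets properties only, and the exact `@ProtoOneOf` arguments evolved later in this thread). All type and property names are hypothetical:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoNumber
import kotlinx.serialization.protobuf.ProtoOneOf

@Serializable
sealed interface Payload

@Serializable
@ProtoNumber(2) // proto id of this oneof alternative, attached to the class as described above
data class TextPayload(val text: String) : Payload

@Serializable
@ProtoNumber(3)
data class IdPayload(val id: Long) : Payload

@Serializable
data class Envelope(
    @ProtoNumber(1) val name: String,
    @ProtoOneOf val payload: Payload, // encoded and decoded as a protobuf oneof field
)
```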
If everything in this PR is OK, I will continue to update the code comments and docs.
One thing I'm not sure you handled is how to deal with child classes that are not annotated with `@ProtoNumber` (you may want to ignore them or throw an error on encoding).
It should be an error in this case. Without the annotation, a subclass can never get a proper ProtoNumber, since there is no guarantee on the order of the sub-serializers, and not all oneof fields start with id 1.
But it is hard to find a proper time to throw an error. The best choice would be at compile time, but a specific format has nothing to do with the compiler plugin. Maybe perform a traversal check every time a oneof field is encountered?
For a type that contains a oneof you enumerate all leaf serializers anyway. At that point you can reliably throw an exception if any of them have missing annotations. You could theoretically do it by deep inspection of the types, but that would be overkill. From my perspective, reliable runtime errors can be acceptable.
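As a rough illustration of that check (not the PR's actual code), one could walk the sealed descriptor's "value" element, whose children are the subclass descriptors, and fail when an annotation is missing. This assumes the class-level `@ProtoNumber` placement being discussed at this point in the thread, and the function name is made up:

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.protobuf.ProtoNumber

// Sketch: fail fast when any subclass of a sealed oneof type lacks a ProtoNumber annotation.
// Element 1 of a sealed descriptor is the "value" element listing the subclass descriptors.
@OptIn(ExperimentalSerializationApi::class)
fun checkOneOfSubclasses(sealedDescriptor: SerialDescriptor) {
    val subclasses = sealedDescriptor.getElementDescriptor(1)
    for (i in 0 until subclasses.elementsCount) {
        val sub = subclasses.getElementDescriptor(i)
        require(sub.annotations.any { it is ProtoNumber }) {
            "Subclass ${sub.serialName} of oneof type ${sealedDescriptor.serialName} is missing @ProtoNumber"
        }
    }
}
```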
The main thing to keep in mind is how it interacts with changed/updated classes (different library versions, classloader shenanigans, etc.) when that is not directly under developer control. Say you have a library that uses protobuf, some user implements one of the interfaces (when not sealed), and now the application doesn't work because the library uses protobuf internally... (Maybe required annotations are something that the plugin could be made to support.)
A complicating factor is that deserialization can safely ignore types without a ProtoNumber; they will never match. Serialization, when such children are present, will be an issue, as there is no proper proto number present. As an option, it could be an idea to support fallback to the general polymorphic format (you'd only need one "special" value number that is used in such a case; the @ProtoNumber value of the "owner field" would be backwards compatible).
> it could be an idea to support fallback to the general polymorphic format
I might argue that it would lead to inconsistency between encoding and decoding. The field annotated with ProtoOneOf has no number, and the proto number of the property in the child class may conflict with an existing field and cause overwriting. The serializer would not be able to decode the same value it encoded. This behaviour is error-prone, hard to debug, and confusing.
> how it interacts with changed/updated classes
It could theoretically cause some trouble, but in my opinion it is more of a core-library issue. The same usage may also fail when not using ProtoOneOf, or even outside the ProtoBuf format. And if someone has managed to make it work by passing in a custom serializers module, they can just specify the proto number on the serializer when registering it.
The schema generator on the current branch can only generate a oneof schema for sealed types, relying on all sub-serializers being encoded in the sealed serializer's descriptor. For an open hierarchy, the ProtoBuf instance would have to be passed in to access the SerializersModule and iterate over SerialModuleImpl.polyBase2Serializers.
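For context, a small sketch (all types hypothetical) of why the instance would need to be passed in: with an open hierarchy, the set of subclasses is only known through the SerializersModule the ProtoBuf instance was configured with.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.modules.SerializersModule
import kotlinx.serialization.modules.polymorphic
import kotlinx.serialization.modules.subclass
import kotlinx.serialization.protobuf.ProtoBuf

interface OpenPack

@Serializable
data class PackA(val a: String) : OpenPack

@Serializable
data class PackB(val b: Int) : OpenPack

// Without a sealed hierarchy, the subclasses are only supplied at configuration time.
@OptIn(ExperimentalSerializationApi::class)
val protoBuf = ProtoBuf {
    serializersModule = SerializersModule {
        polymorphic(OpenPack::class) {
            subclass(PackA::class)
            subclass(PackB::class)
        }
    }
}
```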
What do you think of it? @rotilho
Makes sense to me.
I wonder though if using a repeatable @ProtobufOneOf to pair the number with a subclass wouldn't be a more straightforward approach. This, however, would force you to know all subclasses upfront, which is probably not a problem for sealed hierarchies but may be for open polymorphism.
I would like to share my use case for context.
Our teams are using proto3 syntax to define RPC methods and messages between the clients (iOS and Android) and servers (Go). As we are adopting KMM on the client side, I am migrating the Java data classes generated by the proto plugin to Kotlin data classes, which are generated at compile time.
In this scenario, all the data classes are static and not editable, so that they match the message definitions in the proto files. The oneof elements are therefore enumerable, and a sealed interface fits this case best. That is why I prefer sealed polymorphism in my implementation.
Though supporting open polymorphism has some meaning in theory, I still wonder whether there are actual use cases for it. In my opinion:
- If the proto file exists, the oneof elements are settled, so the declaration can be sealed.
- If there is no proto file, such as when communicating between Kotlin processes, `oneof` is not necessary; just use the common polymorphic serializer.
- If a developer wants to use different kinds rather than the pre-defined sealed classes, they can map them at runtime.
> I wonder though if using a repeatable @ProtobufOneOf to pair the number with a subclass wouldn't be a more straightforward approach.
@rotilho As I said in the top description, this way of declaring it would work, but I don't think it is a good API, as developers need to write many annotations on a single property (in my case, there would be more than 20), making it unreadable. And for sealed interfaces, which I prefer most, the mapping relations are settled at the moment they are defined, meaning that these mappings are redundant.
Rewrote decoding to follow the contract of the common polymorphic serializer. But it still needs to find the actual serializer to get a proper serial name for the polymorphic serializer.
@pdvrieze Sorry for urging, but is there any remaining work needed for this pull request to be accepted?
Hey @pdvrieze. Any chance you could take a look at this PR when you have some time? I'd love to be able to use it.
Hi @sandwwraith, could you please also take a look at this PR and check whether it qualifies to be merged? I am looking forward to using it in my production code.
@xiaozhikang0916 Sure thing, I'll take a look at it soon
Found some API check failures in the TeamCity build, but running apiDump on a clean dev branch would add the public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, to the Json format API file. I just removed it from the commit history.
Is `val numbers` in `@ProtoOneOf` really required?
You are right, `numbers` here is unnecessary. It may need a benchmark in the future, but it works fine for now.
ProtoNumber of oneof
In a protobuf-file-based development workflow, the proto files are the source of truth used to generate data classes with some plugin. In the protobuf specification, each oneof field can only be declared inside a message and cannot be reused. So each generated sealed interface has a one-to-one relation with a concrete data class, and there is no need to worry about id conflicts.
In a Kotlin-based development workflow, the case you are worried about may happen. But in practice, oneof is a special mechanism of protobuf that developers will rarely need to depend on. In other words, why not just use the common polymorphism of common serialization?
However, as this is a designed limitation, I will add some more explanation in the docs and a runtime check in ProtobufDecoding for duplicated proto ids, but I cannot find a similar checkpoint in ProtobufEncoding.
> But in practice, oneof is a special mechanism of protobuf that developers will rarely need to depend on.
Perhaps. However, you can write Kotlin classes without a special proto plugin to be compatible with an existing schema. I'm not sure if it is popular in protobuf to reuse messages in oneof, or if everyone just uses unique classes for that. Anyway, we can work with this limitation and look at the feedback.
Oh, and about
> Found some API check failures in the TeamCity build, but running apiDump on a clean dev branch would add the public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, to the Json format API file.
This is a bug in BCV (https://github.com/Kotlin/binary-compatibility-validator/issues/141) which is fixed in 0.14.0. I'll update the version soon.
However, there are some examples failing, probably due to a new duplicate check: https://teamcity.jetbrains.com/buildConfiguration/KotlinTools_KotlinxSerialization_RuntimeLibraryBuildAggregated/4559581?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildChangesSection=true&expandBuildTestsSection=true&expandBuildProblemsSection=true
> I'm not sure if it is popular in protobuf to reuse messages in oneof.
It is totally fine to reuse messages between oneof fields. Let me elaborate a little bit.
Given messages like:
```proto
message ShareMessage {
  string common = 1;
}

message Some {
  string name = 1;
  oneof pack {
    ShareMessage share_in_some = 2;
  }
}
```
The Kotlin definitions for message Some should be: `data class Some` (the outer class), `sealed interface IPack` (the oneof interface), `data class ShareInSome(val value: ShareMessage) : IPack` (a message holder for each oneof field, carrying its proto id), and `data class ShareMessage` (the message being shared).
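A sketch of what those declarations could look like in Kotlin, written with the property-level `@ProtoNumber` placement that is discussed further down in this thread; treat it as illustrative rather than the PR's exact code:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoNumber
import kotlinx.serialization.protobuf.ProtoOneOf

@Serializable
data class ShareMessage(@ProtoNumber(1) val common: String)

@Serializable
sealed interface IPack

// Holder for the oneof alternative `share_in_some = 2`.
@Serializable
data class ShareInSome(@ProtoNumber(2) val value: ShareMessage) : IPack

@Serializable
data class Some(
    @ProtoNumber(1) val name: String,
    @ProtoOneOf val pack: IPack,
)
```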
Now we share ShareMessage in another message:
```proto
message Another {
  string another_name = 1;
  oneof another_pack {
    ShareMessage share_in_another = 3;
  }
}
```
Note that we cannot reuse the oneof pack of Some in a protobuf file; we can only define another oneof field listing every message we want to share, e.g. ShareMessage here.
The oneof interface of the previous class, IPack, as a representative of the oneof pack declaration, should not be reused either. We should have another oneof interface, IAnotherPack, with another group of message holders carrying proto ids of their own. In this way, each group of message holders and its oneof interface is used only once, which prevents id conflicts.
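Continuing the sketch above (and reusing ShareMessage from it), Another would get its own oneof interface and holder; again, names and annotation placement are illustrative:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoNumber
import kotlinx.serialization.protobuf.ProtoOneOf

// A separate oneof interface for `another_pack`; only ShareMessage itself is shared.
@Serializable
sealed interface IAnotherPack

// Holder for the oneof alternative `share_in_another = 3`.
@Serializable
data class ShareInAnother(@ProtoNumber(3) val value: ShareMessage) : IAnotherPack

@Serializable
data class Another(
    @ProtoNumber(1) val anotherName: String,
    @ProtoOneOf val anotherPack: IAnotherPack,
)
```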
In summary, oneof in protobuf is a complicated mechanism, and using it in Kotlin data classes requires some prerequisite knowledge of protobuf.
Thanks for the explanation. I've re-read everything: subclasses are meant to be used in one particular place, and there is even a requirement mentioned that they should have exactly one property. In that case, they don't have to be reusable. It is not entirely the optimal way to do this, but I think it is the best we can do when polymorphism is required.
Is it possible, though, to enforce the 'subclass should have only one property' requirement via SerialDescriptor.elementsCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.
> Is it possible, though, to enforce the 'subclass should have only one property' requirement via SerialDescriptor.elementsCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.
UPD: found `require(descriptor.elementsCount == 1)` in `OneOfElementEncoder`.
Yes, both the encoder and the decoder check elementsCount in their init blocks at runtime. It would be better to have a check at build time, but nothing more can be done inside a particular serialization format library, and I think what is done is enough for now.
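For reference, the guard being discussed amounts to something like the following (illustrative only; the class name is made up, and the PR's actual check lives in the oneof encoder/decoder init blocks):

```kotlin
import kotlinx.serialization.descriptors.SerialDescriptor

// Reject oneof holder classes that do not have exactly one property,
// as soon as the element encoder/decoder is constructed.
class OneOfElementChecker(private val descriptor: SerialDescriptor) {
    init {
        require(descriptor.elementsCount == 1) {
            "A oneof holder class must have exactly one property, but ${descriptor.serialName} has ${descriptor.elementsCount}"
        }
    }
}
```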
> It would be better to have a check at build time, but nothing more can be done inside a particular serialization format library, and I think what is done is enough for now.
Yes, unfortunately we do not have any special mechanisms to provide formats with compile-time checks.
I have rethought the design of the oneof declaration. Is it necessary to annotate the message holder class with ProtoNumber?
The message holder, as described above, should have only one property. We can simply use the original form of ProtoNumber, annotated on that property, as the number we need.
In this way, we do not need to introduce another annotation target for ProtoNumber, and we do not need to think about which id overrides which. Oneof classes can then be defined in the same way developers are already used to, reducing the likelihood of errors.
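A minimal sketch of the difference, reusing the IPack/ShareMessage declarations from the earlier sketch; the commented-out form is the earlier class-level placement from this PR, shown only for contrast:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoNumber

// Earlier form in this PR: the proto id was attached to the holder class itself.
// @Serializable
// @ProtoNumber(2)
// data class ShareInSome(val value: ShareMessage) : IPack

// Refactored form: the holder's single property carries the id via the ordinary @ProtoNumber.
@Serializable
data class ShareInSome(@ProtoNumber(2) val value: ShareMessage) : IPack
```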
In fact, some errors were found in my implementation and test cases during this refactoring, and they have now been fixed.
By the way, I think writing unit tests against data in binary form is quite cryptic here; for example, it is hard to update the hex string after a proto id changes. Maybe some utility like protoscope is needed for preparing the binary data.
> By the way, I think writing unit tests against data in binary form is quite cryptic here; for example, it is hard to update the hex string after a proto id changes. Maybe some utility like protoscope is needed for preparing the binary data.
Added a simple Python script to convert hex strings into protoscope-syntax comment strings for the test files, thanks to GPT.
It seems you need to run knitCheck to update the documentation (likely because I suggested adding a comment to a code sample).
Awesome guys! Thank you very much for working on this 🎉🎉🎉