kotlinx.serialization

Support Proto oneOf with new annotations and polymorphic serializer

Open xiaozhikang0916 opened this issue 1 year ago • 8 comments

I commented on issue #67, but I think it is worth opening a new issue for further discussion.


I am writing a code-gen plugin that builds Kotlin data classes from our proto files. For the oneof case, a sealed interface may be an ideal solution.

Let's say I have a proto message like

message Person {
    string name = 1;
    oneof phone {
        string mobile = 2;
        string home = 3;
    }
}

My ideal data class will be like

data class Person(
    val name: String,
    val phone: IPhoneType,
)

sealed interface IPhoneType

data class MobilePhone(val value: String): IPhoneType

data class HomePhone(val value: String): IPhoneType

So I need to tell the ProtoBuf decoder that if it encounters ProtoNum 2 or 3, it should deserialize the value as an IPhoneType and assign it to the phone field.

A custom serializer for the whole Person class can work, but it would be nice to have some additional annotation support, like:

data class Person(
    @ProtoNum(1) val name: String,
    @ProtoOneOfFields(2, 3) val phone: IPhoneType,
)

sealed interface IPhoneType

@ProtoOneOfNum(2)
data class MobilePhone(val value: String): IPhoneType

@ProtoOneOfNum(3)
data class HomePhone(val value: String): IPhoneType

@ProtoOneOfFields says that this field may be populated by any of the listed ProtoNums, and the @ProtoOneOfNum on each concrete class says which ProtoNum is parsed into that type.
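
For comparison, the custom-serializer route mentioned above can be sketched with the surrogate pattern from the kotlinx.serialization guide. This is only a sketch under assumptions: it uses the existing @ProtoNumber annotation (written as ProtoNum in this thread), the PersonSurrogate and roundTrip names are made up for illustration, and when both fields are present it prefers mobile rather than following the wire order.

```kotlin
import kotlinx.serialization.*
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.protobuf.ProtoBuf
import kotlinx.serialization.protobuf.ProtoNumber

sealed interface IPhoneType
data class MobilePhone(val value: String) : IPhoneType
data class HomePhone(val value: String) : IPhoneType

@Serializable(with = PersonSerializer::class)
data class Person(val name: String, val phone: IPhoneType)

// Hypothetical surrogate mirroring the proto layout:
// every oneof member becomes an optional field with its own number.
@OptIn(ExperimentalSerializationApi::class)
@Serializable
private class PersonSurrogate(
    @ProtoNumber(1) val name: String,
    @ProtoNumber(2) val mobile: String? = null,
    @ProtoNumber(3) val home: String? = null,
)

object PersonSerializer : KSerializer<Person> {
    override val descriptor: SerialDescriptor = PersonSurrogate.serializer().descriptor

    override fun serialize(encoder: Encoder, value: Person) {
        // Exactly one of the optional fields is filled in, depending on the subtype.
        val surrogate = when (val phone = value.phone) {
            is MobilePhone -> PersonSurrogate(value.name, mobile = phone.value)
            is HomePhone -> PersonSurrogate(value.name, home = phone.value)
        }
        encoder.encodeSerializableValue(PersonSurrogate.serializer(), surrogate)
    }

    override fun deserialize(decoder: Decoder): Person {
        val s = decoder.decodeSerializableValue(PersonSurrogate.serializer())
        // Assumption: mobile is preferred if both are set; a real oneof keeps the last one on the wire.
        val phone: IPhoneType = s.mobile?.let { MobilePhone(it) }
            ?: s.home?.let { HomePhone(it) }
            ?: throw SerializationException("Person.phone is not set")
        return Person(s.name, phone)
    }
}

// Encodes to protobuf bytes and decodes again.
@OptIn(ExperimentalSerializationApi::class)
fun roundTrip(person: Person): Person {
    val bytes = ProtoBuf.encodeToByteArray(person)
    return ProtoBuf.decodeFromByteArray<Person>(bytes)
}
```

This works today, but the surrogate has to be written by hand (or generated) for every message containing a oneof, which is the boilerplate the proposed annotations would remove.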


More generally, this could be handled as a flattened key by any serializer (not only ProtoBuf).

For example, encoding an instance of Person defined like

@Serializable
data class Person(
    val name: String,
    @Flatten val address: Address,
)

@Serializable
data class Address(val city: String, val street: String)

is expected to produce content (shown as JSON) like:

{
    "name": "John",
    "city": "London",
    "street": "1st Avenue"
}

And then @ProtoOneOfFields could be a subtype of @Flatten, with more flexible functionality.
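
Since @Flatten does not exist in kotlinx.serialization, here is a rough sketch of how the flattened JSON shape could be approximated today, for JSON only, with JsonTransformingSerializer. The FlatPersonSerializer name and the hard-coded key set are assumptions made for illustration.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonTransformingSerializer
import kotlinx.serialization.json.jsonObject

@Serializable
data class Address(val city: String, val street: String)

@Serializable
data class Person(val name: String, val address: Address)

// Hypothetical stand-in for @Flatten: rewrites the JSON tree around the generated serializer.
object FlatPersonSerializer : JsonTransformingSerializer<Person>(Person.serializer()) {
    private val addressKeys = setOf("city", "street")

    // Encoding: lift the keys of "address" into the parent object.
    override fun transformSerialize(element: JsonElement): JsonElement {
        val obj = element.jsonObject
        val address = obj.getValue("address").jsonObject
        return JsonObject((obj - "address") + address)
    }

    // Decoding: collect the flattened keys back into a nested "address" object.
    override fun transformDeserialize(element: JsonElement): JsonElement {
        val obj = element.jsonObject
        val address = JsonObject(obj.filterKeys { it in addressKeys })
        return JsonObject(obj.filterKeys { it !in addressKeys } + ("address" to address))
    }
}

fun main() {
    val person = Person("John", Address("London", "1st Avenue"))
    val json = Json.encodeToString(FlatPersonSerializer, person)
    println(json)
    println(Json.decodeFromString(FlatPersonSerializer, json) == person)
}
```

This only works at the JSON tree level, so it does not carry over to ProtoBuf, which is part of why a format-agnostic annotation is being suggested here.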

xiaozhikang0916 avatar Jan 03 '24 07:01 xiaozhikang0916

Yes, that's one of the ways to do it. Although I'd say that @Flatten is a different scenario.

sandwwraith avatar Jan 10 '24 16:01 sandwwraith

@ProtoOneOfFields may be misleading as there is no way for the format to actually resolve the candidates outside the normal polymorphic enumeration system. As such this annotation could only serve to restrict the candidates (which may be a valid feature, but should be named such that that is clear).

Personally I would go with @ProtoOneOf on the field (or a configuration option to make that the default in case that all candidates have numbers). Then I'd make @ProtoNum apply to types (rather than just fields) - I don't think a differently named annotation is needed here.

As an implementation note, it is a bit "clunky" to extract all polymorphic descendants from a SerialModule, but it is possible.

pdvrieze avatar Jan 10 '24 16:01 pdvrieze

It is true that oneOf cannot simply work with the current polymorphic system. This proposal is also an attempt to make them work together.

I changed it a bit according to what @pdvrieze suggested.

data class Person(
    @ProtoNum(1) val name: String,
    @ProtoOneOf(2, 3) val phone: IPhoneType,
)

sealed interface IPhoneType
@ProtoNum(2) data class MobilePhone(val value: String): IPhoneType
@ProtoNum(3) data class HomePhone(val value: String): IPhoneType

The first thing is that when the serializer reaches val phone: IPhoneType, it should not treat the property as a nested serializable structure by calling beginStructure, but should instead read/write the inner value property of the concrete type directly.

That is why I think a Flatten annotation is necessary here.

As for the polymorphic handling, the ProtoNum is a perfect discriminator for finding the serializer of a sealed interface subtype, with no extra content needed. But AbstractPolymorphicSerializer seems to require such extra content, so the ProtoBuf format may need to provide another PolymorphicSerializer implementation.
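
To illustrate the "ProtoNum as discriminator" idea, here is a minimal sketch in plain Kotlin. The ProtoNum annotation below is declared locally and is hypothetical (the existing @ProtoNumber only targets properties), and oneOfTable is an illustrative helper, not an API of the library.

```kotlin
// Hypothetical class-level annotation from this proposal; not part of kotlinx.serialization.
@Target(AnnotationTarget.CLASS)
annotation class ProtoNum(val number: Int)

sealed interface IPhoneType
@ProtoNum(2) data class MobilePhone(val value: String) : IPhoneType
@ProtoNum(3) data class HomePhone(val value: String) : IPhoneType

// Builds the lookup table "field number -> concrete subtype" from the annotations.
// The candidates are passed explicitly; a real format would take them from the
// sealed hierarchy or the SerializersModule.
fun oneOfTable(vararg candidates: Class<out IPhoneType>): Map<Int, Class<out IPhoneType>> =
    candidates.associateBy { candidate ->
        candidate.getAnnotation(ProtoNum::class.java)?.number
            ?: error("${candidate.simpleName} has no @ProtoNum")
    }

fun main() {
    val table = oneOfTable(MobilePhone::class.java, HomePhone::class.java)
    // The field number read from the wire is enough to pick the subtype.
    println(table[2]?.simpleName)
    println(table[3]?.simpleName)
}
```

With such a table the field number alone selects the subtype, so no separate type-name field has to be written.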

xiaozhikang0916 avatar Jan 11 '24 15:01 xiaozhikang0916

Such a new polymorphic serializer would also be helpful if someone wants to hold proto enum data in a Kotlin sealed interface rather than a Kotlin enum class.

xiaozhikang0916 avatar Jan 11 '24 16:01 xiaozhikang0916

For the implementation, the format can recognise the polymorphic serializer and do its own thing. There is no need for another serializer that tells it how to. Alternatively, you can have the format synthesise the data expected by the polymorphic serializer. The XML format does this, for example.

pdvrieze avatar Jan 11 '24 16:01 pdvrieze

For decoding, it can be implemented as follows:

  • The regular CompositeDecoder determines which of its children is polymorphic (with OneOf handling).
  • For each such child, the decoder determines all potential polymorphic children (caching makes sense here), either because the type is sealed or from the serializersModule.
  • When a child is read that is a "oneOf" child, it uses the mapping from the previous step to create a "special" transparent CompositeDecoder (i.e. OneOfDecoder : CompositeDecoder) that takes the expected type/serializer as one of its constructor parameters.
  • The OneOfDecoder would synthesize decoding in two parts (beginStructure/endStructure would not read any actual data). The first is a string with the serialName of the type. The second is the actual deserialization of the value (only this involves reading the protobuf content).

Encoding works analogously, but the OneOfEncoder can ignore the first member (the type name) entirely. Then, when the value is written as the second member, it uses the ProtoNum annotation on the type (from the passed-in SerialDescriptor) for writing (rather than the default number used for polymorphism).
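
The decoding and encoding steps above can be illustrated with a toy model in plain Kotlin. This is not a CompositeDecoder implementation: the Wire type (a list of field number and string payload pairs) stands in for real protobuf fields, and phoneCandidates, decodePerson and encodePerson are hypothetical names.

```kotlin
// Toy wire format: each entry is (field number, payload). Real protobuf would use
// tagged, length-delimited bytes instead.
typealias Wire = List<Pair<Int, String>>

sealed interface IPhoneType
data class MobilePhone(val value: String) : IPhoneType
data class HomePhone(val value: String) : IPhoneType
data class Person(val name: String, val phone: IPhoneType?)

// Cached mapping from field number to the candidate subtype of the oneOf child.
val phoneCandidates: Map<Int, (String) -> IPhoneType> = mapOf(
    2 to { v: String -> MobilePhone(v) },
    3 to { v: String -> HomePhone(v) },
)

fun decodePerson(wire: Wire): Person {
    var name = ""
    var phone: IPhoneType? = null
    for ((number, payload) in wire) {
        val candidate = phoneCandidates[number]
        when {
            number == 1 -> name = payload
            // "Transparent" step: the payload is read directly as the subtype's value,
            // without a nested structure or a type-name field. The last member seen wins.
            candidate != null -> phone = candidate(payload)
            // Unknown field numbers are skipped.
        }
    }
    return Person(name, phone)
}

fun encodePerson(person: Person): Wire = buildList {
    add(1 to person.name)
    // The subtype decides the field number; no discriminator is written.
    when (val phone = person.phone) {
        is MobilePhone -> add(2 to phone.value)
        is HomePhone -> add(3 to phone.value)
        null -> {}
    }
}

fun main() {
    val wire = encodePerson(Person("John", HomePhone("555")))
    println(wire)
    println(decodePerson(wire))
}
```

In a real format the table would be derived from the sealed hierarchy or the module rather than written by hand, and the per-subtype value would be decoded through that subtype's serializer.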

pdvrieze avatar Jan 11 '24 18:01 pdvrieze

I have exactly the same use case. Coincidentally, I just started to take a look to see how hard it was to implement it after checking the protobuf schema generated for a polymorphic class. Good to know I'm not the only one.

For context, this is the schema generated for my classes:

syntax = "proto2";


// serial name 'cash.atto.commons.AttoTransaction'
message AttoTransaction {
  required AttoBlock block = 1;
  ...
}

// serial name 'cash.atto.commons.AttoBlock'
message AttoBlock {
  required string type = 1;
  // decoded as message with one of these types:
  //   message CHANGE, serial name 'CHANGE'
  //   message OPEN, serial name 'OPEN'
  //   message RECEIVE, serial name 'RECEIVE'
  //   message SEND, serial name 'SEND'
  required bytes value = 2;
}

message CHANGE { // I'm also serializing this class to json, this is why I'm using @SerialName
  ...
}

message OPEN {
  ...
}
...

And this is what I was expecting:

syntax = "proto2";


// serial name 'cash.atto.commons.AttoTransaction'
message AttoTransaction {
  oneof block {
    AttoOpen open = 1;
    AttoSend send = 2;
    ...
  }
  ...
}

// serial name 'cash.atto.commons.AttoOpen'
message AttoOpen {
  ...
}

// serial name 'cash.atto.commons.AttoSend'
message AttoSend {
  ...
}
...

rotilho avatar Jan 13 '24 10:01 rotilho

Sounds interesting @xiaozhikang0916

a2xchip avatar Feb 01 '24 02:02 a2xchip

🎉 great news

a2xchip avatar Apr 25 '24 23:04 a2xchip