kotlinx.serialization
kotlinx.serialization copied to clipboard
Support using an integer as the class/case discriminator in polymorphic serialization
As for now, this library serializes the class/type of a subclass using a serialName
in polymorphic serialization, which can be overridden with the @SerialName
annotation. I'd like to propose supporting using a number and requiring subclasses to use numbers in polymorphic serialization, as it shortens the size of the serialized message greatly, especially in a binary format such as Protobuf.
When choosing to use a binary format such as Protobuf, one of the biggest concerns is shortening the size of serialized messages. In such a case serializing a fully qualified name of a subclass by default doesn't make much sense, as the binary format such as Protobuf compared to a readable string format as JSON reduces the serialized size of other fields by more than 50%, while a class discriminator takes the same size and takes up the majority of the message size, which defies the whole point. An alternative might be to override the serial name with a short name or a single letter, but IMO, still, the message is longer and it's less elegant than using a number.
More details to discuss
Current API design
The current design is like this:
@Serializable
@UseSerialPolymorphicNumbers
sealed class Sealed {
@Serializable
@SerialPolymorphicNumber(Sealed::class, 1)
object Case : Sealed()
}
Json.encodeToString<Sealed>(Sealed.Case)
Json.decodeFromString<Sealed>("""{"type":1}""")
One marks the classes with @UseSerialPolymorphicNumbers
and @SerialPolymorphicNumber
, when the message is serialized it will be {"type":1}
in JSON, for example.
Alternatives of implementation details and designs
The implemented code is a prototype. Here are improvements and alternatives to be discussed.
Alternative names for "serial polymorphic number"
There can be alternative names for the word "serial polymorphic number". For example, these are what I can think of:
- (serial) discriminator number
- polymorphic number
- polymorphic serial number
- (serial) subclass number
- (serial) case number
Functions
There are 2 properties useSerialPolymorphicNumbers
and serialPolymorphicNumberByBaseClass
and a function getSerialPolymorphicNumberByBaseClass
which depend on annotations
added to the SerialDescriptor
interface, which are also overridden in CommonSerialDescriptor
for lazy caching. If this brings about unintended breaking changes to the existing core API, I can refactor them to be extension functions in a Polymorphic.kt
file like the one in kotlinx.serialization.json.internal
.
The added function names may have better alternatives too.
Some more possible unimplemented APIs
An independent version of the @SerialPolymorphicNumber
annotation
We can also add a version of the @SerialPolymorphicNumber
annotation which is independent of the base class:
@Serializable
@UseSerialPolymorphicNumbers
sealed class Sealed {
@Serializable
@IndependentSerialPolymorphicNumber(1)
object Case : Sealed()
}
And it shall be overridden by the dependent ones if specified.
Complements to the annotations in the SerializersModule
DSL
In addition to the annotations, there can be completion APIs to set them in the SerializersModule
DSL, to ensure this requirement for all data, or for external classes which can't be marked with annotations. So we can add the following APIs:
-
allUseSerialPolymorphicNumbers
property inSerializersModuleBuilder
to require all polymorphic serialization data to use numbers, in case there are missing ones. -
useSerialPolymorphicNumbers
property inPolymorphicModuleBuilder
as a complement to@UseSerialPolymorphicNumbers
, which can also override the@UseSerialPolymorphicNumbers
annotation andallUseSerialPolymorphicNumbers
. - An additional
serialPolymorphicNumber
argument inPolymorphicModuleBuilder.subclass
as a complement to@SerialPolymorphicNumber
, which can override@SerialPolymorphicNumber
annotation.
Example:
SerializersModule {
allUseSerialPolymorphicNumbers = true
polymorphic(Project::class) {
useSerialPolymorphicNumbers = true
subclass(OwnedProject::class, serialPolymorphicNumber = 1)
defaultDeserializer { BasicProject.serializer() }
}
}
Other uncompleted tasks
There are some KDocs I left empty with TODO
since the APIs are not finalized yet.
The mardown docs in Kotlin Serialization Guide are also not updated yet.
There is a lot of added complexity and special casing in this code. The same outcome could be achieved with a custom implementation of a polymorphic serializer (that maps between numbers and serial names), that can be marked as serializer for the base type. There are then only a few additions needed from serialization itself:
- A nicer way to enumerate all subtypes/subserializers for a base type (this is possible now, but not quite elegant)
- Some way for the serialization plugin to enforce (or warn about) requirements on subtypes (such as the need to specify an annotation).
Note also that your numbers system is a challenge as the numbers need to be unique (possibly within the entire program) if they are to completely replace serial names (on types).
Alternative names for "serial polymorphic number"
Number is ambiguous here, I typically use UByte
.
@pdvrieze Thanks for the reply.
The same outcome could be achieved with a custom implementation of a polymorphic serializer (that maps between numbers and serial names), that can be marked as serializer for the base type.
This is indeed the approach I am currently adopting in my projects, where I need to specify two-way KClass
Int
conversions. However, I think it's more verbose and separates the discriminator numbers from the annotation configurations (like @ProtoNumber
). So I think supporting this with annotations still looks more elegant and provides a better separation of concerns.
- A nicer way to enumerate all subtypes/subserializers for a base type (this is possible now, but not quite elegant)
This looks good, but I think it doesn't help that much if such information is only available at runtime and such exhaustiveness checks are not available at compile time.
- Some way for the serialization plugin to enforce (or warn about) requirements on subtypes (such as the need to specify an annotation).
Yeah this looks like a general and elegant way.
Note also that your numbers system is a challenge as the numbers need to be unique (possibly within the entire program) if they are to completely replace serial names (on types).
In the current implementation, the numbers are only unique in the scope of one certain base class (which is necessary). That's why a baseClass
argument needs to be passed to @SerialPolymorphicNumber
.
If you mean in the scope of one base class, the number of a subclass should be different in different places, I think we can support it in the SerializersModule
DSL as mentioned in the "Some more possible unimplemented APIs" section above. As a similar example, @SerialName
is also unique in this way.
@recursive-rat4 Yeah perhaps "integer" is a better word. I still think using an Int
or a UInt
is a better idea here. Though it's unlikely that a parent class has more than 256 subclasses, using an Int
/UInt
covers such cases and is as fast as using a Byte
/UByte
because of our 32-bit/64-bit CPU architectures.