kotlinx.serialization icon indicating copy to clipboard operation
kotlinx.serialization copied to clipboard

Inconsistent behaviour with sealed classes serialization

Open chipnesh opened this issue 5 years ago • 19 comments

I faced a problem with sealed classes serialization.

To Reproduce

import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

@Serializable
sealed class Project {
	abstract val name: String

	@Serializable
	class OwnedProject(override val name: String, val owner: String) : Project()
}

fun main() {
	println(Json.encodeToString(Project.OwnedProject("kotlinx.coroutines", "kotlin")))
	// prints: {"name":"kotlinx.coroutines","owner":"kotlin"}
	val project: Project = Project.OwnedProject("kotlinx.coroutines", "kotlin")
	// prints: {"type":"com.example.Project.OwnedProject","name":"kotlinx.coroutines","owner":"kotlin"}
	println(Json.encodeToString(project))
}

I thought that result in both cases should be equal, but it prints different JSON strings.

Environment

  • Kotlin version: 1.4.10
  • Library version : Kotlinx Serialization Plugin 1.4.10
  • Kotlin platforms: JVM
  • Gradle version: 6.6.1
  • Other relevant context: JRE version: 1.8

chipnesh avatar Nov 11 '20 15:11 chipnesh

In the first case, the inline encodeToString will receive an object type parameter of type OwnedProject and therefore "knows" the type and doesn't need to do any polymorphic handling. In the second case the static type (passed as reified parameter) is Project triggering polymorphic serialization (and the type tag). Note that serialization of sealed classes works almost the same as regular polymorphic serialization. The difference is that for sealed classes the descriptor will be able to determine all subclasses from the sealed hierarchy and for non-sealed the subclasses must be declared manually in the SerialModule.

pdvrieze avatar Nov 12 '20 10:11 pdvrieze

Is there any way to enable a discriminator field to all class hierarchy forcefully? I mean like JsonTypeInfo annotation for Jackson mapper or something? If I remember correctly Jackson includes a discriminator field in both cases.

My case is related to response serialization in "ktor" route. Because of this problem, it looks like this:

	post("/login") {
		val request = call.receive<LoginCredentials>()
		val response: TokenResponse = authService.login(request)
		call.respond(Json.encodeToString(response))
	}

I have to do manual encoding through encodeToString and have to specify the "needed" response type explicitly.

chipnesh avatar Nov 16 '20 05:11 chipnesh

You can pass the base class type explicitly in as the type argument. Alternatively, you store the value in a variable with the type of the sealed base.

pdvrieze avatar Nov 19 '20 16:11 pdvrieze

You can also use Json.encodeToString<Project>(Project.OwnedProject("kotlinx.coroutines", "kotlin"))

sandwwraith avatar Nov 19 '20 17:11 sandwwraith

and therefore "knows" the type and doesn't need to do any polymorphic handling

@pdvrieze just because we know the concrete type on serializer side at this time doesn't mean the deserializer will know what to do, right?

You can pass the base class type explicitly in as the type argument. Alternatively, you store the value in a variable with the type of the sealed base.

You can also use Json.encodeToString<Project>(Project.OwnedProject("kotlinx.coroutines", "kotlin"))

The workarounds mentioned here are not possible for me as I don't control the place where Kotlinx.Serialization is called. I'm using Spring support for kotlinx serialization, and internally Spring uses serializer<T>() to find the serializer for the object I give to it.

The interface I have with SimpMessagingTemplate is a generic one (independent from the message converters used) taking just the destination topic, and the payload:

simpMessagingTemplate.convertAndSend("/topic/games", GameListEvent.CreateOrUpdate(game))

The problem is that this means it always has the concrete type, so it never finds it "necessary" to include the type information.

But that is when Spring sends messages over the STOMP websocket. Now when I receive them in the topic I use, I expect any subclass of my sealed class, so the deserializer doesn't know what to use and fails. I think it would be necessary to have a parameter to include the discriminator all the time.

One workaround that would work is actually wrapping the sealed class into a concrete type with a single field, because the type of the field would be the sealed class parent and solve this problem. But that's pretty inconvenient to use. It would be much nicer to have a global switch to enable consistent serialization of polymorphic classes.

joffrey-bion avatar Dec 10 '20 09:12 joffrey-bion

@joffrey-bion Where you are confused is not about what the serializer knows, but the serializer used. In this case Json is what we would call the format, when invoking the format it with a serializer parameter (if you use the parameterless versions, you this happens behind the scenes with the inline reified function or dynamically), it will create a corresponding encoder and then asks the serializer to write itself to the encoder. As such a serializer is specific to each type but also links to subserializer. As such the sealed parent type has a serializer and the child type has a serializer. The parent serializer will write out the type discriminator and then delegate writing the actual data to the serializer for the specific type. The child serializer would be (presumably) a leaf serializer.

The workarounds mentioned here are not possible for me as I don't control the place where Kotlinx.Serialization is called. I'm using Spring support for kotlinx serialization, and internally Spring uses serializer<T>() to find the serializer for the object I give to it.

If spring uses dynamic type to determine the serializer it is probably broken. Especially if that doesn't map to the same dynamic type on deserialization (which it can't). The valid type to use by spring would be the actual static return type of the function (or other static type of data to be serialized). You may want to file a bug against spring to handle polymorphic and sealed class serialization in a smarter way (maybe walking the class hierarchy).

pdvrieze avatar Dec 10 '20 12:12 pdvrieze

is not about what the serializer knows, but the serializer used.

Yes I'm aware of that, sorry if I was unclear. I used the term "serializer" in the broad sense of "the program which performs the serialization". And in this case my point is that it's not always possible to force the use of the parent serializer (because, yes, spring uses the concrete dynamic type to lookup the serializer to use, and thus ends up using the concrete type's serializer).

If spring uses dynamic type to determine the serializer it is probably broken. Especially if that doesn't map to the same dynamic type on deserialization (which it can't).

I understand it's usually better to do things the Kolinx Serialization way and rely on static typing, especially when we use generics etc.

In this specific case though, I don't believe Spring should necessarily change their behaviour. They can't change their current common API to include kotlinx-serialization-specific serializer parameter. And I believe it would be quite inefficient for them to reflectively walk the class hierarchy, especially knowing that we already have a compiler plugin which may be able to figure everything out in the first place (see below).

The valid type to use by spring would be the actual static return type of the function (or other static type of data to be serialized).

There is no static type information at that point, because it's a message broker, and the messages are quite generic. It's not a simple route handler in a controller here, in which case it could indeed have been possible to use the static return type.

Also, it's not even a problem specific to Spring. There could be a lot of places in the code where we use json.encodeToString(SubType()) naively. We shouldn't have to think about using json.encodeToString<Parent>(SubType()), by foreseeing who is going to read this JSON result. Depending on who reads the value, we may or may not have a problem, that's why I think the serialization of the child objects should be consistent, regardless of the serializer used.

Also, intuitively I would expect that using the child serializer is always more safe because it has more information than the parent, but that's where I'm mistaken here.

The parent serializer will write out the type discriminator and then delegate writing the actual data to the serializer for the specific type.

Is it possible to make Kotlinx Serialization behave differently here? Since all these serializers for different types are generated at compile time, couldn't the type discriminator encoding be done in the child serializer? This would ensure consistency between the serialization of these instances, being by the parent serializer of by the child serializer, and it could be an opt-in mechanism of course (see https://github.com/Kotlin/kotlinx.serialization/issues/1247)

joffrey-bion avatar Dec 10 '20 12:12 joffrey-bion

It is not too hard to create an alternative function that always wraps a serializer for a value that could be dynamic into a PolymorphicSerializer (or just finds the appropriate parent). In the case of automatic serialization by a framework (like Spring) the framework could do this (or just look at the static return type of the function).

A final solution would be to create your own format (using the json one) that will serialize a type descriminator.

pdvrieze avatar Jan 12 '21 11:01 pdvrieze

It is not too hard to create an alternative function that always wraps a serializer for a value that could be dynamic into a PolymorphicSerializer (or just finds the appropriate parent). In the case of automatic serialization by a framework (like Spring) the framework could do this

I'm not sure I understand what you mean here. How do you know which values could be polymorphic subtypes? We would basically need a reliable way to find the serializer of the farthest serializable polymorphic parent.

Which means:

  • option 1: register a mapping from each polymorphic subtype's serializer to its parent's polymorphic serializer (which is almost as cumbersome as using a wrapper everytime we use a polymorphic value, because it requires manual work for every new polymorphic type)
  • option 2: reflectively walk up the class hierarchy (for every serialized type) and stop just before the first ancestor that is not @Serializable. This is slow in general, but using a cache here would make it like an automated option 1, albeit potentially using way more space (because storing all the types), and maybe preventing class unloading, but I'm not too well versed in this.

(or just look at the static return type of the function).

As I already mentioned, there is no static type here. This can be anywhere in the code:

simpMessagingTemplate.convertAndSend("/topic/games", GameListEvent.CreateOrUpdate(game))

A final solution would be to create your own format (using the json one) that will serialize a type descriminator.

How would that custom format know about polymorphic parent types of the current value? I haven't played with them. The SerializationStrategy passed to the format would just be that of the child in this case, right? So aren't we back to square one here?

Or do you mean the format would encode a discriminator on its own for all types, just based on the dynamic value's type, ignoring the polymorphic VS non-polymorphic distinction? That could be a solution but it clutters the serialized data needlessly for most of the other types.

--

I'm not sure why we should try so hard to workaround this problem with reflection or custom code. I thought the point of this library was to avoid these reflective hacks and rely on the type system to correctly generate serializers at compile time. It seems (conceptually) simple to make the compiler plugin support adding the discriminator to child serializers by using its knowledge of the polymorphic hierarchy when generating them. Could you please explain why this would be a bad solution? It seems to me that it would be the right thing to do here.

joffrey-bion avatar Jan 12 '21 12:01 joffrey-bion

For code like:

simpMessagingTemplate.convertAndSend("/topic/games", GameListEvent.CreateOrUpdate(game))

There is a static type to the second parameter value. It would be the return type of GameListEvent.CreateOrUpdate. If you want this to work the "correct way" you have two versions of convertAndSend:

class Template {
// ...
    fun <T> convertAndSend(path: String, serializer: SerializationStrategy<T>, value: T) {
      // some implementation
    }

   inline fun <reified T> convertAndSend(path: String, value: T) {
    convertAndSend(path, serializer<T>(), value)
   }
}

When you do this, the type used will be the static type, and you shouldn't really need to override the type most of the time (you can with the type parameter).

One note, it does require that there is a serializer for the specific type.

If you want to go the dynamic route, the "easiest" way would be to have the implementation of convertAndSend first check whether the serializer is already polymorhpic/sealed. If not, you can use a custom serializer that meets the api for polymorphic serializers and forces generating the type discriminator.

pdvrieze avatar Jan 12 '21 13:01 pdvrieze

@pdvrieze thanks for this. Yes of course if I controlled this API I would change it this way (I actually designed my whole Krossbow library this way to support both reflective serialization libraries AND Kotlinx Serialization natively).

However, in this current case I'm using Spring (which owns simpMessagingTemplate). I'm assuming frameworks like this can't just change their public generic API like this just to support the specific Kotlinx Serialization library better. It would be bad design for them to expose implementation-specific types like SerializationStrategy. In my own library I remember it was quite a pain to provide another API specifically for Kotlinx Serialization in addition to the generic one, as I had to redesign part of the core interfaces in addition to creating a specific dedicated module for Kotlinx Serialization.

And this wouldn't actually be necessary if child serializers of polymorphic types could support the discriminator in Kotlinx Serialization. I'm failing to see arguments against making this library configurable in this respect (as proposed here). Could you please explain if it is a conceptual problem? Or is it just that it would be hard to implement and may not be worth the effort in your opinion?

joffrey-bion avatar Jan 12 '21 14:01 joffrey-bion

First of all, how discriminators are implemented is a matter for the format, so this would be something that would be specific to each format. Having said that, the way it would be cleanly implemented is by treating the outermost value/serializer as if it were a polymorphic serializer with a specific type (Any?). This can be done with extension functions fairly easily (extend Any, add a mapping from Any to the actual type to the serial module). Such extension function would not even be format specific. Your Spring adapter for serialization would then use this extension function rather than the default. PolymorphicSerializer is even public (with the base type as constructor argument) so it would be very easy to do.

pdvrieze avatar Jan 12 '21 16:01 pdvrieze

First of all, how discriminators are implemented is a matter for the format, so this would be something that would be specific to each format.

I understand that how discriminators are represented in the serialized form is specific to each format. But a good deal of the decision for the inclusion (or not) of a discriminator seems to be based on the type of serializer that we pass to the format (at least that's what I understood from the quick look I had inside the JSON format implementation, please forgive my ignorance about it and correct me if I'm wrong here).

So the same format (Json) has different behaviours (with respect to discriminators) depending on the format-agnostic SerializationStrategy that we pass to it. Hence my initial feeling that the serializers themselves needed to be changed so that we could control this behaviour, but maybe this is wrong.

We can come up with any strategy to dynamically find the polymorphic serializer we want, but it always feels like a workaround for lacking this info in the child serializer.

Having said that, the way it would be cleanly implemented is by treating the outermost value/serializer as if it were a polymorphic serializer with a specific type (Any?). This can be done with extension functions fairly easily (extend Any, add a mapping from Any to the actual type to the serial module).

I've trouble understanding what you're talking about here. By "extend Any", do you mean subclassing or creating an extension function?

  • If subclassing, then does that mean all my types now need to extend that custom Any subclass? And if we're talking about subclassing, then what is this extension function you're talking about before and after this sentence?
  • If extension function, what would be the signature/role of this extension? Are we talking about .serializer()?

add a mapping from Any to the actual type to the serial module

What is the "actual type" here? If we're designing a generic way of getting polymorphic serializers on any type, we don't have any particular actual type here, right? Or are you talking about the subclass of Any that we would create in the previous step?

joffrey-bion avatar Jan 12 '21 18:01 joffrey-bion

Before I answer your questions below you want to keep in mind that kotlin serialization is primarily based upon statically resolving serialization, even though it can work polymorphically on some platforms. Spring's approach is inherently based upon polymorphic serialization (only caring about the runtime type of the value serialized). What I'm describing is the cleanest way to bridge this gap by making sure that the spring plugin you use to provide serialization with kotlinx.serialization does this correctly, making the outer value polymorphic.


I understand that how discriminators are represented in the serialized form is specific to each format. But a good deal of the decision for the inclusion (or not) of a discriminator seems to be based on the type of serializer that we pass to the format (at least that's what I understood from the quick look I had inside the JSON format implementation, please forgive my ignorance about it and correct me if I'm wrong here).

The way discriminators "appear" is through (either sealed or polymorphic, only discussing polymorphic here, sealed is mostly the same) the PolymorphicSerializer. In the descriptor of this serializer the "kind" is polymorphic, so a format could, if it wished so, special case this (my XML format does this when configured). The polymorphic serializer creates a struct with two members: 1 the serialname of the serializer of the concrete runtime value. The second member is the data element which is the serialization of the value using the serializer for the runtime value.

For deserialization the type polymorphic serializer first reads the type value (the serialname of the serializer to use), then looks up the deserializer from the serialmodule. Then this is used to read the data element using this deserializer (of the subtype) and returned from the polymorphicserializer instance.

So the same format (Json) has different behaviours (with respect to discriminators) depending on the format-agnostic SerializationStrategy that we pass to it. Hence my initial feeling that the serializers themselves needed to be changed so that we could control this behaviour, but maybe this is wrong.

The way to think about this is that the SerializationStrategy (Aka Serializer) defines an abstract tree structure of data to be written (where nodes have names, types and other metadata | and child links also have names an annotations). The format translates this abstract tree into a concrete representation according to the rules of the format (Json, XML,...). In that translation, the format is able to, to some degree, to inspect the structure, and types and special case things. But in this structure, a polymorphic serializer is not the same as a static type dependent serializer.

We can come up with any strategy to dynamically find the polymorphic serializer we want, but it always feels like a workaround for lacking this info in the child serializer.

From the design of serialization, child serializers need to be able to be used in the context where the type is known, without carrying significant context information around to parameterize them. In addition, the child serializer needs to be able to be used independently on the data member even in the discriminated case.

Important to keep into account here is that the framework must know the type to use before the child can be deserialized. This is trivial for binary formats (that don't use lengths all over the place), but even for text formats it requires buffering the data (somehow knowing the end of it), determining the type, and then reading the actual value from the then resolved deserializer.

I've trouble understanding what you're talking about here. By "extend Any", do you mean subclassing or creating an extension function?

What I mean is that in the serialization system a polymorphic serializer is defined against the common base type (the type that all values will be able to be cast as). I assume that you don't have a specific candidate for the function, so you would use Any instead.

  • If subclassing, then does that mean all my types now need to extend that custom Any subclass? And if we're talking about subclassing, then what is this extension function you're talking about before and after this sentence?

Any is the Kotlin equivalent of Object, all types inherit Any per the rules of the language, no need to do anything for it.

  • If extension function, what would be the signature/role of this extension? Are we talking about .serializer()?

What I mean is that in the function that plugs into swing to do the serialization you do the following (given a value you want to serialize):

  fun serializeToString(value: Any): String {
    val s = PolymorphicSerializer(Any::class)
    val m = SerializersModule {
      polymorphic(Any::class) {
        subclass(value.javaClass.kotlin)
      }
    }
    val j = Json { serializersModule = m }
    return j.stringify(s, value)
  }

You can also use the different serialization methods than stringify (like writing to an OutputStream)

add a mapping from Any to the actual type to the serial module

What is the "actual type" here? If we're designing a generic way of getting polymorphic serializers on any type, we don't have any particular actual type here, right? Or are you talking about the subclass of Any that we would create in the previous step?

The actual type here is the runtime type of the value you are serializing. For (de)serialization to be possible, the type must be registered in the module for that to work (randomly looking up types doesn't work without reflection, and is always a security issue), PolymorphicSerializer uses the module even for serialization.

pdvrieze avatar Jan 13 '21 15:01 pdvrieze

@pdvrieze Thank you very much for the amount of details and explanations you put in your message. I appreciate the time you spent here.

keep in mind that kotlin serialization is primarily based upon statically resolving serialization, even though it can work polymorphically on some platforms

I totally get that, that's why I made the necessary changes in my personal library to respect this approach and use static types or serializers all the way (especially because my lib is multiplatform). I just feel like there is something that can be done to help the reflection-based JVM frameworks like Spring.

The way to think about this is that the SerializationStrategy (Aka Serializer) defines an abstract tree structure of data to be written

This is why I was mentioning that maybe the child serializers should be the ones providing the extra data. There may be a way to enrich the child serializers (at compile time) with information about their parent's serializer to make them optionally behave like a polymorphic serializer.

From the design of serialization, child serializers need to be able to be used in the context where the type is known, without carrying significant context information around to parameterize them.

This is why it should be opt-in if it were supported (either by configuring the compiler plugin, or by providing different ways of calling them at runtime). We could for instance imagine that the child serializer can be asked to return a polymorphic-like tree OR the regular data tree. When an instance of this child type is serialized as a top-level object, the format could decide that it needs extra information (depending on its own config about polymorphism) and thus call the serializer accordingly.

Important to keep into account here is that the framework must know the type to use before the child can be deserialized

I'm not sure this is relevant here, because this is already covered by parent deserializers. My use case here is that a child serializer is used to serialize a child value, and a parent deserializer is used to deserialize that at the other end.

If subclassing, then does that mean all my types now need to extend that custom Any subclass? And if we're talking about subclassing, then what is this extension function you're talking about before and after this sentence?

Any is the Kotlin equivalent of Object, all types inherit Any per the rules of the language, no need to do anything for it.

Yes, but in this paragraph I assumed that by "extend Any" you really meant creating a subclass of Any. So in that case this arbitrary class would not be Any itself, and would need to be extended by every serializable instance in the user application. I understand now that you didn't mean "subclass" when you wrote "extend Any", so we can dismiss this paragraph.

Thanks a lot for the rest of the explanations, so there is no extension function after all. Things are much clearer now. About your proposed solution, is it allowed to map Any like you did as polymorphic parent? Don't we need something that's annotated @Serializable?

Also, this solution would add a discriminator on every single type, which could greatly increase the payload and is most likely not desirable. What I'm looking for is a behaviour that just makes the serialization of polymorphic subclasses consistent. Wherever they are used (even in places where the static type is known), polymorphic subclasses would be serialized with discriminator. However, regular @Serializable classes without serializable parent should not suffer from the extra payload.

This is why I'm not sure there is a way with the current implementation of Kotlinx Serialization.

joffrey-bion avatar Jan 13 '21 16:01 joffrey-bion

About your proposed solution, is it allowed to map Any like you did as polymorphic parent? Don't we need something that's annotated @Serializable?

Serializable is actually only needed for generating the serializer of the child. Any is a special case and any serializable inherits it (but the generator knows it as a special type). As such it works. As you are only writing using Any is fine (the static type of the value once it gets to your serialization code is Any).

If you want to only have a discriminator for polymorphic types there are two options.

  • You could detect it (maybe cache the info) by walking the tree - this is always going to be best effort and may write more discriminators than needed. This solution is transparent from the user.
  • the second option is to somehow pass parent type through the spring innards. The way to do this is with a helper type that contains both the value and parent type. You can use an inline function with reified type parameter to create this and then forward to the spring output code. This requires the client to use the special function.

Note that the options are not exclusive as the serialization code could detect the wrapper type and handle it appropriately, but use the first approach otherwise.

The challenge is that by passing values into spring as a parameter (rather than having a method be wrapped like jaxws does) you loose the static type of the value passed in for the parameter. The reified type parameter captures this type and can then use it as desired. You can also specify it explicitly to the "correct" type.

Ps. serializers for a parent type are not actually used/inherited by serializers for child types. The code generator does need the parent to be serializable though.

pdvrieze avatar Jan 14 '21 19:01 pdvrieze

the second option is to somehow pass parent type through the spring innards. The way to do this is with a helper type that contains both the value and parent type

That's actually a great option. We could basically define an extension function on simpMessagingTemplate to provide the equivalent of json.encodeToString<Parent>(childValue) (which behind the scenes calls the regular Spring function with a wrapped value that includes information about the reified parent type, from which we can get the serializer).

The only thing would be that either Spring needs to support this kind of "informative wrapper" in their default KotlinSerializationJsonHttpMessageConverter, or I need to implement my own.

joffrey-bion avatar Jan 14 '21 22:01 joffrey-bion

@joffrey-bion I had been assuming that you were using your own converter. I see that there is an implementation in Spring. Looking at that code: https://github.com/spring-projects/spring-framework/blob/2d53570f4cc148250378b852e671fcb0369f754b/spring-web/src/main/java/org/springframework/http/converter/json/KotlinSerializationJsonHttpMessageConverter.java#L144-L149

it is clear that it does not have an existing mechanism to handle polymorphic types (the code determining the serializer even throws an error). You would have to create your own version (or create a patch/pull request for Spring). The function does have a type parameter in writeInternal (very often null according to references in github), but it is not generally specified. I would say that it would be worthwhile to update the class.

pdvrieze avatar Jan 15 '21 14:01 pdvrieze

bin JETZT auf achse deswegen

flerbuster avatar Mar 31 '24 14:03 flerbuster