kotlinx.serialization

Support nested polymorphic hierarchies with different discriminators

Open TheKvist opened this issue 9 months ago • 3 comments

TL;DR:

I want to build a library to deserialize JSON like this:

[
  {
    "cmsType": "person",
    "firstName": "John",
    "lastName": "Doe"
  },
  {
    "cmsType": "product",
    "name": "Some product",
    "price": 100
  },
  {
    "cmsType": "link",
    "linkType": "internal",
    "reference": "some-reference"
  },
  {
    "cmsType": "link",
    "linkType": "external",
    "url": "https://example.com"
  }
]

into a hierarchy of classes somewhat like this:

@Serializable
@JsonClassDiscriminator("cmsType")
abstract class CmsBase

@Serializable
@SerialName("person")
class Person(val firstName: String, val lastName: String) : CmsBase()

@Serializable
@SerialName("product")
class Product(val name: String, val price: Int) : CmsBase()

@Serializable
@SerialName("link")
@JsonClassDiscriminator("linkType")
abstract class Link : CmsBase()

@Serializable
@SerialName("internal")
class InternalLink(val reference: String) : Link()

@Serializable
@SerialName("external")
class ExternalLink(val url: String) : Link()

where everything except CmsBase is user-defined and registered at runtime.


I'm building a library that allows users to deserialize JSON returned from a headless CMS. The CMS supplies a class discriminator for every type returned in its API, but the content types are user-defined and unknown to the library. They may also be polymorphic through a different discriminator property. They could also contain arrays of any combination of CMS types, which is where I'm hitting the limits of what seems doable with kotlinx.serialization.

To illustrate, here's an abstract representation of some basic JSON that the CMS might return and that the library should be able to deserialize.

[
  {
    "cmsType": "person",
    "firstName": "John",
    "lastName": "Doe"
  },
  {
    "cmsType": "product",
    "name": "Some product",
    "price": 100
  }
]

The library only supplies a base class, which sets up cmsType as the class discriminator.

@Serializable
@JsonClassDiscriminator("cmsType")
abstract class CmsBase

Users then extend this base class with their types and register those types with the library, which works for direct subclasses of CmsBase.

@Serializable
@SerialName("person")
class Person(val firstName: String, val lastName: String) : CmsBase()

@Serializable
@SerialName("product")
class Product(val name: String, val price: Int) : CmsBase()

Since the hierarchy can't be sealed, the library will tie everything together with a SerializersModule.

serializersModule = SerializersModule {
    polymorphic(CmsBase::class) {
        subclass(Person::class)
        subclass(Product::class)
    }
}
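For reference, the simple case can be sketched end to end roughly as follows. This is a minimal sketch rather than the library's actual code: the `json` instance and the `decodeCmsList` helper are names assumed for illustration, and the file-level opt-in is there because `@JsonClassDiscriminator` is marked experimental.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.json.*
import kotlinx.serialization.modules.*

// Library side: the only class the library itself provides.
@Serializable
@JsonClassDiscriminator("cmsType")
abstract class CmsBase

// User side: content types the library knows nothing about at compile time.
@Serializable
@SerialName("person")
class Person(val firstName: String, val lastName: String) : CmsBase()

@Serializable
@SerialName("product")
class Product(val name: String, val price: Int) : CmsBase()

// Hypothetical wiring: the library registers the user's types at runtime.
val json = Json {
    serializersModule = SerializersModule {
        polymorphic(CmsBase::class) {
            subclass(Person::class)
            subclass(Product::class)
        }
    }
}

// Hypothetical helper: decodes a JSON array of CMS objects via the "cmsType" discriminator.
fun decodeCmsList(content: String): List<CmsBase> =
    json.decodeFromString<List<CmsBase>>(content)
```

Calling `decodeCmsList` with the two-element array above should yield a `Person` followed by a `Product`.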

This works fine for the simple case. The first problem arises when the user introduces a new polymorphic hierarchy with its own class discriminator. For example, the CMS might return JSON like this:

[
  {
    "cmsType": "link",
    "linkType": "internal",
    "reference": "some-reference"
  },
  {
    "cmsType": "link",
    "linkType": "external",
    "url": "https://example.com"
  }
]

To represent this correctly, the user would need to define a new base class for the link type and subclasses for the internal and external types. To discriminate between the subclasses based on the linkType property, the user would have to use the @JsonClassDiscriminator annotation.

@Serializable
@SerialName("link")
@JsonClassDiscriminator("linkType")
abstract class Link : CmsBase()

@Serializable
@SerialName("internal")
class InternalLink(val reference: String) : Link()

@Serializable
@SerialName("external")
class ExternalLink(val url: String) : Link()

And here's where I'm hitting the first wall: the @JsonClassDiscriminator annotation cannot be used to "add" another discriminator to the hierarchy.

One way to get around that could be to have the user define and supply a JsonContentPolymorphicSerializer for their Link class where they do the discrimination manually, but I would rather not force that on them.
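Such a user-supplied serializer might look roughly like the sketch below. It is only an illustration under assumptions: `Link` is shown here as a plain, non-`@Serializable` base that stands alone (not extending `CmsBase`), `LinkSerializer`, `lenientJson` and `decodeLink` are hypothetical names, and `ignoreUnknownKeys = true` is assumed so that the `cmsType` and `linkType` keys do not trip up the concrete serializers.

```kotlin
import kotlinx.serialization.*
import kotlinx.serialization.json.*

// Plain base class for this sketch; it only stands in for the Link hierarchy.
abstract class Link

@Serializable
class InternalLink(val reference: String) : Link()

@Serializable
class ExternalLink(val url: String) : Link()

// Manual discrimination: inspect "linkType" and pick the concrete serializer.
object LinkSerializer : JsonContentPolymorphicSerializer<Link>(Link::class) {
    override fun selectDeserializer(element: JsonElement): DeserializationStrategy<Link> =
        when (val linkType = element.jsonObject["linkType"]?.jsonPrimitive?.content) {
            "internal" -> InternalLink.serializer()
            "external" -> ExternalLink.serializer()
            else -> throw SerializationException("Unknown linkType: $linkType")
        }
}

// Assumption: unknown keys are ignored because the selected serializers
// see the full object, including "cmsType" and "linkType".
val lenientJson = Json { ignoreUnknownKeys = true }

// Hypothetical helper for decoding a single link object.
fun decodeLink(content: String): Link =
    lenientJson.decodeFromString(LinkSerializer, content)
```

The drawback is visible in the `when` block: every new link subtype means another hand-written branch on the user's side.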

Also, this doesn't save me from the fact that the above class hierarchy can not be registered in the SerializersModule, because it would have to look somewhat like this:

serializersModule = SerializersModule {
    polymorphic(CmsBase::class) {
        subclass(Person::class)
        subclass(Product::class)
        subclass(Link::class) // This is not possible
    }
    polymorphic(Link::class) {
        subclass(InternalLink::class)
        subclass(ExternalLink::class)
    }
}

Here, we're registering Link as a subclass of CmsBase so that its serializer is selected for the link type, and then registering InternalLink and ExternalLink as subclasses of Link. But because Link is an abstract class, it cannot be registered as a subclass of CmsBase in the SerializersModule.

This is a total roadblock for me. The only way I can think of to get around this is to use a custom serializer for the entire CmsBase hierarchy and do the discrimination manually, but before I go down that rabbit hole, I wanted to ask for ideas and suggestions on how to solve this problem.

TheKvist, Feb 24 '25 10:02

@sandwwraith Apologies for pinging you directly, but if you have a spare moment, could you quickly check whether my use case is in fact unsupported by kotlinx.serialization or whether there is some way of pulling this off?

We need the desired serialization behavior for our app and would resort to implementing all necessary handling by ourselves if there is no solution, but we want to be sure that's the only option before we go down that road.

TheKvist, Apr 08 '25 07:04

No, I don't think we support double class discriminators out of the box. The only way I think it can be implemented is to write a JsonContentPolymorphicSerializer for one of the root classes. It should be fairly simple, though:

class CmsContentPolySerializer : JsonContentPolymorphicSerializer<CmsBase>(CmsBase::class) {
  override fun selectDeserializer(element: JsonElement) = when {
     // Compare the discriminator's string content, not the JsonElement itself
     element.jsonObject["cmsType"]?.jsonPrimitive?.content == "link" -> Link.serializer() // returns PolymorphicSerializer for Link
     ...
  }
}

// The rest will be working automatically because Link.serializer() is polymorphic:

@Serializable
@SerialName("link")
@JsonClassDiscriminator("linkType")
abstract class Link : CmsBase()

@Serializable
@SerialName("internal")
class InternalLink(val reference: String) : Link()

@Serializable
@SerialName("external")
class ExternalLink(val url: String) : Link()

serializersModule = SerializersModule {
    polymorphic(Link::class) {
        subclass(InternalLink::class)
        subclass(ExternalLink::class)
    }
}
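Put together, the setup above could look roughly like this self-contained sketch. Several details are assumptions rather than part of the original suggestion: `CmsBase` carries no `@JsonClassDiscriminator` here because the content-based serializer reads `cmsType` itself, `ignoreUnknownKeys = true` is set so that the leftover `cmsType` key is tolerated by the selected serializers, and `json` and `decodeCms` are hypothetical names.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.json.*
import kotlinx.serialization.modules.*

@Serializable
abstract class CmsBase

@Serializable
@SerialName("person")
class Person(val firstName: String, val lastName: String) : CmsBase()

// Nested hierarchy with its own discriminator property.
@Serializable
@SerialName("link")
@JsonClassDiscriminator("linkType")
abstract class Link : CmsBase()

@Serializable
@SerialName("internal")
class InternalLink(val reference: String) : Link()

@Serializable
@SerialName("external")
class ExternalLink(val url: String) : Link()

// First level: choose by "cmsType". For "link" this hands over to the
// polymorphic serializer of Link, which then dispatches on "linkType".
class CmsContentPolySerializer : JsonContentPolymorphicSerializer<CmsBase>(CmsBase::class) {
    override fun selectDeserializer(element: JsonElement): DeserializationStrategy<CmsBase> =
        when (val cmsType = element.jsonObject["cmsType"]?.jsonPrimitive?.content) {
            "person" -> Person.serializer()
            "link" -> Link.serializer()
            else -> throw SerializationException("Unknown cmsType: $cmsType")
        }
}

val json = Json {
    ignoreUnknownKeys = true // assumption: "cmsType" remains in the object after selection
    serializersModule = SerializersModule {
        polymorphic(Link::class) {
            subclass(InternalLink::class)
            subclass(ExternalLink::class)
        }
    }
}

// Hypothetical entry point for decoding a JSON array of CMS objects.
fun decodeCms(content: String): List<CmsBase> =
    json.decodeFromString(ListSerializer(CmsContentPolySerializer()), content)
```

In a real library the hard-coded `when` branches would presumably be replaced by a lookup over the types users register at runtime.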

sandwwraith, Apr 08 '25 10:04

Alright, understood. I had tried solving this with JsonContentPolymorphicSerializer before, but I didn't consider using it for the base class together with a polymorphic SerializersModule for the subclasses. I only thought about the other way around (a SerializersModule for the base class and a custom serializer for the subclasses) or a serializer-only approach, which I imagine would have worked but would have been quite complex. This combination works out quite nicely, though! 👍

Now, there is only one problem left to resolve. I was originally planning to address this in a separate issue once this one is resolved, but I've managed to find two solutions using your setup and now I'm just unsure which one is actually correct.

The hierarchy presented works just fine, but in reality, the CMS is able to resolve references like the one in InternalLink, meaning that instead of this

@Serializable
@SerialName("internal")
class InternalLink(val reference: String): Link()

an internal link looks more like this:

@Serializable
@SerialName("internal")
class InternalLink(val reference: CmsBase): Link()

One solution I found for this is registering an instance of the CmsContentPolySerializer with contextual in the SerializersModule and then requiring any subclasses referencing CmsBase to annotate the reference property with @Contextual, i.e.

serializersModule = SerializersModule {
    contextual(CmsContentPolySerializer())

    polymorphic(Link::class) {
        subclass(InternalLink::class)
        subclass(ExternalLink::class)
    }
}

@Serializable
@SerialName("internal")
class InternalLink(@Contextual val reference: CmsBase): Link()

fun deserialize(content: String): List<CmsBase> {
    val serializer = ListSerializer(CmsContentPolySerializer())
    return json.decodeFromString(serializer, content)
}

The other solution explicitly registers CmsContentPolySerializer as the default polymorphic serializer for CmsBase:

serializersModule = SerializersModule {
    polymorphicDefaultDeserializer(CmsBase::class) { CmsContentPolySerializer() }

    polymorphic(Link::class) {
        subclass(InternalLink::class)
        subclass(ExternalLink::class)
    }
}

At first glance, the second one seems to be the "better" option, since it doesn't require the @Contextual annotation and doesn't need an explicit serializer instance for decodeFromString. Then again, I haven't fully understood which cases contextual serialization is meant for, so I can't tell whether this is one of them and whether using it would be more appropriate.

Thanks for taking the time to look into this, I appreciate it a lot!

TheKvist, Apr 08 '25 16:04