kotlinx.serialization Provide a custom serialization context

What is your use-case and why do you need this feature?

I want to serialize an object graph with repeated object references. This graph (and any updates to its structure) will be sent in a Ktor/Websocket server session to the JS client. I need to avoid serializing an object more than once and I need to break possible cycles. The current idea involves:

Server side: Maintain a per-session set of those objects which have already been serialized. Object references will be serialized as objects only once by checking the set. Repeated references will be serialized as object IDs only.
Client side: Maintain a hash map to resolve references from object IDs.
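The two sides described above could be sketched roughly as follows. This is only an illustration in plain Kotlin, without kotlinx.serialization: the names Node, Wire, Full, Ref, ServerSession and ClientSession are hypothetical, and objects are assumed to carry an explicit integer ID.

```kotlin
// Hypothetical graph node; identity is carried by an explicit id (an assumption).
class Node(val id: Int, val name: String) {
    val links = mutableListOf<Node>()
}

// Hypothetical wire representation: a full object or a bare reference by id.
sealed class Wire
data class Full(val id: Int, val name: String, val links: List<Wire>) : Wire()
data class Ref(val id: Int) : Wire()

// Server side: per-session set of ids that have already been sent.
class ServerSession {
    private val sent = mutableSetOf<Int>()

    // The id is recorded before descending into links, which breaks cycles.
    fun encode(node: Node): Wire =
        if (sent.add(node.id)) Full(node.id, node.name, node.links.map { encode(it) })
        else Ref(node.id)
}

// Client side: per-session map used to resolve references from ids.
class ClientSession {
    private val known = mutableMapOf<Int, Node>()

    fun decode(wire: Wire): Node = when (wire) {
        is Ref -> known.getValue(wire.id)
        is Full -> Node(wire.id, wire.name).also { node ->
            // Register the node before decoding its links so that cycles resolve.
            known[wire.id] = node
            wire.links.mapTo(node.links) { decode(it) }
        }
    }
}
```

With two nodes that link to each other, the first encode call yields nested Full values ending in a Ref, and any later encode call for the same node in that session yields only a Ref.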

So server-side serialization needs to access the session-specific set of serialized objects. Currently, the only solution seems to be using a ThreadLocal in combination with the asContextElement extension function, which looks a bit too much like magic behind the scenes.
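For reference, a minimal sketch of that ThreadLocal workaround on the JVM, assuming kotlinx.coroutines is on the classpath. SessionContext, currentSession, isFirstOccurrence and demo are hypothetical names; isFirstOccurrence stands in for code inside a custom serializer that has no parameter through which the context could arrive.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical per-session state: ids of objects already sent to this client.
class SessionContext {
    val sentIds = mutableSetOf<Int>()
}

// Thread-local slot a custom serializer could read; null outside a session.
val currentSession = ThreadLocal<SessionContext?>()

// Stand-in for serializer code: true only the first time an id is seen in the session.
fun isFirstOccurrence(id: Int): Boolean =
    checkNotNull(currentSession.get()) { "no serialization session installed" }
        .sentIds.add(id)

fun demo(): List<Boolean> = runBlocking {
    val session = SessionContext()
    // asContextElement installs the value on whichever thread runs this coroutine
    // and restores the previous value when the coroutine suspends or finishes.
    withContext(Dispatchers.Default + currentSession.asContextElement(session)) {
        listOf(isFirstOccurrence(1), isFirstOccurrence(1), isFirstOccurrence(2))
    }
}
```

The context never appears in any function signature, which is the "magic" being objected to here.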

Describe the solution you'd like

An idea is to have a custom serialization context, e.g. via a property of SerializersModule. This context should be available to overrides of KSerializer<T>.[de]serialize(...). Currently, the module can be accessed via Encoder and Decoder, so the access path is already available.

In the above use case, the custom context would just be the set of serialized objects. It would be initialized in the Ktor session's coroutine. An open question is type safety when accessing the context.

Oct 15 '20 21:10 OliverO2

There are 2 feasible approaches that I see:

1. Make a custom format (quite complex).
2. (De)serialize indirectly: create a transport type that has the references embedded and serialize/deserialize that type instead. The function that converts from the transport type to the normal type does the reference resolution, and the reverse conversion replaces repeated objects with references.

Oct 17 '20 21:10 pdvrieze

I have now published a working graph serialization example: https://github.com/OliverO2/graph_serialization (choose the Gradle target allTests to execute).

A current limitation of the code is this session context reference in CrossSystemObject.kt#L56. It allows only one serialization session per process, which is insufficient on servers dealing with multiple clients.

If such a session context could be passed as part of a serialization module or directly as an optional parameter to the format's encodeTo.../decodeFrom... methods and then finally down to serialize/deserialize methods, that would be ideal.

Oct 19 '20 12:10 OliverO2

I just pushed two new commits for compatibility with Kotlin 1.4.20-M2 to https://github.com/OliverO2/graph_serialization.

Oct 19 '20 14:10 OliverO2

Thanks for the suggestion. Currently, the set of objects involved in circular references can only be stored inside a SerialFormat, so you would indeed have to implement your own JSON format for this to work seamlessly.

I'm not sure how SerializersModule-specific components would help in this case, though: in your scenario, you would need to clear this cache every time you deserialize a new message, whereas the module in a SerialFormat lives as long as the format itself.

If you can clear already deserialized circular references manually, then you can simply create a Map<SerializersModule, ...>. Since SerializersModule instances do not override equals and are compared by identity, this map would be module-specific (and therefore specific to a format instance, provided you pass a new module to each new format instance created per session or per application thread).
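A rough sketch of such a module-keyed map is shown below. It assumes that the SerializersModule { } builder returns a distinct instance on every call and that a format hands the same module instance to its Encoder; SessionContext, newSessionModule, contextFor, sessionContext and endSession are hypothetical names, not part of kotlinx.serialization.

```kotlin
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.modules.SerializersModule
import java.util.concurrent.ConcurrentHashMap

// Hypothetical per-session state.
class SessionContext {
    val sentIds = mutableSetOf<Int>()
}

// Registry keyed by module instance. This relies on SerializersModule being
// compared by identity, as stated in the comment above.
private val contexts = ConcurrentHashMap<SerializersModule, SessionContext>()

// Creates a fresh (empty) module for one session and registers a context for it.
fun newSessionModule(): SerializersModule {
    val module = SerializersModule { }
    contexts[module] = SessionContext()
    return module
}

fun contextFor(module: SerializersModule): SessionContext? = contexts[module]

// Lookup from inside a custom serializer's serialize(encoder, value).
fun Encoder.sessionContext(): SessionContext? = contextFor(serializersModule)

// Must be called when the session ends, otherwise the entry leaks.
fun endSession(module: SerializersModule) {
    contexts.remove(module)
}
```

Each session would then create its own format instance, for example with Json { serializersModule = newSessionModule() }, and call endSession when the connection closes.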

Oct 19 '20 14:10 sandwwraith

@sandwwraith Thanks for your insights. It seems that neither the SerializersModule nor the format would be the best place to store serialization sets:

Their life-cycle could well be independent of server sessions.
Implementing a new format seems to require major effort (e.g. extra implementations of Encoder and Decoder, without being able to use delegation because most of the existing implementation is internal).

So a direct context parameter as described above seems to be ideal.

Currently I'm thinking about using a ConcurrentHashMap<WebSocketSession, SerializationSessionContext> on the JVM and a static SerializationSessionContext on JS. My small example project mentioned above shows how that context will be used. If a context parameter were available in the encodeTo.../decodeFrom... methods which would be passed down to serialize/deserialize methods, I could avoid these platform-specific implementation differences and save the extra map accesses.

Oct 19 '20 15:10 OliverO2

With the latest updates to https://github.com/OliverO2/graph_serialization there is now a working multi-session implementation. It is using ThreadLocal<T>.asContextElement() on the JVM to obtain the session context.

@sandwwraith I'll leave this issue open for now as I still feel an added context parameter would be useful. Feel free to close it if you are not considering anything like it at this time.

Oct 20 '20 11:10 OliverO2

+1 for this feature: a way to pass contextual, user-provided information for use during serialization, like the userInfo field of JSONEncoder/JSONDecoder in Swift.

Mar 09 '21 06:03 qhhonx

I also bumped into this limitation during the migration of my library Decompose to kotlinx-serialization. It would be nice to have something like:

operator fun <T> KSerializer<T>.plus(module: SerializersModule): KSerializer<T>

Oct 22 '23 18:10 arkivanov

This is a somewhat different problem, although I suspect you want the receiver to be the format-specific encoder/decoder, not the serializer. This would allow custom serializers to inject knowledge of polymorphic children. It is not hard to support (xmlutil does so) by allowing the creation of a sub-format/sub-encoding based on the original with the same underlying serialization stream.

A format-agnostic way to support this would be good though (maybe as an interface that can be tested for). However, I think that should be a different issue.

This particular issue is more about attaching some data to a single (de)serialization run, so that custom serializers can use it to store and pick up state. As mentioned above, thread-locals can work, but they aren't ideal because there is not really a way to get a key for the "current serialization run". Instead, this extra information would somehow need to be passed along to the serializer.

There are 2 solutions:

1. Add overloads for encode/decode with an extra parameter and provide defaults that use the old functions.
2. Have the encoder/decoder hold this information that can be queried, possibly after an interface instance check.

Note that both of these solutions would need to be supported by the format (which also needs to have the ability to provide the context in the initial call), but defaults could make formats that have not been updated simply return Unit.
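To illustrate the second option together with the Unit default, here is a sketch using stand-in types. None of these names exist in kotlinx.serialization: SketchEncoder is deliberately not the library's Encoder, and ContextCarrier, runContextOrUnit, PlainEncoder, ContextAwareEncoder and encodeName are all hypothetical.

```kotlin
// Stand-in for a format's encoder; NOT kotlinx.serialization's Encoder.
interface SketchEncoder {
    fun encodeString(value: String)
}

// Hypothetical opt-in interface a format could implement to expose per-run data.
interface ContextCarrier {
    val runContext: Any?
}

// Interface instance check; encoders that have not been updated yield Unit.
fun SketchEncoder.runContextOrUnit(): Any? =
    if (this is ContextCarrier) runContext else Unit

// A format that has not been updated.
class PlainEncoder : SketchEncoder {
    val out = StringBuilder()
    override fun encodeString(value: String) { out.append(value) }
}

// A format that accepts a context in the initial call and exposes it.
class ContextAwareEncoder(override val runContext: Any?) : SketchEncoder, ContextCarrier {
    val out = StringBuilder()
    override fun encodeString(value: String) { out.append(value) }
}

// Custom "serializer": writes a name once per run and a marker for repeats,
// falling back to always writing the name when no usable context is present.
fun encodeName(encoder: SketchEncoder, name: String) {
    @Suppress("UNCHECKED_CAST")
    val seen = encoder.runContextOrUnit() as? MutableSet<String>
    encoder.encodeString(if (seen == null || seen.add(name)) name else "#ref")
}
```

The same serializer code works against both encoders; only the context-aware one deduplicates, which matches the idea that formats can adopt the feature independently.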

Oct 23 '23 12:10 pdvrieze

Thanks for the explanation! I will file a separate issue.

Oct 24 '23 08:10 arkivanov