Support Json-like polymorphism in Cbor
~~This is more a question than a feature request...~~
I'm working in an environment where JSON messages are being published from an Android App, written in Kotlin, via MQTT, to a Python-based backend where these messages are being decoded and processed. Serialization in the Android App is done with kotlinx.serialization, of course... :wink:
The messages are being serialized from a wrapper class which is implemented as follows:
```kotlin
@Serializable
class ExternalMessage(val timestamp: Long = System.currentTimeMillis(), val msg: ExternalMessageBase)
```
The 'real' message content is stored in the `msg` property of this wrapper and derives from the `ExternalMessageBase` class:
```kotlin
@Serializable
sealed class ExternalMessageBase(@Transient var messageDirection: Direction = Direction.OUT)
```
Now I have one specific message type, which contains image data, that I want to encode in CBOR rather than JSON, both to keep the message size minimal and to avoid encoding the image data as a base64 string in the App. The class for this message is implemented as follows:
```kotlin
@Serializable
class ImageDataMessage @OptIn(ExperimentalSerializationApi::class) constructor(
    @SerialName("image_data") @Contextual @ByteString val imageData: Mat,
    @SerialName("camera_position") val cameraPosition: CameraPositionData,
    @SerialName("pixel_per_meter") val pixelPerMeter: Float = 0.0f
) : ExternalMessageBase()
```
The encoding of this message in JSON results in a slightly different structure than encoding in CBOR (this output has been generated using json.loads/cbor2.loads on the Python side):
(JSON)
```python
{u'msg': {u'type': u'ImageDataMessage', u'image_data': u'', u'pixel_per_meter': 2221.0, u'camera_position': {u'position': {u'y': 0.0, u'x': 0.0, u'z': 0.0}, u'orientation': {u'y': 0.0, u'x': 0.0, u'z': 0.0, u'w': 0.0}}}, u'timestamp': 1665569404785}
```
(CBOR)
```python
{u'msg': [u'ImageDataMessage', {u'image_data': '', u'pixel_per_meter': 2221.0, u'camera_position': {u'position': {u'y': 0.0, u'x': 0.0, u'z': 0.0}, u'orientation': {u'y': 0.0, u'x': 0.0, u'z': 0.0, u'w': 0.0}}}], u'timestamp': 1665567516213}
```
As one can see, in the JSON output, the ImageDataMessage is encoded into one map which also contains the type attribute whereas in CBOR the ImageDataMessage is encoded into a list which contains the type attribute and a map which contains the remainder of the ImageDataMessage object.
What I would like to achieve is that the result of CBOR serialization is the same as for JSON, because that would spare me a large number of changes to the processing logic in the Python-based backend. Ideally, I would just replace `import json` with `import cbor2` in my Python code, and the processing logic would work the same regardless of which format the message was encoded in.
Is this achievable somehow using kotlinx.serialization, e.g. by changing the CBOR configuration or the implementation of the message classes?
Thanks in advance!
The difference is in the polymorphic strategy. By default, all polymorphic classes are encoded as in your CBOR output: an array of `[type, object]`. However, since this is a non-standard representation for JSON, `Json` has special support for polymorphism that writes the type as a key inside the object; it is enabled by default via the `Json { useArrayPolymorphism = false }` setting. Other formats usually don't support this flag. To achieve what you want, you either need to support this special polymorphism format in CBOR, or turn off the special JSON behavior (`useArrayPolymorphism = true`) so that JSON uses the array form as well.
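To see the two strategies side by side, here is a minimal sketch. `MessageBase`, `ImageMessage` and `Wrapper` are hypothetical stand-ins for the classes in the question, and it assumes the kotlinx.serialization compiler plugin and the JSON runtime are on the classpath. The default `Json` writes the discriminator as a `type` key inside the object; with `useArrayPolymorphism = true` it produces the same `[type, object]` array as the CBOR output above.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Hypothetical hierarchy mirroring ExternalMessageBase / ImageDataMessage.
@Serializable
sealed class MessageBase

@Serializable
@SerialName("ImageDataMessage")
data class ImageMessage(val pixelPerMeter: Int) : MessageBase()

@Serializable
data class Wrapper(val timestamp: Long, val msg: MessageBase)

fun main() {
    val w = Wrapper(1L, ImageMessage(2221))
    // Default Json: discriminator is a "type" key inside the object.
    println(Json.encodeToString(w))
    // Array polymorphism: [type, object], the layout CBOR uses by default.
    println(Json { useArrayPolymorphism = true }.encodeToString(w))
}
```

Note that flipping the flag makes JSON match CBOR, not the other way round, so with this option the Python backend would have to handle the array form for both formats.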
Also, note that CBOR does not really save much space here, as keys are still encoded as UTF-8 strings. Perhaps you just want to use a better serializer for `ByteArray` in Json.
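One possible reading of the `ByteArray` suggestion: by default, `Json` encodes a `ByteArray` as an array of numbers, so a custom serializer that writes a single Base64 string is already far more compact. This is a minimal sketch assuming a JVM target (on Android, `java.util.Base64` requires API level 26 or later); `Base64ByteArraySerializer` and `Blob` are hypothetical names, and hooking it up to the OpenCV `Mat` would still go through your existing contextual serializer.

```kotlin
import java.util.Base64
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical serializer: one Base64 string instead of Json's default [1,2,3,...] array.
object Base64ByteArraySerializer : KSerializer<ByteArray> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("Base64ByteArray", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: ByteArray) =
        encoder.encodeString(Base64.getEncoder().encodeToString(value))

    override fun deserialize(decoder: Decoder): ByteArray =
        Base64.getDecoder().decode(decoder.decodeString())
}

// Hypothetical message with a binary payload.
@Serializable
class Blob(@Serializable(with = Base64ByteArraySerializer::class) val data: ByteArray)

fun main() {
    println(Json.encodeToString(Blob.serializer(), Blob(byteArrayOf(1, 2, 3)))) // {"data":"AQID"}
}
```

On the Python side, `json.loads` then yields a plain string that `base64.b64decode` turns back into bytes.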
Thanks a lot @sandwwraith for the explanation! Maybe I'll have a look at implementing this myself during the course of Hacktoberfest 👍
The feature itself is reasonable, but I would like to warn any potential contributors here: it involves a lot of work, mostly because it requires the format to be able to read/skip through an arbitrary number of elements at various nesting levels before finding the type discriminator.
Hello @qwwdfsad,
thanks for the hint! I took a look at the code and tried to implement something, but indeed this seems to be a bigger task. I'd like to keep trying to implement this, but any pointers on where to start or what to look for would be greatly appreciated!
@jsiebert You can take a look at StreamingJsonDecoder.decodeSerializableValue https://github.com/Kotlin/kotlinx.serialization/blob/0a1b6d856da3bc9e6a19f77ad66c1b241533a20b/formats/json/commonMain/src/kotlinx/serialization/json/internal/StreamingJsonDecoder.kt#L53
There are several takeaways:
- To intercept the default array-based polymorphism, one needs to check `if (deserializer is AbstractPolymorphicSerializer<*>)` and then add custom behavior.
- CBOR objects, like JSON objects, are maps. That means the `type` key for polymorphism may be at an arbitrary position inside the map.
- The current JSON implementation optimistically checks the first key in the object to see whether it is the `type` key used to load the serializer. If it is not there, it falls back to the default behavior. This optimization is probably not necessary for CBOR.
- To search for the `type` key at an arbitrary position in an arbitrary object, one probably needs an intermediate data structure. For JSON, this is `JsonElement`: first the object string is parsed into a `JsonElement`, then the `type` key is extracted, then the rest of the `JsonElement` is decoded into an actual Kotlin object using a separate `JsonTreeDecoder` (the decoder is separate because its input is a `JsonElement`, not a `String`).
- That's why this feature requires so much work in the first place: there is no analog of `JsonElement` and `JsonTreeDecoder` in CBOR (yet).
- Alternatively, it is probably possible to just store the nested CBOR object's bytes in some intermediate place and read them twice: first to find the `type` key, then to deserialize them into a Kotlin object with the actual deserializer. This will probably be much simpler, but that is for you to find out.
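The first pass of the "read twice" idea, and the skipping problem described earlier in the thread, can be sketched as a standalone scanner over raw CBOR bytes. This is a hypothetical helper, not part of kotlinx.serialization: `CborCursor`, `findTypeDiscriminator` and `cborBytes` are illustrative names. It assumes the discriminator is a definite-length text string, skips every other entry regardless of nesting (handling both definite- and indefinite-length items, since an encoder may emit either), and does not cover the second pass of decoding with the serializer that was found.

```kotlin
// Hypothetical helper: skips arbitrary CBOR items and looks up a text key in a map.
class CborCursor(private val bytes: ByteArray, var pos: Int = 0) {
    private fun readByte(): Int = bytes[pos++].toInt() and 0xFF
    private fun peekByte(): Int = bytes[pos].toInt() and 0xFF

    // Big-endian unsigned integer of n bytes (CBOR argument encoding).
    private fun readUnsigned(n: Int): Long {
        var v = 0L
        repeat(n) { v = (v shl 8) or readByte().toLong() }
        return v
    }

    // Argument following an initial byte whose additional info is `ai`.
    private fun readArgument(ai: Int): Long = when {
        ai < 24 -> ai.toLong()
        ai == 24 -> readUnsigned(1)
        ai == 25 -> readUnsigned(2)
        ai == 26 -> readUnsigned(4)
        ai == 27 -> readUnsigned(8)
        else -> error("Unsupported additional info: $ai")
    }

    // Skips one complete data item, however deeply nested.
    fun skipItem() {
        val initial = readByte()
        val major = initial ushr 5
        val ai = initial and 0x1F
        if (ai == 31) {
            // Indefinite length: items until the 0xFF "break" byte.
            require(major in 2..5) { "Unexpected indefinite-length major type $major" }
            while (peekByte() != 0xFF) {
                skipItem()
                if (major == 5) skipItem() // a map entry is key + value
            }
            pos++ // consume the break byte
            return
        }
        val arg = readArgument(ai)
        when (major) {
            2, 3 -> pos += arg.toInt() // byte/text string payload
            4 -> repeat(arg.toInt()) { skipItem() } // array elements
            5 -> repeat(arg.toInt() * 2) { skipItem() } // map keys and values
            6 -> skipItem() // tag: skip the tagged item
            else -> Unit // integers, simple values, floats: already consumed
        }
    }

    // Reads a definite-length text string, or returns null without consuming anything.
    private fun tryReadText(): String? {
        val initial = peekByte()
        if (initial ushr 5 != 3 || (initial and 0x1F) == 31) return null
        pos++
        val len = readArgument(initial and 0x1F).toInt()
        val text = bytes.decodeToString(pos, pos + len)
        pos += len
        return text
    }

    // Expects a map at the cursor; returns the text value stored under `key`, or null.
    fun findTextValue(key: String): String? {
        val initial = readByte()
        require(initial ushr 5 == 5) { "Expected a CBOR map" }
        val ai = initial and 0x1F
        val indefinite = ai == 31
        var remaining = if (indefinite) 0L else readArgument(ai)
        while (if (indefinite) peekByte() != 0xFF else remaining-- > 0) {
            val k = tryReadText()
            if (k == null) skipItem() // skip a non-text key
            else if (k == key) return tryReadText()
            skipItem() // skip the value of a non-matching entry
        }
        return null
    }
}

fun findTypeDiscriminator(cbor: ByteArray, key: String = "type"): String? =
    CborCursor(cbor).findTextValue(key)

// Builds a ByteArray from unsigned byte values.
fun cborBytes(vararg v: Int): ByteArray = ByteArray(v.size) { v[it].toByte() }

fun main() {
    // {"a": [1, {"x": 2}], "type": "Img", "b": 3}: the discriminator is not the first key.
    val msg = cborBytes(0xA3, 0x61, 0x61, 0x82, 0x01, 0xA1, 0x61, 0x78, 0x02,
        0x64, 0x74, 0x79, 0x70, 0x65, 0x63, 0x49, 0x6D, 0x67, 0x61, 0x62, 0x03)
    println(findTypeDiscriminator(msg)) // Img
}
```

A real implementation inside the format would reuse the same skipping logic, then rewind to the start of the map and decode it with the serializer chosen from the discriminator.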
Hope this helps. Good luck!