kotlinx.serialization

Serializer to get the raw json value of a key?

Open tmm1 opened this issue 5 years ago • 14 comments

What is your use-case and why do you need this feature?

I am looking for a way to deserialize a specific field in my json into a raw byte array representing the value json sub-document.

Basically my json documents have large/complex json sub-trees that I would like to avoid parsing to save cpu/allocations. But I still need the value so I can re-create the original json if needed.

In golang, for example, this can be achieved with json.RawMessage: https://golang.org/pkg/encoding/json/#RawMessage

In gson, a type adapter can be used to regenerate json during parsing. This is not particularly cpu/gc efficient, but it works: https://github.com/google/gson/issues/1368

In moshi, there is work being done to be able to skip over the value and consume it into a raw value field: https://github.com/square/moshi/issues/675

Describe the solution you'd like

I'm not familiar enough with the kotlinx.serialization APIs to know if there is already a way to do this, or if it can be implemented within a custom serializer. Any pointers would be appreciated!

tmm1 avatar Sep 09 '20 05:09 tmm1

Please check out Json Elements: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/json.md#json-elements Does it do what you are looking for?

elizarov avatar Sep 09 '20 07:09 elizarov

If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later? We do not currently support this concept. JsonElement is an untyped representation that does not map onto classes, although it still performs parsing to check that your JSON is valid.
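
For illustration, a minimal sketch of the JsonElement approach, assuming a hypothetical Document class with a payload field (neither name comes from this thread). The payload is still fully parsed into a tree; toString() only renders that tree back into JSON text afterwards, so the original whitespace is not preserved.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement

// Hypothetical model: `payload` is kept as an untyped JSON tree.
@Serializable
data class Document(val id: Int, val payload: JsonElement)

// Decodes the document and returns the payload re-rendered as JSON text.
fun payloadAsText(input: String): String {
    val doc = Json.decodeFromString<Document>(input)
    return doc.payload.toString()
}

fun main() {
    println(payloadAsText("""{"id":1,"payload":{"tags":["a","b"],"n":2}}"""))
}
```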

sandwwraith avatar Sep 09 '20 14:09 sandwwraith

If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later?

Yes, exactly. I want to defer parsing for parts of the document.

JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

How does kotlinx.serialization handle ignoring unknown keys when deserializing into an object? Are the keys skipped during or after parsing? (I'm wondering what the cpu/allocation overhead is in cases where keys are ultimately ignored.)

tmm1 avatar Sep 09 '20 15:09 tmm1

The unknown keys are skipped without parsing (tokenizing only). However, the skipped string is not saved anywhere, so it requires some additional amount of work to support such a feature
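
As a small illustration of that skipping behaviour, here is a sketch assuming a hypothetical Header class and made-up key names. With ignoreUnknownKeys enabled, keys that have no matching property are dropped while decoding and their text is not retained anywhere.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json

// Hypothetical model that only declares the small field we care about.
@Serializable
data class Header(val id: Int)

// Json instance configured to skip keys that Header does not declare.
val skippingJson = Json { ignoreUnknownKeys = true }

// Decodes only the declared properties; everything else is skipped.
fun decodeHeader(input: String): Header = skippingJson.decodeFromString<Header>(input)

fun main() {
    println(decodeHeader("""{"id":7,"hugeSubtree":{"items":[1,2,3]}}"""))
}
```

Without ignoreUnknownKeys, the default Json instance rejects the same input with an exception rather than skipping the extra key.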

sandwwraith avatar Sep 10 '20 10:09 sandwwraith

The feature seems like a reasonable addition, though it still has some open questions.

Are the keys skipped during or after parsing?

Could you please elaborate on your use-case here? Because "put all unknown keys in a separate String property with valid JSON string" and "Treat specifically marked property not as simple String, but as a valid JSON encoded in String" are completely different approaches.

JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

I wonder if there are benchmarks (or maybe you have a relevant story to add?) showing that the performance boost is significant here. Even without allocating JsonElement instances, the parser still has to 1) parse the JSON and extract the relevant sub-object and 2) ensure that the whole sub-object is valid JSON. The second part is probably the slowest in the whole JSON decoding process, so I'm really interested in knowing how big the performance improvement is here.

qwwdfsad avatar Sep 13 '20 13:09 qwwdfsad

"Treat specifically marked property not as simple String, but as a valid JSON encoded in String"

This is what I'm interested in and what is implemented by the other examples I provided.

I have one specific key in my json that contains a huge json subtree, with thousands of objects and several layers of nesting. I don't want to create these thousands of objects per json parse because it leads to severe GC pressure on many Android devices.

tmm1 avatar Sep 13 '20 15:09 tmm1

Thanks for the clarification and your input!

It's not something we are going to do right now (at least until version 1.1.0), but thanks to your feedback, I've left open the possibility of adding this functionality in a backwards-compatible way, both for custom serializers and for regular JSON usage. Let's see how it goes in Moshi and how much demand there is.

Design idea: instead of using a @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer, to provide better type safety and emphasize user intention in the type.
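
One way this idea could look with the JSON API available in current releases. This is a sketch only, not the maintainers' design: RawStringSerializer, Envelope and roundTrip are hypothetical names, JsonUnquotedLiteral is an experimental API, and decoding here still goes through JsonElement, so it does not avoid parsing or allocation.

```kotlin
import kotlin.jvm.JvmInline
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.SerializationException
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encodeToString
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonNull
import kotlinx.serialization.json.JsonUnquotedLiteral

// Wrapper type that marks a String as holding JSON text rather than a plain string.
@JvmInline
@Serializable(with = RawStringSerializer::class)
value class RawString(val value: String)

@OptIn(ExperimentalSerializationApi::class)
object RawStringSerializer : KSerializer<RawString> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("RawString", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): RawString {
        val jsonDecoder = decoder as? JsonDecoder
            ?: throw SerializationException("RawString can only be decoded from JSON")
        // JsonElement.toString() renders the parsed tree as compact JSON text.
        return RawString(jsonDecoder.decodeJsonElement().toString())
    }

    override fun serialize(encoder: Encoder, value: RawString) {
        val jsonEncoder = encoder as? JsonEncoder
            ?: throw SerializationException("RawString can only be encoded to JSON")
        // JsonUnquotedLiteral rejects the literal "null", so map it to JsonNull.
        val element = if (value.value == "null") JsonNull else JsonUnquotedLiteral(value.value)
        jsonEncoder.encodeJsonElement(element)
    }
}

// Hypothetical model with one raw JSON field.
@Serializable
data class Envelope(val id: Int, val payload: RawString)

// Decodes an Envelope and encodes it again; the payload is written back verbatim.
fun roundTrip(input: String): String =
    Json.encodeToString(Json.decodeFromString<Envelope>(input))
```

Because the payload is re-rendered from a tree, whitespace in the original text is not preserved, and nothing validates a RawString that is constructed by hand before it is written out.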

qwwdfsad avatar Sep 14 '20 12:09 qwwdfsad

@qwwdfsad If this functionality were something I'd be interested in contributing, do you have any pointers on where to start?

ankushg avatar Sep 21 '21 15:09 ankushg

@qwwdfsad

I'd like to add that there's a slightly different use case that I have, which is preventing me from switching to kotlinx.serialization. I have a situation where I would like to store the sub-object, as a JSON string, in a database, but I also want to deserialise it to examine its contents. Without RawJson, this means deserialising the whole JSON object, and then reserialising the sub-object for storage. Similarly, when serving the data again (potentially as part of a collection), I'd need to deserialise the sub-object before serialising the full object for output. I might be a bit naive, not having delved into the specifics of how this all works, but to me this seems to be a bit redundant. The string is already there as a substring of the original input, or will become a substring of the output.

I've tried to achieve this through a custom serializer, a bit like this:

// Note: polymorphicSerialiser (presumably a Json instance) and BaseClass are defined elsewhere and not shown here.
object RawJsonSerializer : JsonTransformingSerializer<String>(String.serializer()) {

    // Incoming JSON object -> decode it as BaseClass, then re-encode it into a string primitive.
    override fun transformDeserialize(element: JsonElement): JsonElement {
        if (element !is JsonObject) {
            throw Exception("Expected schedule object")
        }

        return JsonPrimitive(
            polymorphicSerialiser.encodeToString(polymorphicSerialiser.decodeFromJsonElement<BaseClass>(element))
        )
    }

    // Outgoing string primitive -> parse its content back into a JSON object.
    override fun transformSerialize(element: JsonElement): JsonElement {
        if (element !is JsonPrimitive || !element.isString) {
            throw Exception("Expected schedule string")
        }
        return JsonObject(polymorphicSerialiser.decodeFromString(element.content))
    }
}

but I found that for large collections of data, this ended up slower than using Jackson with the JsonRawValue annotation. Is there a better way to achieve this?

brendan-gero-humanetix avatar Dec 01 '21 04:12 brendan-gero-humanetix

I'm following this guide to implement a fallback for deserializing Enums. However, I would also like to log the raw value when decoding fails. Is there any way to get this from the decoder?

chakflying avatar Feb 14 '22 08:02 chakflying

Unfortunately, we do not support retrieving raw values yet.

Also relevant: #1405

sandwwraith avatar Feb 21 '22 14:02 sandwwraith

Thanks for the clarification and your input!

It's not something we are going to do right now (at least until version 1.1.0), but thanks to your feedback, I've left open the possibility of adding this functionality in a backwards-compatible way, both for custom serializers and for regular JSON usage. Let's see how it goes in Moshi and how much demand there is.

Design idea: instead of using a @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer, to provide better type safety and emphasize user intention in the type.

Is there currently any way to achieve this? The documentation says I must provide a correct descriptor, but in this case I don't know which descriptor is suitable. We use the JSON format in a bad way: deserializing the whole tree is impossible in my case (it uses too much memory). I have to hand-write a deserializer that accesses the JSON tokens and builds the structure in my own way. So I need kotlinx.serialization to just "skip" the relevant tokens and leave them to my own code.

iseki0 avatar Feb 20 '23 19:02 iseki0

Has anything changed in this area since JsonUnquotedLiteral was introduced a while ago?

I still can't find a serializer that I could use like this (barJson contains a raw json object as a string):

@Serializable
data class Foo(
  @Serializable(JsonRawSerializer::class)
  val barJson: String
)

vdshb avatar Mar 17 '25 11:03 vdshb

BTW, while I'm waiting for some response... This is my naive implementation of JsonRawSerializer if someone is interested.

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encodeToString
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonNull
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonUnquotedLiteral
import kotlinx.serialization.json.jsonPrimitive

@OptIn(ExperimentalSerializationApi::class)
class JsonRawSerializer : KSerializer<String> {
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor(JsonRawSerializer::class.qualifiedName!!, PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): String {
        return if (decoder is JsonDecoder) {
            // The value is parsed into a JsonElement first, then objects and arrays are re-encoded to text.
            when (val element = decoder.decodeJsonElement()) {
                is JsonArray  -> decoder.json.encodeToString(element)
                is JsonObject -> decoder.json.encodeToString(element)
                JsonNull      -> "null"
                // Primitives: content drops the quotes of a JSON string, so strings are not kept as raw JSON.
                else          -> element.jsonPrimitive.content
            }
        } else {
            // Non-JSON formats: fall back to a plain string.
            decoder.decodeString()
        }
    }

    override fun serialize(encoder: Encoder, value: String) {
        if (encoder is JsonEncoder) {
            // JsonUnquotedLiteral writes the text verbatim but rejects the literal "null", hence the JsonNull branch.
            if (value == "null") {
                encoder.encodeJsonElement(JsonNull)
            } else {
                encoder.encodeJsonElement(JsonUnquotedLiteral(value))
            }
        } else {
            encoder.encodeString(value)
        }
    }
}

vdshb avatar Mar 17 '25 13:03 vdshb