kotlinx.serialization icon indicating copy to clipboard operation
kotlinx.serialization copied to clipboard

RFC: Allow disabling coercion between JSON primitives

Open pschichtel opened this issue 4 months ago • 9 comments

Currently, and independent of the isLenient option, the JSON decoders will coerce Strings to other primitiv types, allowing e.g. "true" to be decoded as Boolean. We are currently transitioning from a very strict decoder setup based on Jackson over to kotlinx serialization for some of the nicer kotlin-specific functionality and the lack of reflection.

This PR is currently meant as an RFC to gauge if this is something that would be accepted upstream. As such there are some caveats:

  • There are no additional tests, however the existing tests show that the default behavior is unchanged.
  • No documentation has been written for the new functionality.
  • There are some TODO's which will require closer inspection for a final implementation and some dedicated tests.
  • The new option should probably be marked as @ExperimentalSerializationApi, but it is not currently.

I would obviously add those things, if there is a chance this would be pulled.

The implementation is rather simple: Introduce a boolean option and in and around the lexer prevent the use of quoted strings, where they wouldn't be necessary unless coercion is wanted. In some places the necessary functionality was already in place as separate methods (decodeBoolean vs decodeBooleanLenient), in others I was able to add minimal changes, usually as part of existing checks around string quotes. I tried to keep the interfaces stable.

Related issues:

  • https://github.com/Kotlin/kotlinx.serialization/issues/1042
  • https://github.com/Kotlin/kotlinx.serialization/issues/2696

pschichtel avatar Aug 15 '25 17:08 pschichtel

Looks like a good idea to me

pdvrieze avatar Aug 23 '25 16:08 pdvrieze

I just rebased this onto latest dev. @sandwwraith @Tapchicoma Is there any chance to get feedback on the "does this have a chance of getting pulled?" question? I've been using this in production in a cloud service, so it's seen quite a bit of traffic without any issues so far.

pschichtel avatar Sep 19 '25 08:09 pschichtel

I've rebased this again. @sandwwraith @fzhinkin Have I completely missed some process here? Any feedback would be appreciated.

pschichtel avatar Oct 15 '25 12:10 pschichtel

Hi! Sorry, it took me a long time to check out your proposal.

In general, I like the idea of adding such a setting. It would indeed help with stricter requirements and migrating between frameworks, such as Jackson.

However, there are several key points that have to be addressed:

  • Conversion from JsonElement. You can take a look at AbstractJsonTreeDecoder to see how it's done. I think it would be straightforward to add it there, since we have JsonPrimitive.isString property.
  • What to do with things like Map<Int, Int> or Map<String, Boolean>. (see also #2749 for reference). In Json, keys can only be strings. So now both {"42": "42"} and {"42": 42} could be mapped to the Map<Int, Int>. I'm not sure whether this flag should prohibit the former. And it probably shouldn't be blocking key parsing at all. What do you think?
  • Since changes involve a Json parser, I would say this is very desirable not only to write tests, but also benchmarks for the new feature. You can take TwitterFeedBenchmark as an example.

What do you think of this? If you want to continue working, I'd happily review your contribution.

sandwwraith avatar Oct 27 '25 14:10 sandwwraith

The choice of how to handle/serialize maps with non-string keys is ultimately a choice for the format to make (as there is no "native" mapping). The current behaviour to use quoted strings is probably the most elegant (although a strict parameter could prohibit it altogether).

Fundamentally I think that the behaviour of map serialization with non-string keys is a separate one (where configuration may specify an alternative approach as for complex keys). The case for non-string primitive keys is that (to support them at all) the format maps them to strings and serializes/deserializes them as strings.

For map values, they should follow the coercion rule (Map<Int, Int> would only allow {"42": 42 }

pdvrieze avatar Oct 27 '25 15:10 pdvrieze

@sandwwraith thanks for your feedback. Yes I'm still interested in working on this.

Conversion from JsonElement. You can take a look at AbstractJsonTreeDecoder to see how it's done. I think it would be straightforward to add it there, since we have JsonPrimitive.isString property.

What issue are you referring to? Decoding from JsonElement already works with my changes, doesn't it?

to do with things like Map<Int, Int> or Map<String, Boolean>. (see also Quoted booleans are being allowed even with isLenient = false #2749 for reference). In Json, keys can only be strings. So now both {"42": "42"} and {"42": 42} could be mapped to the Map<Int, Int>. I'm not sure whether this flag should prohibit the former. And it probably shouldn't be blocking key parsing at all. What do you think?

May intention wasn't to touch anything regarding that behavior in this PR. as @pdvrieze suggests I think this is a separate topic for a separate PR. My intention here was merely to make primitive decoding, where it happens, more strict. I think it's best to go in smaller steps here.

changes involve a Json parser, I would say this is very desirable not only to write tests, but also benchmarks for the new feature. You can take TwitterFeedBenchmark as an example.

True! I'll have a look at that.

I have a few lose ends to tie up before I can work on this again, but I definitely will. We currently use this via a fork in production, so there is a strong interest in getting this upstream :)

pschichtel avatar Oct 27 '25 18:10 pschichtel

What issue are you referring to? Decoding from JsonElement already works with my changes, doesn't it?

You're right, I overlooked your changes there.

May intention wasn't to touch anything regarding that behavior in this PR.

Well, since code path for Map is the same, you're touching it in any case. In particular, this code throws a decoding exception now:

@Test
fun testStrictParsing() {
    val j = Json {
        allowPrimitiveCoercion = false
    }
    val input = """{"42": 42}"""
    println(j.decodeFromString<Map<Int, Int>>( input))
}

I do not mind this while the flag is experimental, but we would need to make a decision whether this behavior is desirable or if we should allow such deserialization. But I agree, it is possible to do it later.

I have a few loose ends to tie up before I can work on this again, but I definitely will.

Glad to hear that!

sandwwraith avatar Oct 28 '25 11:10 sandwwraith

Well, since code path for Map is the same, you're touching it in any case. In particular, this code throws a decoding exception now

We ran into one such case in our codebase (actually it was a value class that wraps an int) with this branch and solved it by implementing a dedicated "map key serializer":

@Serializable
@JvmInline
value class IntWrapper(val value: Int)

private object IntWrapperKeySerializer : KSerializer<IntWrapper> {
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("IntWrapperMapKey", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: IntWrapper) = encoder.encodeString(value.value.toString())
    override fun deserialize(decoder: Decoder) = IntWrapper(decoder.decodeString().toInt())
}

typealias IntWrapperStringMap = Map<
    @Serializable(with = IntWrapperKeySerializer::class)
    IntWrapper,
    String,
    >

@Serializable
data class Foo(
    val map: IntWrapperStringMap
)

Coming from Jackson, this didn't seem too surprising, since you also need dedicated map key (de)serializers there.

Gama11 avatar Oct 28 '25 11:10 Gama11

@sandwwraith Ahh now I understand what you meant. Honestly I think it's the only logical consequence of enabling this flag, that keys must be decoded as strings, if necessary by using custom deserializer. With the great support for value classes this almost feels like a non-issue honestly. We should probably flesh out the documentation on the option to explicitly state this property, but I would argue that people looking for such a strict configuration are expecting this anyway.

pschichtel avatar Oct 28 '25 13:10 pschichtel