kotlinx.serialization
kotlinx.serialization copied to clipboard
Strict mode for CBOR
What is your use-case and why do you need this feature?
My app needs to be able to sign and verify serialized data, which requires that the parse be unambiguous. The main problem is duplicate map keys. If there are repeated keys, then two recipients may disagree on what the signed data says, which is obviously a problem. :-) I've written on this kind of parser mismatch vulnerability previously: https://www.brainonfire.net/blog/2022/04/29/preventing-parser-mismatch/
The CBOR implementation currently accepts duplicate keys by using the last one encountered, but the spec says in §2.1:
A map that has duplicate keys may be well-formed, but it is not valid
§3.10 "Strict Mode" then waffles a bit and puts the onus on the sender to "avoid ambiguously decodable data" and defines an optional strict mode that decoders are not required to implement, particularly calling out duplicate key detection as a possibly heavy-weight requirement.
However, kotlinx.serialization already has the ability to detect duplicate keys because it builds up a HashMap or similar which can simply be consulted before insertion. (There's no streaming interface that I'm aware of.) The language in §3.10 is likely aimed at embedded and other highly constrained environments, which isn't particularly relevant to Kotlin.
Describe the solution you'd like
- Generally support rejection of repeated map keys, as an optional feature.
- Expose this as part of a new strict-mode feature in the
Cbor
implementation, enabled by default to comply with §2.1.
(I'd also love for the duplicate key rejection to be enabled by default for all formats, as this is a known vulnerability in many, many protocols and people should be protected by default—but still able to explicitly opt out of this protection if they know what they're doing. Major version bump, I know.)
Note: https://github.com/Kotlin/kotlinx.serialization/issues/1990 covers rejection of duplicate keys, but it wasn't clear if that was meant to be specific to JSON.
(Updated with more info from the spec.)
Yes, it sounds very reasonable. I do not mind adding it at some point in the future or via contribution.
Great, I may try to work up a PR. Where would you recommend getting started? I found insertKeyValuePair
in CollectionSerializers—but it didn't seem that the tests actually exercise that code path, and I can't find anything that calls insertKeyValuePair
.
@timmc What happens from a format perspective is that it recognizes that this is a collection serializer. Then it will call the update decoder (at some point it was part of DeserializerStrategy, but it is no longer) with the previous list value.That update will create/update the collection with the new value (or not). It would be possible to create a new version of this function that takes a strict argument (with the default implementation of the previous one passing false for strictness).
Would you be able to point me to a code location? I'm not finding useful references to update
, CollectionSerializer
, etc. I just discovered that LinkedHashMapSerializer
extends LinkedHashMap
, but I'm having trouble finding the code that actually puts key-value pairs into a map when deserializing map data.
OK, found it -- builder[key] = value
in MapLikeSerializer
. Starting a branch here: https://github.com/Kotlin/kotlinx.serialization/compare/dev...timmc:timmc/cbor-strict
I've found it straightforward to reject duplicate keys. Making it configurable would be a larger undertaking—I can either plumb it through via the serializers or via the descriptors, but there's not really a clean way to do it since there's no config object being passed along. Instead, I would have to add allowDuplicateKeys: Boolean
to a large number of method and constructor signatures, and I still haven't found the end of that chain.
Would you accept a PR that just forbids duplicate keys in maps and objects, with an option to later add a configuration option that permits duplicate keys?
@timmc I don't think it is a good idea to change existing behavior so that people would start receiving exceptions on a code that worked before, especially without an option to bring old behavior back. So it's better to have it. Flags are already stored in Cbor
object, which is available in every CborReader
. So you need to pass this object further down to CborDecoder
, and that's likely it.
@timmc The code involved is:
https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/core/commonMain/src/kotlinx/serialization/internal/CollectionSerializers.kt#L99-L113
and
https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/core/commonMain/src/kotlinx/serialization/internal/CollectionSerializers.kt#L26-L51
It is important to note that for some formats it is valid to read list/map items one by one (for example in Protobuf): https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt#L210-L218
This later case requires special handling of the MapSerializer. So you have two separate cases to support this generically (reading multiple items at a time, directly in MapSerializer; and repeatedly reading sublists).
I'm not sure it is a good idea to add changes to CollectionSerializer
since it is used by all formats. You also need to check whether regular objects have duplicate keys. It seems you have to track keys manually, with some kind of Set
.
@sandwwraith Indeed modifying CollectionSerializer would be a major undertaking with various compatibility concerns/challenges.
I don't think CborMapReader
has enough information to build and check a list of keys. By the time decodeElementIndex
has been called (in the parent class CborListReader
), MapLikeSerializer.readElement
has already called decodeSerializableElement
to read the key.
Maybe it would make sense for MapLikeSerializer
to call a new CompositeDecoder.visitedKey
method that accepts a key and then has the option of throwing an exception?
Maybe it would make sense for
MapLikeSerializer
to call a newCompositeDecoder.visitedKey
method that accepts a key and then has the option of throwing an exception?
I like this solution (even though some bits of the name/semantics could be "improved"). The function can have a default implementation that results in the current behaviour. Note that you should probably also think about the set serializer (sets also don't allow duplicate values/only keep the last one).
I had a look at the library source code. You probably want to do something that checks the result of put
/add
which normally return the previous value. Doing this will maintain the efficiency of the implementation (contains checks are not cheap)
Yeah, it seems to work out OK -- I have https://github.com/Kotlin/kotlinx.serialization/pull/2681 which still needs some work but should be on the right path.