CBOR serializer encodes null class instance as empty map instead of null
Traced to this line: https://github.com/Kotlin/kotlinx.serialization/blob/d4d066d72a9f92f06c640be5a36a22f75d0d7659/formats/cbor/commonMain/src/kotlinx/serialization/cbor/internal/Encoder.kt#L106
I consider this a bug. Null objects should be encoded as the simple type null, no matter whether they are class types or other types.
In our use case, this is causing issues after deserialization, since the empty map is deserialized into a default instance of the type, rather than null.
Please consider removing this line and always encoding as null.
Can you please provide an example of serializable class and expected/actual output?
@Serializable
data class Test(val nullableClass: Inner?, val nullableMap: Map<String, String>?)
@Serializable
data class Inner(val test: Int)
val test = Test(nullableClass = null, nullableMap = null)
Cbor.encodeToByteArray(Test.serializer(), test)
// Hex output: bf6d6e756c6c61626c65436c617373a06b6e756c6c61626c654d6170f6ff
The hex output bf6d6e756c6c61626c65436c617373a06b6e756c6c61626c654d6170f6ff looks like this (from https://cbor.me/):
{ "nullableClass": {}, "nullableMap": null}
BF # map(*)
6D # text(13)
6E756C6C61626C65436C617373 # "nullableClass"
A0 # map(0)
6B # text(11)
6E756C6C61626C654D6170 # "nullableMap"
F6 # primitive(22)
FF # primitive(*)
Expected output would be this: bf6d6e756c6c61626c65436c617373f66b6e756c6c61626c654d6170f6ff
{ "nullableClass": null, "nullableMap": null}
BF # map(*)
6D # text(13)
6E756C6C61626C65436C617373 # "nullableClass"
F6 # primitive(22)
6B # text(11)
6E756C6C61626C654D6170 # "nullableMap"
F6 # primitive(22)
FF # primitive(*)
+1 on this
I am working on a project that involves ISO 18013-5 (mDL) One of the types in that spec is "SessionTranscript" - which is a CBOR array with an optional element at the end (may be null or some other type)
That type is used in cryptographic operations - so the values must match (each party creates the SessionTranscript independently)
For sake of clarity, a simplified version of the type could be:
@OptIn(ExperimentalSerializationApi::class)
@CborArray
@Serializable
data class SessionTranscript(
val someValue: String,
val nested: OtherKind?
)
@OptIn(ExperimentalSerializationApi::class)
@CborArray
@Serializable
data class OtherKind(
val otherValue: String,
)
Describe the solution you'd like
Using cbor (with useDefiniteLengthEncoding = true) like so:
val withNull = SessionTranscript(
someValue = "foo",
nested = null
)
cbor.encodeToByteArray(SessionTranscript.serializer(), withNull) // should encode to a value that ends in 0xF6 (NULL)
Should encode to:
/**
* 82 # array(2)
* 63 # text(3)
* 666F6F # "foo"
* F6 # null <- notice this null
*/
Maybe with some other CborBuilder option and/or annotation?
As mentioned: The current behavior (null becomes empty map for complex types) is expected, as per this PR: https://github.com/Kotlin/kotlinx.serialization/pull/2412
Rough changes - but that would be breaking to existing consumers expecting this as the default behavior https://github.com/ardune/kotlinx.serialization/commit/cfd69518bb7c345ebf4c871463e507eb80987572
Maybe a "CborBuilder" opt-in for this behavior change?
I consider this a bug. Null objects should be encoded as the simple type
null, no matter whether they are class types or other types.
It does indeed sound like a bug and fixing would be a breaking change. Good news is: You can work around it, if you do some manual work with the serializer. @ardune here's how we work with this in our mDL implementation. As it stands now, it seems to comply with the spec and interops with others (in de EUDIW context) perfectly fine. @nodh please care to chime in here? you know mDL better than me.
One more thing regarding mDL: ~~if you need it in the EUDIW context: The EU pretty much botched their spec by disregarding the ISO standard in some subtle details.~~ The ISO disregarded some strict CBOR/COSE encoding rules. (Whoever was responsible for this probably never wrote a parser in their life.) This means that you will need Obor and there is no way around it given how the default cbor decoder works (see here).
I appreciate the example @JesusMcCloud, as well as the heads up about EUDIW/Obor. In the current case: we worked around it in our solution, it just didn't meet my assumed behavior when starting (i.e. an explicit null was not encoded as such)
Do you think it makes sense to that CborBuilder option to opt-in to this breaking change? I can make it and do a PR if that seems like a viable/maintainable approach. My concern is that as a toggle may be worse in the long run.
That is for @sandwwraith to decide. @Jacob-Amazon has me convinced that the original behaviour is incorrect and should be resolved. How to handle such a breaking change wrt. the existing ecosystem using this cbor implementation is a different cup of tea I don't feel qualified/experienced enough to comment on.
Now that I've seen this issue, I'm also convinced that a null object should be encoded as F6. This will also become an issue in our implementation as soon as we're implementing other handover structures.
Just to clarify publicly: The EU consortium did not botch it. The ISO mDL spec itself loosened up some encoding rules compared to vanilla CBOR/COSE. So that consortium is to blame. What that means is: This issue it will not only affect the EUDIW, but every mDL implementation regardless of context.
I think we cannot change behavior as-is, indeed. You can send a PR that would incorporate the correct behavior under the flag.
While I do understand the compatibility argument, isn't CBOR still experimental? I'm not advocating for a radical breaking change, but rather a sensible strategy to make the default behaviour gradually the default:
- Introducing a flag with a default value that behaves, as-is
- Some time later, deprecating the flag and the function call without the flag and replacing the flag it with a new one, with flipped behaviour
- Raising the deprecation level to ´ERROR`
- Raising the deprecation level to.
HIDDEN - Never actually removing it, so that old code remains working
Probably too naive and never going to work as easily, but keeping known-wrong behaviour for to maintain compatibility with a state that is officially experimental smells funky.
Alternatively: Deprecate this here CBOR format and replace it by something entirely new, heavily based on the much more flexible Obor.
I think a discussion (both internally in the serialization team and with users) is warranted, because in the end the current behaviour remains wrong.
I've made a PR for the first draft of this with the flag discussed.
Hopefully it is a reasonable jumping off point.
I wanted to bump this to make sure it wasn't lost in the shuffle of things.
I'm willing to make sure that PR happens and meets expectations for maintainability as needed.
Sorry, CBOR is kinda out of focus for now, so it takes us a lot of time to process things. I'll try to also take a look at other null problems in the complex and tell you if your PR fits the picture
I've checked history and this behavior was added by #2412. @JesusMcCloud Maybe you remember why it was needed?
I'm pretty sure I messed up due to mixing things up when dealing with classes annotated with (the badly named) @CborArray. Ayways, I double-checked and this one is definitely on me: Before my COSE PR, things worked correctly, so I think the fix by @ardune should be merged, especially since it provides a smooth migration path.
Sorry for messing up!
Given that it was added fairly recently and it is not up to standards, I think we can fix it without a migration path. Maybe in your next PR about CborElement :)
I think the fix for this deserves a separate, small PR, so it can be verified in isolation.
@JesusMcCloud, in the PR where null-object-as-a-map encoding was introduced, you explicitly mentioned that it was done for COSE-compliance. Are you sure that it is not needed indeed?
Thanks for following up! I'll prioritise this one over structured CBOR, so we get to first fix an actual problem that hits in production.
There is a touching point with COSE, but as I mentioned, I am confident that I intermingled two things in ways that should not go together. There is a legit test cases that fails, when this behaviour is reverted, but the ways the null-as-empty-object encoding is done right now is just wrong. I still have to dig into this some more. Luckily, we have a corpus of test data conforming to ISO mDL spec (and data that is not conforming to spec, but that we still need to be able to process) that we can use. Bad news is: none of it is self-contained, but it is split up between VC-K and Signum. Still, once this is narrowed down, I'll try to shrink things down into self-contained test cases. Also luckily: I don't have to sort this out all by myself, but I can bug @nodh, @gp-iaik, @n0900 and @lukpos with this until we all agree that this is fixed for good. However, this also means that even though it's a small fix, it will take a bit of time, probably past new year's.
After looking through different standards I have found the root cause of the current behaviour:
For example in RFC 8152 the unprotected_header parameter is defined as
Headers = (
protected : empty_or_serialized_map,
unprotected : header_map
)
header_map = {
Generic_Headers,
* label => values
}
i.e. unprotected is not nullable however all possible entries defined in Generic_Headers are nullable, with this being a CDDL group (i.e. its structure is flattened into the map)
Taking up on @n0900's comment: COSE is utterly cursed. Data-model-wise protected and unprotected headers are just CoseHeader and when you model this in a sane way, you have two properties in your CoseSigned: protected and uprotected. Now they do get serialized differently, so it makes sense to have two properties and annotate them differently.
And this is where things get extra cursed and where I messed up: The spec never says that either is allowed to be null, but only ever an empty map. Semantically, this corresponds to null because you have cases where you only ever want to accept a CoseSigned that only contains a protected header, for example, so an empty unprotected header means no unprotected header properties, hence, you want to model this as nullable. Now you want this pattern, because the compiler helps you a lot with null checks, and not at all with data classes whose properties are all nullable, as this would force you to introduce error-prone boilerplate code that is then duplicated and adapted from class to class, just to check if something that is semantically null, is really semantically null, because the data model does allow you to specify it as null.
Now you also (currently) cannot easily use Map abstractions and decide to not care for abstractions, because you will have integer map keys all over. So my mistake was to assume the sane programming pattern to transform null into empty maps instead of introducing an annotation like @CborNullAsEmptyMap that is used on a per-property or per-class basis. Technically, this is is not needed, but not having that is a footgun. I am not 100% sure but I'd suspect the ISO mDL spec to push this pattern even further.
Addendum: as for added complexity due to such an annotation: The currently broken if-else in line 106 from the first commetn would need to be replaced with a correct check based on a @CborNullAsEmptyMap property or class-level annotation