kotlinx.serialization CBOR serializer encodes null class instance as empty map instead of null

Traced to this line: https://github.com/Kotlin/kotlinx.serialization/blob/d4d066d72a9f92f06c640be5a36a22f75d0d7659/formats/cbor/commonMain/src/kotlinx/serialization/cbor/internal/Encoder.kt#L106

I consider this a bug. Null objects should be encoded as the simple type null, no matter whether they are class types or other types.

In our use case, this is causing issues after deserialization, since the empty map is deserialized into a default instance of the type, rather than null.

Please consider removing this line and always encoding as null.

Nov 05 '24 19:11 Jacob-Amazon

Can you please provide an example of serializable class and expected/actual output?

Nov 06 '24 11:11 sandwwraith

@Serializable
data class Test(val nullableClass: Inner?, val nullableMap: Map<String, String>?)

@Serializable
data class Inner(val test: Int)

val test = Test(nullableClass = null, nullableMap = null)
Cbor.encodeToByteArray(Test.serializer(), test)
// Hex output: bf6d6e756c6c61626c65436c617373a06b6e756c6c61626c654d6170f6ff

The hex output bf6d6e756c6c61626c65436c617373a06b6e756c6c61626c654d6170f6ff looks like this (from https://cbor.me/):

{ "nullableClass": {}, "nullableMap": null}
BF                               # map(*)
   6D                            # text(13)
      6E756C6C61626C65436C617373 # "nullableClass"
   A0                            # map(0)
   6B                            # text(11)
      6E756C6C61626C654D6170     # "nullableMap"
   F6                            # primitive(22)
   FF                            # primitive(*)

Expected output would be this: bf6d6e756c6c61626c65436c617373f66b6e756c6c61626c654d6170f6ff

{ "nullableClass": null, "nullableMap": null}
BF                               # map(*)
   6D                            # text(13)
      6E756C6C61626C65436C617373 # "nullableClass"
   F6                            # primitive(22)
   6B                            # text(11)
      6E756C6C61626C654D6170     # "nullableMap"
   F6                            # primitive(22)
   FF                            # primitive(*)

Nov 06 '24 18:11 Jacob-Amazon

+1 on this

I am working on a project that involves ISO 18013-5 (mDL) One of the types in that spec is "SessionTranscript" - which is a CBOR array with an optional element at the end (may be null or some other type)

That type is used in cryptographic operations - so the values must match (each party creates the SessionTranscript independently)

For sake of clarity, a simplified version of the type could be:

    @OptIn(ExperimentalSerializationApi::class)
    @CborArray
    @Serializable
    data class SessionTranscript(
        val someValue: String,
        val nested: OtherKind?
    )

    @OptIn(ExperimentalSerializationApi::class)
    @CborArray
    @Serializable
    data class OtherKind(
        val otherValue: String,
    )

Describe the solution you'd like

Using cbor (with useDefiniteLengthEncoding = true) like so:

        val withNull = SessionTranscript(
            someValue = "foo",
            nested = null
        )

        cbor.encodeToByteArray(SessionTranscript.serializer(), withNull) // should encode to a value that ends in 0xF6 (NULL)

Should encode to:

    /**
     * 82           # array(2)
     *    63        # text(3)
     *       666F6F # "foo"
     *    F6        # null <- notice this null
     */

Maybe with some other CborBuilder option and/or annotation?

As mentioned: The current behavior (null becomes empty map for complex types) is expected, as per this PR: https://github.com/Kotlin/kotlinx.serialization/pull/2412

Rough changes - but that would be breaking to existing consumers expecting this as the default behavior https://github.com/ardune/kotlinx.serialization/commit/cfd69518bb7c345ebf4c871463e507eb80987572

Maybe a "CborBuilder" opt-in for this behavior change?

Feb 10 '25 20:02 ardune

I consider this a bug. Null objects should be encoded as the simple type null, no matter whether they are class types or other types.

It does indeed sound like a bug and fixing would be a breaking change. Good news is: You can work around it, if you do some manual work with the serializer. @ardune here's how we work with this in our mDL implementation. As it stands now, it seems to comply with the spec and interops with others (in de EUDIW context) perfectly fine. @nodh please care to chime in here? you know mDL better than me.

One more thing regarding mDL: ~~if you need it in the EUDIW context: The EU pretty much botched their spec by disregarding the ISO standard in some subtle details.~~ The ISO disregarded some strict CBOR/COSE encoding rules. (Whoever was responsible for this probably never wrote a parser in their life.) This means that you will need Obor and there is no way around it given how the default cbor decoder works (see here).

Feb 28 '25 10:02 JesusMcCloud

I appreciate the example @JesusMcCloud, as well as the heads up about EUDIW/Obor. In the current case: we worked around it in our solution, it just didn't meet my assumed behavior when starting (i.e. an explicit null was not encoded as such)

Do you think it makes sense to that CborBuilder option to opt-in to this breaking change? I can make it and do a PR if that seems like a viable/maintainable approach. My concern is that as a toggle may be worse in the long run.

Feb 28 '25 14:02 ardune

That is for @sandwwraith to decide. @Jacob-Amazon has me convinced that the original behaviour is incorrect and should be resolved. How to handle such a breaking change wrt. the existing ecosystem using this cbor implementation is a different cup of tea I don't feel qualified/experienced enough to comment on.

Feb 28 '25 23:02 JesusMcCloud

Now that I've seen this issue, I'm also convinced that a null object should be encoded as F6. This will also become an issue in our implementation as soon as we're implementing other handover structures.

Mar 03 '25 09:03 nodh

Just to clarify publicly: The EU consortium did not botch it. The ISO mDL spec itself loosened up some encoding rules compared to vanilla CBOR/COSE. So that consortium is to blame. What that means is: This issue it will not only affect the EUDIW, but every mDL implementation regardless of context.

Mar 03 '25 11:03 JesusMcCloud

I think we cannot change behavior as-is, indeed. You can send a PR that would incorporate the correct behavior under the flag.

Mar 13 '25 17:03 sandwwraith

While I do understand the compatibility argument, isn't CBOR still experimental? I'm not advocating for a radical breaking change, but rather a sensible strategy to make the default behaviour gradually the default:

Introducing a flag with a default value that behaves, as-is
Some time later, deprecating the flag and the function call without the flag and replacing the flag it with a new one, with flipped behaviour
Raising the deprecation level to ´ERROR`
Raising the deprecation level to. HIDDEN
Never actually removing it, so that old code remains working

Probably too naive and never going to work as easily, but keeping known-wrong behaviour for to maintain compatibility with a state that is officially experimental smells funky.

Alternatively: Deprecate this here CBOR format and replace it by something entirely new, heavily based on the much more flexible Obor.

I think a discussion (both internally in the serialization team and with users) is warranted, because in the end the current behaviour remains wrong.

Mar 13 '25 18:03 JesusMcCloud

I've made a PR for the first draft of this with the flag discussed.

Hopefully it is a reasonable jumping off point.

Mar 13 '25 21:03 ardune

I wanted to bump this to make sure it wasn't lost in the shuffle of things.

I'm willing to make sure that PR happens and meets expectations for maintainability as needed.

Jun 02 '25 19:06 ardune

Sorry, CBOR is kinda out of focus for now, so it takes us a lot of time to process things. I'll try to also take a look at other null problems in the complex and tell you if your PR fits the picture

Nov 10 '25 12:11 sandwwraith

I've checked history and this behavior was added by #2412. @JesusMcCloud Maybe you remember why it was needed?

Dec 05 '25 16:12 sandwwraith

I'm pretty sure I messed up due to mixing things up when dealing with classes annotated with (the badly named) @CborArray. Ayways, I double-checked and this one is definitely on me: Before my COSE PR, things worked correctly, so I think the fix by @ardune should be merged, especially since it provides a smooth migration path.

Sorry for messing up!

Dec 05 '25 17:12 JesusMcCloud

Given that it was added fairly recently and it is not up to standards, I think we can fix it without a migration path. Maybe in your next PR about CborElement :)

Dec 08 '25 10:12 sandwwraith

I think the fix for this deserves a separate, small PR, so it can be verified in isolation.

Dec 08 '25 17:12 JesusMcCloud

@JesusMcCloud, in the PR where null-object-as-a-map encoding was introduced, you explicitly mentioned that it was done for COSE-compliance. Are you sure that it is not needed indeed?

Dec 08 '25 19:12 fzhinkin

Thanks for following up! I'll prioritise this one over structured CBOR, so we get to first fix an actual problem that hits in production.

There is a touching point with COSE, but as I mentioned, I am confident that I intermingled two things in ways that should not go together. There is a legit test cases that fails, when this behaviour is reverted, but the ways the null-as-empty-object encoding is done right now is just wrong. I still have to dig into this some more. Luckily, we have a corpus of test data conforming to ISO mDL spec (and data that is not conforming to spec, but that we still need to be able to process) that we can use. Bad news is: none of it is self-contained, but it is split up between VC-K and Signum. Still, once this is narrowed down, I'll try to shrink things down into self-contained test cases. Also luckily: I don't have to sort this out all by myself, but I can bug @nodh, @gp-iaik, @n0900 and @lukpos with this until we all agree that this is fixed for good. However, this also means that even though it's a small fix, it will take a bit of time, probably past new year's.

Dec 09 '25 05:12 JesusMcCloud

After looking through different standards I have found the root cause of the current behaviour:

For example in RFC 8152 the unprotected_header parameter is defined as

Headers = (
protected : empty_or_serialized_map,
unprotected : header_map
)
header_map = {
Generic_Headers,
* label => values
}

i.e. unprotected is not nullable however all possible entries defined in Generic_Headers are nullable, with this being a CDDL group (i.e. its structure is flattened into the map)

Dec 09 '25 13:12 n0900

Taking up on @n0900's comment: COSE is utterly cursed. Data-model-wise protected and unprotected headers are just CoseHeader and when you model this in a sane way, you have two properties in your CoseSigned: protected and uprotected. Now they do get serialized differently, so it makes sense to have two properties and annotate them differently.

And this is where things get extra cursed and where I messed up: The spec never says that either is allowed to be null, but only ever an empty map. Semantically, this corresponds to null because you have cases where you only ever want to accept a CoseSigned that only contains a protected header, for example, so an empty unprotected header means no unprotected header properties, hence, you want to model this as nullable. Now you want this pattern, because the compiler helps you a lot with null checks, and not at all with data classes whose properties are all nullable, as this would force you to introduce error-prone boilerplate code that is then duplicated and adapted from class to class, just to check if something that is semantically null, is really semantically null, because the data model does allow you to specify it as null.

Now you also (currently) cannot easily use Map abstractions and decide to not care for abstractions, because you will have integer map keys all over. So my mistake was to assume the sane programming pattern to transform null into empty maps instead of introducing an annotation like @CborNullAsEmptyMap that is used on a per-property or per-class basis. Technically, this is is not needed, but not having that is a footgun. I am not 100% sure but I'd suspect the ISO mDL spec to push this pattern even further.

Dec 09 '25 13:12 JesusMcCloud

Addendum: as for added complexity due to such an annotation: The currently broken if-else in line 106 from the first commetn would need to be replaced with a correct check based on a @CborNullAsEmptyMap property or class-level annotation

Dec 09 '25 13:12 JesusMcCloud