Encoding from/to `CborElement`

Bytes can be decoded into an instance of CborElement with the [Cbor.decodeFromByteArray] function, either by passing [CborElement.serializer()] explicitly or by specifying [CborElement] as the generic type parameter.
It is also possible to encode arbitrary serializable structures to a CborElement through [Cbor.encodeToCborElement].

Since these operations use the same code paths as regular serialization (but with specialized serializers), the config flags behave exactly as they do for regular serialization.

Newly introduced CBOR-specific structures

[CborPrimitive] represents primitive CBOR elements, such as strings, integers, floats, booleans, and null. CBOR byte strings are also treated as primitives.
Each primitive has a [value][CborPrimitive.value]. Depending on the concrete type of the primitive, it maps to the corresponding Kotlin type, such as String, Long, Double, etc. Note that CBOR distinguishes between positive ("unsigned") and negative ("signed") integers!
CborPrimitive is itself an umbrella type (a sealed class) for the following concrete primitives:
- [CborNull] mapping to a Kotlin null
- [CborBoolean] mapping to a Kotlin Boolean
- [CborInt], which is itself an umbrella type (a sealed class) for the following concrete types (it can still be instantiated directly, because the invoke operator on its companion object is overridden accordingly):
  - [CborPositiveInt] represents all Long numbers ≥0
  - [CborNegativeInt] represents all Long numbers <0
- [CborString] maps to a Kotlin String
- [CborFloat] maps to a Kotlin Double
- [CborByteString] maps to a Kotlin ByteArray and is used to encode byte arrays as a CBOR byte string (in contrast to a list of individual bytes)
[CborList] represents a CBOR array. It is a Kotlin [List] of CborElement items.
[CborMap] represents a CBOR map/object. It is a Kotlin [Map] from CborElement keys to CborElement values. This is typically the result of serializing an arbitrary serializable class.
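To make the shape of this hierarchy concrete, here is a standalone, simplified model of it. All names carry a `Model` suffix because they are hypothetical stand-ins, not the kotlinx.serialization classes. Storing tags as a `List<ULong>` on every element is an assumption based on the `tags=[...]` output in the example section and on CBOR tag numbers being unsigned 64-bit values. The sketch mainly shows the sealed-class layout and how a companion `invoke` lets a sealed integer type be "instantiated" while dispatching on the sign.

```kotlin
// Hypothetical, simplified stand-in for the element hierarchy described above.
// NOT the kotlinx.serialization API: names end in "Model", tags as List<ULong> is an assumption.

sealed class ElementModel {
    abstract val tags: List<ULong> // CBOR tag numbers, outermost first
}

sealed class PrimitiveModel : ElementModel()

data class NullModel(override val tags: List<ULong> = emptyList()) : PrimitiveModel()
data class BooleanModel(val value: Boolean, override val tags: List<ULong> = emptyList()) : PrimitiveModel()
data class StringModel(val value: String, override val tags: List<ULong> = emptyList()) : PrimitiveModel()
data class FloatModel(val value: Double, override val tags: List<ULong> = emptyList()) : PrimitiveModel()

// Wraps a ByteArray; arrays compare by identity, so content-based equals/hashCode are provided.
class ByteStringModel(val value: ByteArray, override val tags: List<ULong> = emptyList()) : PrimitiveModel() {
    override fun equals(other: Any?): Boolean =
        other is ByteStringModel && value.contentEquals(other.value) && tags == other.tags

    override fun hashCode(): Int = 31 * value.contentHashCode() + tags.hashCode()
}

// Sealed umbrella for integers: IntModel(x) looks like a constructor call, but resolves to
// the companion's invoke operator, which picks the concrete subclass by sign.
sealed class IntModel : PrimitiveModel() {
    abstract val value: Long

    companion object {
        operator fun invoke(value: Long, tags: List<ULong> = emptyList()): IntModel =
            if (value >= 0) PositiveIntModel(value, tags) else NegativeIntModel(value, tags)
    }
}

data class PositiveIntModel(override val value: Long, override val tags: List<ULong> = emptyList()) : IntModel() {
    init { require(value >= 0) { "PositiveIntModel needs value >= 0, got $value" } }
}

data class NegativeIntModel(override val value: Long, override val tags: List<ULong> = emptyList()) : IntModel() {
    init { require(value < 0) { "NegativeIntModel needs value < 0, got $value" } }
}

// Containers: CBOR arrays and maps, holding elements as items, keys and values.
data class ListModel(val content: List<ElementModel>, override val tags: List<ULong> = emptyList()) : ElementModel()

data class MapModel(
    val content: Map<ElementModel, ElementModel>,
    override val tags: List<ULong> = emptyList(),
) : ElementModel()
```

With this layout, `IntModel(-1L)` yields a `NegativeIntModel`, and `IntModel(268435455L, listOf(12uL))` yields a `PositiveIntModel` tagged with 12, mirroring the first two entries of the example map.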

Example

Consider the following CBOR input, shown as an annotated hex dump:

```
bf                                 # map(*)
   61                              #   text(1)
      61                           #     "a"
   cc                              #   tag(12)
      1a 0fffffff                  #     unsigned(268,435,455)
   d8 22                           #   base64 encoded text, tag(34)
      61                           #     text(1)
         62                        #       "b"
                                   #     invalid length at 0 for base64
   20                              #   negative(-1)
   d8 38                           #   tag(56)
      61                           #     text(1)
         63                        #       "c"
   d8 4e                           #   typed array of i32, little endian, twos-complement, tag(78)
      42                           #     bytes(2)
         cafe                      #       "\xca\xfe"
                                   #     invalid data length for typed array
   61                              #   text(1)
      64                           #     "d"
   d8 5a                           #   tag(90)
      cc                           #     tag(12)
         6b                        #       text(11)
            48656c6c6f20576f726c64 #         "Hello World"
   ff                              #   break
```
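The annotations in this dump follow the CBOR encoding rules of RFC 8949, section 3: the top three bits of an item's initial byte give its major type, and the low five bits ("additional information") either hold the argument directly (0 to 23), announce that it follows in 1, 2, 4 or 8 bytes (24 to 27), or mark an indefinite length or break (31). For tags (major type 6) the argument is the tag number; for negative integers (major type 1) it encodes -1 minus the argument, so `20` is -1. The standalone sketch below decodes just these heads; `CborHead`, `readHead` and `negativeIntValue` are hypothetical names for illustration, not part of kotlinx.serialization.

```kotlin
// Hypothetical helper (not part of kotlinx.serialization) that decodes only the *head* of a
// CBOR data item, following RFC 8949, section 3.
data class CborHead(
    val majorType: Int,  // 0 unsigned, 1 negative, 2 bytes, 3 text, 4 array, 5 map, 6 tag, 7 simple/float
    val argument: Long?, // null for indefinite length / break (additional information 31)
    val headLength: Int, // number of bytes the head occupies
)

fun readHead(bytes: ByteArray, offset: Int = 0): CborHead {
    val initial = bytes[offset].toInt() and 0xff
    val majorType = initial ushr 5 // top 3 bits
    val info = initial and 0x1f    // low 5 bits: "additional information"
    val extraBytes = when (info) {
        in 0..23 -> 0              // argument is stored in the initial byte itself
        24 -> 1
        25 -> 2
        26 -> 4
        27 -> 8                    // values above Long.MAX_VALUE would wrap in this sketch
        31 -> return CborHead(majorType, null, 1) // e.g. 0xbf = map(*), 0xff = break
        else -> throw IllegalArgumentException("reserved additional information $info")
    }
    if (extraBytes == 0) return CborHead(majorType, info.toLong(), 1)
    var argument = 0L
    for (i in 1..extraBytes) {
        // Big-endian: shift in one byte at a time.
        argument = (argument shl 8) or (bytes[offset + i].toLong() and 0xffL)
    }
    return CborHead(majorType, argument, 1 + extraBytes)
}

// For major type 1, the encoded argument n stands for the integer -1 - n.
fun negativeIntValue(argument: Long): Long = -1 - argument
```

Applied to the dump, `readHead` reports `d8 22` as major type 6 with argument 34 (tag 34) and `1a 0fffffff` as major type 0 with argument 268435455.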

Decoding the bytes above results in the following CborElement (shown in manually formatted diagnostic notation):

```
CborMap(tags=[], content={
    CborString(tags=[],   value=a) = CborPositiveInt( tags=[12],     value=268435455),
    CborString(tags=[34], value=b) = CborNegativeInt( tags=[],       value=-1),
    CborString(tags=[56], value=c) = CborByteString(  tags=[78],     value=h'cafe'),
    CborString(tags=[],   value=d) = CborString(      tags=[90, 12], value=Hello World)
})
```
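The decoded tree lines up with the dump: text strings (major type 3) become CborString, the unsigned and negative integers (major types 0 and 1) become CborPositiveInt and CborNegativeInt, the byte string (major type 2) becomes CborByteString, and the indefinite-length map (major type 5) becomes CborMap. Tags (major type 6) do not become elements of their own; their numbers end up in the tags of the element that follows, outermost first (`d8 5a cc` in front of "Hello World" gives `tags=[90, 12]`). The hypothetical helper below condenses this correspondence into a lookup by initial byte. It is an illustration, not the decoder from this PR; mapping all three float widths to CborFloat is an assumption, and simple values other than false, true and null are left out.

```kotlin
// Hypothetical lookup (illustration only, not the decoder from this PR): which element class,
// by name, a CBOR data item with the given initial byte corresponds to.
fun elementClassFor(initialByte: Int): String {
    val b = initialByte and 0xff
    return when (b ushr 5) { // major type = top 3 bits
        0 -> "CborPositiveInt"
        1 -> "CborNegativeInt"
        2 -> "CborByteString"
        3 -> "CborString"
        4 -> "CborList"
        5 -> "CborMap"
        6 -> "tag (number goes into the tags of the following element)"
        else -> when (b) { // major type 7: simple values and floats
            0xf4, 0xf5 -> "CborBoolean"          // false, true
            0xf6 -> "CborNull"                   // null
            0xf9, 0xfa, 0xfb -> "CborFloat"      // half, single, double precision (assumed)
            else -> "not covered by this sketch" // other simple values, break (0xff)
        }
    }
}
```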

Implementation Details

I tried to stick to the existing CBOR code paths as closely as possible, and adding tags directly to CborElements is the most pragmatic way of getting both expressiveness and convenient use. It does come with a caveat (also taken from the README):

Tags are properties of CborElements, and it is possible to mix arbitrary serializable values with CborElements that contain tags inside a serializable structure. It is also possible to annotate any [CborElement] property of a generic serializable class with @ValueTags.
This can lead to asymmetric behavior when serializing and deserializing such structures!

The test cases (and the comments in them) reflect this.

Closing Remarks

I also fixed a faulty hex input test vector that, if I pieced it together correctly, I introduced myself last year (see here), and I amended the benchmarks (see here).

Since the commits from here will be squashed anyway, I did not bother keeping a clean history.

Jul 05 '25 05:07 JesusMcCloud

Full disclosure: This PR incorporates code from a draft generated by Junie (albeit an impressive draft that saved a day of work). This is not a dumb copypasta of AI-generated code. Even if it were already feature-complete, it would not yet be marked ready for review, because we have yet to review everything internally. I also want to stress that "we" is not a euphemism: at least two of us will be reviewing and discussing it internally, almost certainly with additional input from other humans while readying this PR.

Jul 08 '25 10:07 JesusMcCloud

Performance seems to be OK on my machine (`fromBytes` and `toBytes` are the baselines):

| Metric / Benchmark | `fromBytes` | `fromStruct` | `structFromBytes` | `toBytes` | `structToBytes` | `toStruct` |
|---|---:|---:|---:|---:|---:|---:|
| Average (ops/ms) | 1205.615 ± 20.541 | 1545.814 ± 50.743 | 2896.728 ± 74.485 | 2089.013 ± 30.152 | 1442.766 ± 32.257 | 2581.397 ± 32.497 |
| Min | 1186.023 | 1458.225 | 2796.131 | 2066.499 | 1404.482 | 2550.026 |
| Max | 1229.778 | 1581.420 | 2960.572 | 2125.658 | 1475.015 | 2619.815 |
| Stdev | 13.586 | 33.563 | 49.267 | 19.944 | 21.336 | 21.495 |
| CI low (99.9 %) | 1185.075 | 1495.071 | 2822.244 | 2058.861 | 1410.509 | 2548.900 |
| CI high (99.9 %) | 1226.156 | 1596.557 | 2971.213 | 2119.165 | 1475.023 | 2613.893 |
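The hot takes below compare each benchmark with its baseline. As a quick cross-check, the averages from the table can be turned into ratios. Pairing `fromStruct` and `structFromBytes` with `fromBytes`, and `structToBytes` and `toStruct` with `toBytes`, is an assumption based on how the hot takes group them; the ratios are simply taken of the scores exactly as reported, in whatever unit the table uses.

```kotlin
// Average scores copied from the table above (same unit as reported there).
val averages = mapOf(
    "fromBytes" to 1205.615,
    "fromStruct" to 1545.814,
    "structFromBytes" to 2896.728,
    "toBytes" to 2089.013,
    "structToBytes" to 1442.766,
    "toStruct" to 2581.397,
)

// Assumption: decoding benchmarks are compared against fromBytes, encoding ones against toBytes.
val baselineOf = mapOf(
    "fromStruct" to "fromBytes",
    "structFromBytes" to "fromBytes",
    "structToBytes" to "toBytes",
    "toStruct" to "toBytes",
)

/** Ratio of a benchmark's average score to the average score of its baseline. */
fun relativeToBaseline(benchmark: String): Double {
    val baseline = baselineOf.getValue(benchmark)
    return averages.getValue(benchmark) / averages.getValue(baseline)
}
```

For example, `relativeToBaseline("structFromBytes")` comes out at roughly 2.4 and `relativeToBaseline("structToBytes")` at roughly 0.69.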

My hot takes:

- Deserialising from a structure is fast enough, since it is in the same ballpark as deserialising from bytes.
- Deserialising into a generic CBOR structure takes about twice as long as deserialising directly, which is fine, given that we instantiate many more objects: even primitives need a containing class and an array of tags.
- Serialising a generic CBOR structure to bytes is faster than, but in the same ballpark as, generic to-byte serialisation of arbitrary serializable data.
- Serialising to a CBOR structure is slower than serialising to bytes, but acceptable, since it's in the same ballpark and we instantiate more objects.

Aug 07 '25 11:08 JesusMcCloud

I just noticed something that looks weird to me. See this failing test case here and closely compare expected vs. actual.

The byte string is wrapped twice in the reference. ~~I know there were some discussions, but I don't recall them, so I have to ask: why? Did I mess this up last year, or is this intentional? Because the way I see it, we're wrapping a ByteArray instead of encoding it differently.~~
EDIT: the test vector is faulty, as this comparison fails the same way.

Aug 07 '25 12:08 JesusMcCloud

Any updates on the open discussion points?

Oct 07 '25 11:10 JesusMcCloud

Thanks for all the comments! I'll have to dig up some memories that have since collected dust to sort out some of the issues and figure some things out again from scratch, as I haven't looked into this for many weeks and have forgotten most of the implementation details ;-). So it will take a while before I push changes addressing the issues.

Dec 05 '25 09:12 JesusMcCloud

Structured cbor

Encoding from/to `CborElement`