typeid

Future specification on binary format

Open • MMZK1526 opened this issue 1 year ago • 8 comments

Is there any plan to add to the specification a definition of how to convert a TypeID to a binary format?

In another personal project of mine that uses typeid, I need to serialise the IDs. So far I'm implementing my own serialisation just for that project, but if there is a formal specification, I can include it in the Haskell implementation as well.

MMZK1526 • Jul 14 '23 17:07

A similar question came up here: https://github.com/jetpack-io/typeid-go/issues/5

So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

loreto • Jul 14 '23 17:07

> So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

I'm still experimenting with it. Currently, I write an 8-bit length for the prefix, followed by the raw ASCII bytes of the prefix, followed by the standard encoding of the UUID.
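A minimal sketch of that layout, assuming the UUID part is its standard 16 big-endian bytes (Python's `uuid.UUID.bytes`). The function names `encode_length_prefixed` and `decode_length_prefixed` are hypothetical and not part of any typeid library:

```python
import uuid

def encode_length_prefixed(prefix: str, uid: uuid.UUID) -> bytes:
    """1-byte prefix length, then the raw ASCII prefix, then the 16 UUID bytes."""
    raw = prefix.encode("ascii")
    if len(raw) > 255:
        raise ValueError("prefix too long for an 8-bit length field")
    return bytes([len(raw)]) + raw + uid.bytes  # uid.bytes is big-endian

def decode_length_prefixed(data: bytes) -> tuple[str, uuid.UUID]:
    """Inverse of encode_length_prefixed; raises ValueError on malformed input."""
    n = data[0]
    prefix = data[1:1 + n].decode("ascii")
    uid = uuid.UUID(bytes=data[1 + n:1 + n + 16])  # rejects anything but 16 bytes
    return prefix, uid

uid = uuid.UUID("01890a5d-ac96-774b-bcce-b302099a8057")
blob = encode_length_prefixed("user", uid)
assert decode_length_prefixed(blob) == ("user", uid)
```

For a `user` prefix this takes 1 + 4 + 16 = 21 bytes, and every field stays byte-aligned.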

It's not the most compact approach: the length only needs 6 bits (a prefix has at most 63 characters) and each letter only needs 5 bits (there are only 26 lowercase letters). I'm happy with it for my particular use case, since I don't need to squeeze out every bit of space, but it may not be suitable as the standard encoding defined in a spec.
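For comparison, a sketch of the tighter packing: a 6-bit length followed by 5 bits per letter, padded with zero bits to a whole byte, then the 16 UUID bytes. The letter mapping (`a`=0 ... `z`=25), the padding rule, and all function names are assumptions for illustration only:

```python
import uuid

def pack_prefix_6_5(prefix: str) -> bytes:
    """6-bit length, then 5 bits per letter (a=0 ... z=25), zero-padded to a whole byte."""
    if len(prefix) > 63 or not all("a" <= c <= "z" for c in prefix):
        raise ValueError("prefix must be at most 63 lowercase letters a-z")
    value, nbits = len(prefix), 6
    for c in prefix:
        value = (value << 5) | (ord(c) - ord("a"))
        nbits += 5
    pad = (-nbits) % 8  # zero bits needed to reach a byte boundary
    return (value << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_prefix_6_5(data: bytes) -> tuple[str, int]:
    """Return (prefix, number of bytes the packed prefix occupies)."""
    length = data[0] >> 2  # the top 6 bits of the first byte
    nbits = 6 + 5 * length
    nbytes = (nbits + 7) // 8
    value = int.from_bytes(data[:nbytes], "big") >> (nbytes * 8 - nbits)  # drop padding
    letters = [
        chr(ord("a") + ((value >> (5 * (length - 1 - i))) & 0x1F))
        for i in range(length)
    ]
    return "".join(letters), nbytes

def encode_packed(prefix: str, uid: uuid.UUID) -> bytes:
    return pack_prefix_6_5(prefix) + uid.bytes

def decode_packed(data: bytes) -> tuple[str, uuid.UUID]:
    prefix, n = unpack_prefix_6_5(data)
    return prefix, uuid.UUID(bytes=data[n:n + 16])
```

With this packing a `user` prefix needs 26 bits (4 bytes) instead of 5 bytes, and a 63-letter prefix needs 321 bits (41 bytes) instead of 64. The cost is bit shifting on every encode and decode instead of plain byte copies.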

Another possibility (if we use 5 bits per letter) is to drop the length field and instead append a separator code right after the last letter: a 5-bit code has 2^5 = 32 values, but only 26 are needed for letters, leaving 32 - 26 = 6 values unused.
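A sketch of the separator variant, assuming letters map to codes 1-26 and the spare code 0 marks the end of the prefix. Both choices and all names are hypothetical; any of the six spare codes (0 and 27-31 under this mapping) would work for decoding:

```python
import uuid

SEPARATOR = 0  # hypothetical choice among the spare 5-bit codes

def pack_prefix_separated(prefix: str) -> bytes:
    """5 bits per letter (a=1 ... z=26), then the 5-bit separator, zero-padded to a whole byte."""
    if not all("a" <= c <= "z" for c in prefix):
        raise ValueError("prefix must contain only lowercase letters a-z")
    value, nbits = 0, 0
    for code in [ord(c) - ord("a") + 1 for c in prefix] + [SEPARATOR]:
        value = (value << 5) | code
        nbits += 5
    pad = (-nbits) % 8  # zero bits needed to reach a byte boundary
    return (value << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_prefix_separated(data: bytes) -> tuple[str, int]:
    """Read 5-bit codes until the separator; return (prefix, bytes consumed)."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    letters, pos = [], 0
    while True:
        if pos + 5 > total:
            raise ValueError("separator not found")
        code = (bits >> (total - pos - 5)) & 0x1F
        pos += 5
        if code == SEPARATOR:
            return "".join(letters), (pos + 7) // 8
        if code > 26:
            raise ValueError(f"invalid letter code {code}")
        letters.append(chr(ord("a") + code - 1))

def encode_separated(prefix: str, uid: uuid.UUID) -> bytes:
    return pack_prefix_separated(prefix) + uid.bytes

def decode_separated(data: bytes) -> tuple[str, uuid.UUID]:
    prefix, n = unpack_prefix_separated(data)
    return prefix, uuid.UUID(bytes=data[n:n + 16])
```

Choosing 0 as the separator also means a prefix sorts before any longer prefix that extends it (`ab` before `abc`) when the encoded bytes are compared, mirroring how `_` sorts before the letters in the string form.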

MMZK1526 • Jul 14 '23 17:07

For the spec I think we need to answer what we're trying to optimize for. Things running through my mind include:

  • How important is size? Do we want the absolute minimal encoding, or are we willing to trade some size for something else (say, speed)?
  • How important is performance? Should we try to keep the encoding 8-bit aligned?
  • Sortability. TypeIDs promise to be K-sortable, and in some applications, like databases, their sort order matters. Do we want the binary representation to guarantee the same sort order as the string representation? (If not, implementations might sort differently depending on which representation they use for sorting.)

Do you have any thoughts on these?
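To make the sortability question concrete, here is a small sketch comparing sort orders. It assumes the string form is the prefix, `_`, and then 26 characters from the base32 alphabet `0123456789abcdefghjkmnpqrstvwxyz` encoding the 128-bit UUID, and that prefixes contain only `a`-`z`. The `null_terminated` layout is a hypothetical alternative used only for comparison:

```python
import uuid

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"  # ascending ASCII order, so order-preserving

def to_string(prefix: str, uid: uuid.UUID) -> str:
    """prefix + '_' + 26 base32 chars for the UUID (130 bits, top 2 always zero)."""
    n = uid.int
    suffix = "".join(ALPHABET[(n >> (5 * (25 - i))) & 0x1F] for i in range(26))
    return f"{prefix}_{suffix}" if prefix else suffix

def length_prefixed(prefix: str, uid: uuid.UUID) -> bytes:
    raw = prefix.encode("ascii")
    return bytes([len(raw)]) + raw + uid.bytes

def null_terminated(prefix: str, uid: uuid.UUID) -> bytes:
    # Hypothetical: ASCII prefix, a 0x00 terminator, then the 16 UUID bytes.
    return prefix.encode("ascii") + b"\x00" + uid.bytes

ids = [
    ("b", uuid.UUID(int=1)),
    ("aa", uuid.UUID(int=2)),
    ("aa", uuid.UUID(int=1)),
    ("user", uuid.UUID("01890a5d-ac96-774b-bcce-b302099a8057")),
]

by_string = sorted(ids, key=lambda t: to_string(*t))
by_length_prefixed = sorted(ids, key=lambda t: length_prefixed(*t))
by_null_terminated = sorted(ids, key=lambda t: null_terminated(*t))

print(by_string == by_null_terminated)  # True
print(by_string == by_length_prefixed)  # False
```

The length-prefixed layout puts `b` (length 1) before `aa` (length 2), while the string form does the opposite. A terminator that sorts below every letter keeps the two orders aligned, much like `_` does in the string form.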

loreto • Jul 14 '23 17:07

Tagging people who have implemented typeid libraries in other languages: @cbuctok @sloanelybutsurely @fxlae @softprops @faustbrian @akhundMurad @broothie @conradludgate @johnnynotsolucky @Frizlab @ongteckwu @tensorush

Do you have a need for a binary encoding specification? If so, what properties do you think are important for your use cases?

loreto • Jul 14 '23 17:07

For a binary encoding, I would expect to already have a typed binary schema. In that case, I'd personally use the standard 16-byte big-endian UUID encoding rather than invent anything bespoke. Since the schema would already carry the type, I would drop the type prefix.
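A minimal sketch of this approach, assuming a hypothetical schema field declared to hold `user` IDs: only the 16 big-endian UUID bytes are stored, and the prefix is re-attached from the schema when decoding. `SCHEMA_PREFIX`, `encode_field`, and `decode_field` are illustrative names:

```python
import uuid

# Hypothetical: the schema declares that this field always holds "user" IDs.
SCHEMA_PREFIX = "user"

def encode_field(prefix: str, uid: uuid.UUID) -> bytes:
    """Store only the 16 big-endian UUID bytes; the prefix is implied by the schema."""
    if prefix != SCHEMA_PREFIX:
        raise ValueError(f"expected a {SCHEMA_PREFIX!r} id, got {prefix!r}")
    return uid.bytes

def decode_field(data: bytes) -> tuple[str, uuid.UUID]:
    """Re-attach the prefix from the schema when reading the field back."""
    return SCHEMA_PREFIX, uuid.UUID(bytes=data)
```

Checking the prefix on the way in matters here: once it is dropped, nothing in the 16 bytes can reveal that an ID of the wrong type was stored.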

For an untyped binary format like CBOR, I could imagine a custom encoding, though. CBOR has no byte-alignment requirements, so I would probably encode the prefix string and the 16 UUID bytes as a two-element CBOR array.
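A hand-rolled sketch of that CBOR layout, written directly against the CBOR major types (4 = array, 3 = text string, 2 = byte string) so that it needs no third-party package. The helper names are hypothetical, and a real project would more likely use an existing CBOR library:

```python
import uuid

def cbor_head(major: int, length: int) -> bytes:
    """CBOR initial byte (plus length bytes) for a given major type; lengths below 2**16 only."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 256:
        return bytes([(major << 5) | 24, length])
    if length < 65536:
        return bytes([(major << 5) | 25]) + length.to_bytes(2, "big")
    raise ValueError("length too large for this sketch")

def typeid_to_cbor(prefix: str, uid: uuid.UUID) -> bytes:
    """Encode a TypeID as the CBOR array [prefix (text string), UUID (16-byte byte string)]."""
    text = prefix.encode("utf-8")
    return (
        cbor_head(4, 2)                    # major type 4: array of 2 items
        + cbor_head(3, len(text)) + text   # major type 3: UTF-8 text string
        + cbor_head(2, 16) + uid.bytes     # major type 2: byte string
    )
```

For `user` this yields 23 bytes: `0x82` (array of 2), `0x64` followed by `user` (4-byte text string), then `0x50` followed by the 16 UUID bytes.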

conradludgate • Jul 14 '23 17:07

I wonder if we're better off not defining a binary encoding as part of the spec and leaving it up to each use case. The examples @conradludgate gives make me think the ideal encoding is use-case dependent. If your binary format already guarantees the type, you can elide the prefix entirely and re-introduce it when decoding the binary representation. If you do want to encode the type, you might be better off using the representation suggested by the format you're using (e.g. CBOR might represent a string plus a byte vector one way, while protobufs and JSONB might do it differently).

loreto • Jul 14 '23 17:07

I created the lib just "because I could" and am not using it, so I'd be happy with whatever binary encoding specs you guys will come up with 🙂

Frizlab • Jul 14 '23 23:07

IMO, it would be better to define several possible encoding options for a variety of use cases.

akhundMurad • Jul 15 '23 08:07