pycopy
marshall module ideas
It seems that MsgPack is a viable choice for implementing the marshall encoding: https://github.com/msgpack/msgpack/blob/master/spec.md
Possibly, an ad hoc serialization format would be even more efficient, but at least MsgPack is able to differentiate bytes from str, etc.
Problems would be: no differentiation between tuple and list, or between dict and OrderedDict.
Also, there is no array encoding with an 8-bit length: there's a jump from 4 bits to 16 bits (same for maps).
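To make the length jump concrete, here is a minimal sketch of how a MsgPack array header is chosen. The function name `msgpack_array_header` is made up for illustration; the byte values follow the MsgPack spec linked above.

```python
import struct

def msgpack_array_header(n):
    """Return the MsgPack header bytes for an array of n elements."""
    if n < 16:
        return bytes([0x90 | n])               # fixarray: 1001xxxx, length in 4 bits
    if n < (1 << 16):
        return b"\xdc" + struct.pack(">H", n)  # array 16: there is no "array 8"
    if n < (1 << 32):
        return b"\xdd" + struct.pack(">I", n)  # array 32
    raise ValueError("array too long for MsgPack")

# A 15-element row costs 1 header byte, a 16-element row already costs 3.
for n in (15, 16):
    print(n, msgpack_array_header(n).hex())
```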
There's also CBOR, and teh-drama between it and MsgPack: https://github.com/msgpack/msgpack/issues/129
CBOR is used in CoAP, so kinda would be "more useful" than MsgPack...
MsgPack has random gap in:
| format name | first byte (in binary) | first byte (in hex) |
|---|---|---|
| fixstr | 101xxxxx | 0xa0 - 0xbf |
| bin 8 | 11000100 | 0xc4 |
I.e., only short text strs can be encoded efficiently; byte strings always require an explicit length byte.
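A rough sketch of that asymmetry, limited to lengths below 256 to keep it short. The helper names are hypothetical; the format bytes (fixstr, str 8, bin 8) are taken from the MsgPack spec.

```python
def msgpack_str_header(n):
    """Header for a text string whose UTF-8 encoding is n bytes long (n < 256)."""
    if n < 32:
        return bytes([0xa0 | n])   # fixstr: length packed into the type byte
    if n < 256:
        return bytes([0xd9, n])    # str 8: type byte + length byte
    raise ValueError("sketch only covers lengths below 256")

def msgpack_bin_header(n):
    """Header for a byte string of n bytes (n < 256)."""
    if n < 256:
        return bytes([0xc4, n])    # bin 8: always type byte + length byte
    raise ValueError("sketch only covers lengths below 256")

# A 5-byte text string has a 1-byte header, a 5-byte bytes object has 2.
print(msgpack_str_header(5).hex(), msgpack_bin_header(5).hex())
```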
CBOR doesn't have that "limitation": https://tools.ietf.org/html/rfc7049#appendix-B (of course, it encodes something else less efficiently instead, as all MsgPack encoding bytes are used (well, one is reserved)).
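For comparison, a sketch of the CBOR side: every major type shares the same head layout, so byte strings (major type 2) and text strings (major type 3) both get lengths up to 23 packed into the first byte. `cbor_head`, `cbor_bytes` and `cbor_text` are names invented for this example, not an existing API.

```python
import struct

def cbor_head(major, n):
    """Encode the CBOR initial byte (plus any following length bytes)."""
    if n < 24:
        return bytes([(major << 5) | n])               # length inline
    if n < (1 << 8):
        return bytes([(major << 5) | 24, n])           # 1 extra byte
    if n < (1 << 16):
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < (1 << 32):
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)

def cbor_bytes(b):
    return cbor_head(2, len(b)) + b                    # major type 2: byte string

def cbor_text(s):
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data              # major type 3: text string

# Both short forms cost exactly one header byte.
print(cbor_bytes(b"abc").hex(), cbor_text("abc").hex())
```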
Note that the motivation for the marshall module is encoding data rows for a btree database. I.e. the motivation is: "need to serialize tuples for btree db" -> "why not implement that by implementing marshall module which can be used for many other things too".
That adds an additional requirement: being able to efficiently compare serialized arrays (i.e. without requiring full decoding).
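A small experiment hints at why this requirement is not free with a length-prefixed format. The sketch below assumes CBOR-style shortest-form encodings (helper names are made up): comparing the raw bytes happens to preserve ordering for unsigned ints, but not for text strings, where the length sits in front of the content.

```python
def enc_uint(n):
    """CBOR-style unsigned int, shortest form, up to 16 bits."""
    if n < 24:
        return bytes([n])
    if n < 256:
        return bytes([0x18, n])
    if n < 65536:
        return bytes([0x19, n >> 8, n & 0xFF])
    raise ValueError("sketch only covers values below 65536")

def enc_text(s):
    """CBOR-style text string, short form only."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("sketch only covers strings under 24 bytes")
    return bytes([0x60 | len(data)]) + data

# Unsigned ints: raw byte comparison agrees with numeric comparison.
values = [0, 23, 24, 255, 256, 65535]
assert sorted(values, key=enc_uint) == values

# Text: "aa" < "b" as Python strings, but the length prefix reverses it.
assert "aa" < "b"
assert not (enc_text("aa") < enc_text("b"))
```

So a btree keyed on encoded rows would need either a comparison routine that understands the headers, or an encoding designed so that plain byte comparison matches the intended order.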
CBOR defines encodings for bignums, for example. Looks like it's a winner.
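As a sketch of what that looks like on the wire: RFC 7049 encodes a bignum as tag 2 (positive) or tag 3 (negative) followed by a big-endian byte string. `cbor_bignum` is a hypothetical helper, and it only handles payloads short enough for the one-byte byte-string header.

```python
def cbor_bignum(n):
    """Encode integer n as a CBOR bignum: tag 2 or 3, then a byte string."""
    tag = 0xC2 if n >= 0 else 0xC3       # tag 2: positive, tag 3: negative
    mag = n if n >= 0 else -1 - n        # negative bignums store -1 - n
    payload = mag.to_bytes((mag.bit_length() + 7) // 8, "big")
    if len(payload) >= 24:
        raise ValueError("sketch only covers payloads under 24 bytes")
    return bytes([tag, 0x40 | len(payload)]) + payload

# Matches the RFC 7049 Appendix A example for 18446744073709551616.
print(cbor_bignum(2**64).hex())  # c249010000000000000000
```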
CBOR tags are rather extensible, they are looking to incorporate fixed point types and arrays for ADCs. https://cbor-wg.github.io/array-tags/
> MsgPack has random gap in
Umm, no? 0xc0 through 0xc3 are None/
A more relevant advantage of CBOR is that you can prefix an item with a rather simple "use the following data as input to `‹class›.__setstate__()`" tag, where the class name is encoded in the tag. If you want to do the same thing with msgpack, you need either an in-memory copy of the object's encoded bytestring or two passes on the data structure `__getstate__()` returns. Shorter tags could be used for more-common distinctions between e.g. `tuple` and `list`: just specify a "read only hint" tag, and possibly a "the following data is ordered" tag for `OrderedDict`.
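A minimal sketch of the "read only hint" idea, to show how little it costs. The tag number below is an arbitrary placeholder picked for illustration, not a registered CBOR tag for this purpose, and the encoder only handles small non-negative ints and short sequences.

```python
TAG_READONLY = 7  # hypothetical placeholder tag number, NOT a registered assignment

def encode(obj):
    """Encode small ints, lists and tuples; tuples get the read-only tag."""
    if isinstance(obj, int) and 0 <= obj < 24:
        return bytes([obj])                              # major type 0, inline value
    if isinstance(obj, (list, tuple)) and len(obj) < 24:
        body = bytes([0x80 | len(obj)]) + b"".join(encode(x) for x in obj)
        if isinstance(obj, tuple):
            return bytes([0xC0 | TAG_READONLY]) + body   # major type 6: one tag byte
        return body
    raise TypeError("sketch only covers small ints and short sequences")

def decode(buf, pos=0):
    """Return (value, next_pos) for data produced by encode()."""
    major, arg = buf[pos] >> 5, buf[pos] & 0x1F
    if major == 0:
        return arg, pos + 1
    if major == 4:                                       # array
        items, pos = [], pos + 1
        for _ in range(arg):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 6 and arg == TAG_READONLY:               # tagged array -> tuple
        value, pos = decode(buf, pos + 1)
        return tuple(value), pos
    raise ValueError("unsupported item in this sketch")

row = (1, [2, 3])
assert decode(encode(row))[0] == row
```

Because the tag is a prefix, the encoder never needs to know the encoded length of what follows, which is the single-pass property described above.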
Another advantage would be the ability to encode indeterminate-length data (this is basically impossible with msgpack), though I have no idea whether that is actually a relevant use case for micropython/pycopy.
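For reference, a sketch of an indefinite-length CBOR array: the header byte `0x9f` opens the array and the "break" byte `0xff` closes it, so the encoder can emit items from an iterator without knowing the count up front. `encode_indefinite` is a made-up name, and only small non-negative ints are handled.

```python
def encode_indefinite(items):
    """Yield chunks of an indefinite-length CBOR array of small ints."""
    yield b"\x9f"                    # array, indefinite length
    for n in items:
        if not 0 <= n < 24:
            raise ValueError("sketch only covers ints 0..23")
        yield bytes([n])             # each item is written as it arrives
    yield b"\xff"                    # break: end of array

# The source can be a generator whose length is unknown in advance.
data = b"".join(encode_indefinite(x for x in range(1, 4)))
print(data.hex())  # 9f010203ff
```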