pycopy

marshall module ideas

Open pfalcon opened this issue 6 years ago • 9 comments

It seems that MsgPack is a viable choice to implement marshall encoding: https://github.com/msgpack/msgpack/blob/master/spec.md

Possibly, an ad hoc serialization format would be even more efficient, but at least MsgPack is able to differentiate bytes vs str's, etc.

pfalcon avatar Jan 07 '18 10:01 pfalcon

Problems would be: no differentiation between tuple and list, dict and OrderedDict.

pfalcon avatar Jan 07 '18 10:01 pfalcon

Also, there's no array encoding with an 8-bit length: there's a jump from 4 bits to 16 bits (same for maps).
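
To make the jump concrete, here is a minimal sketch of the MsgPack array header, based on the format codes in the spec linked above. The function name `msgpack_array_header` is made up for this illustration.

```python
def msgpack_array_header(n):
    """Return the MsgPack header bytes for an array of n elements."""
    if n < 16:
        return bytes([0x90 | n])              # fixarray: 1001xxxx, length in low 4 bits
    if n < 1 << 16:
        return b"\xdc" + n.to_bytes(2, "big")  # array 16: 0xdc + 2 length bytes
    if n < 1 << 32:
        return b"\xdd" + n.to_bytes(4, "big")  # array 32: 0xdd + 4 length bytes
    raise ValueError("array too long for MsgPack")


print(len(msgpack_array_header(15)))  # 1
print(len(msgpack_array_header(16)))  # 3, there is no 2-byte form in between
```

So a 16-element tuple already pays for a 3-byte header.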

pfalcon avatar Jan 07 '18 10:01 pfalcon

There's also CBOR, and teh-drama between it and MsgPack: https://github.com/msgpack/msgpack/issues/129

pfalcon avatar Jan 07 '18 11:01 pfalcon

CBOR is used in CoAP, so kinda would be "more useful" than MsgPack...

pfalcon avatar Jan 07 '18 11:01 pfalcon

MsgPack has random gap in:

fixstr 101xxxxx 0xa0 - 0xbf
bin 8 11000100 0xc4

I.e., only short textual strs can be efficiently encoded, bytestr's require explicit len byte always.
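
A rough sketch of the difference, using only the format codes from the MsgPack spec. The helper names `msgpack_str` and `msgpack_bin` are invented here, and longer lengths (str 16/32, bin 16/32) are left out.

```python
def msgpack_str(s):
    """Encode a text string of up to 255 UTF-8 bytes."""
    data = s.encode("utf-8")
    n = len(data)
    if n < 32:
        return bytes([0xA0 | n]) + data   # fixstr: 101xxxxx, length in low 5 bits
    if n < 256:
        return bytes([0xD9, n]) + data    # str 8: 0xd9 + 1 length byte
    raise ValueError("str 16/32 omitted in this sketch")


def msgpack_bin(b):
    """Encode a byte string of up to 255 bytes."""
    n = len(b)
    if n < 256:
        return bytes([0xC4, n]) + b       # bin 8: 0xc4 + 1 length byte, even for n == 0
    raise ValueError("bin 16/32 omitted in this sketch")


print(len(msgpack_str("abc")))   # 4
print(len(msgpack_bin(b"abc")))  # 5
```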

CBOR doesn't have that "limitation": https://tools.ietf.org/html/rfc7049#appendix-B (of course, it encodes something else less efficiently instead, as all MsgPack encoding bytes are used (well, one is reserved)).
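
For comparison, a sketch of the CBOR item head as described in RFC 7049: 3 bits of major type plus a 5-bit argument, with lengths up to 23 stored inline for every major type. The function names are made up for this example.

```python
def cbor_head(major, n):
    """Encode the CBOR initial byte (and any extra length bytes) for argument n."""
    if n < 24:
        return bytes([(major << 5) | n])          # argument fits in the initial byte
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def cbor_bytes(b):
    return cbor_head(2, len(b)) + b               # major type 2: byte string


def cbor_text(s):
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data         # major type 3: text string


print(cbor_bytes(b"abc").hex())  # 43616263
print(cbor_text("abc").hex())    # 63616263
```

Both kinds of string get a 1-byte header up to 23 bytes of payload, and both have a 1-byte-length form after that.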

pfalcon avatar Jan 07 '18 11:01 pfalcon

Note that motivation for marshall module is encoding data rows for btree database. I.e. the motivation is: "need to serialize tuples for btree db" -> "why not implement that by implementing marshall module which can be used for many other things too".

That adds additional requirement: being able to efficiently compare serialized arrays (i.e. without requiring full decoding).
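
A small illustration of why this requirement is not free: comparing MsgPack-encoded ints byte by byte does not follow numeric order. The second function, `sortable_row`, is a hypothetical order-preserving key encoding for rows of 32-bit signed ints, shown only to make the requirement concrete; it is neither MsgPack nor CBOR, and not a proposal from this thread.

```python
def msgpack_fixint(n):
    """Encode a small int using MsgPack positive/negative fixint."""
    if 0 <= n < 128:
        return bytes([n])            # positive fixint: 0xxxxxxx
    if -32 <= n < 0:
        return bytes([n & 0xFF])     # negative fixint: 111xxxxx
    raise ValueError("only fixint range handled in this sketch")


# Bytewise order disagrees with numeric order: -1 encodes as 0xff, 1 as 0x01.
assert msgpack_fixint(-1) > msgpack_fixint(1)


def sortable_row(row):
    """Hypothetical encoding: each int as 4 big-endian bytes with the sign bit flipped."""
    out = bytearray()
    for n in row:
        if not -(1 << 31) <= n < (1 << 31):
            raise ValueError("only 32-bit signed ints handled in this sketch")
        out += (n + (1 << 31)).to_bytes(4, "big")
    return bytes(out)


rows = [(1, -5), (-1, 7), (1, 2)]
# Sorting by the encoded bytes gives the same order as sorting the tuples.
assert sorted(rows, key=sortable_row) == sorted(rows)
```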

pfalcon avatar Jan 07 '18 12:01 pfalcon

CBOR defines encodings for bignums, for example. Looks like it's a winner.
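
A sketch of how Python ints of any size could map onto this, following RFC 7049: major types 0 and 1 for values that fit in 64 bits, and tag 2 (positive) or tag 3 (negative) wrapping a big-endian byte string beyond that. The function names are made up for this example.

```python
def cbor_head(major, n):
    """Encode the CBOR initial byte (and any extra bytes) for argument n."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def cbor_int(n):
    """Encode an arbitrary-size Python int."""
    if 0 <= n < 1 << 64:
        return cbor_head(0, n)                    # major type 0: unsigned int
    if -(1 << 64) <= n < 0:
        return cbor_head(1, -1 - n)               # major type 1: negative int
    # Outside the 64-bit range: bignum tag followed by a byte string.
    tag, magnitude = (2, n) if n > 0 else (3, -1 - n)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return cbor_head(6, tag) + cbor_head(2, len(payload)) + payload


print(cbor_int(1000).hex())     # 1903e8
print(cbor_int(1 << 64).hex())  # c249010000000000000000
```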

pfalcon avatar Jan 07 '18 12:01 pfalcon

CBOR tags are rather extensible, they are looking to incorporate fixed point types and arrays for ADCs. https://cbor-wg.github.io/array-tags/

hardkrash avatar Apr 23 '18 04:04 hardkrash

> MsgPack has random gap in

Umm, no? 0xc0 through 0xc3 are None/(never used)/False/True. CBOR also has gaps in it …

A more relevant advantage of CBOR is that you can prefix an item with a rather simple "use the following data as input to ‹class›.__setstate__()" tag, where the class name is encoded in the tag. If you want to do the same thing with msgpack, you need either an in-memory copy of the object's encoded bytestring or two passes on the data structure that __getstate__() returns. Shorter tags could be used for more-common distinctions between e.g. tuple and list: just specify a "read only hint" tag, and possibly a "the following data is ordered" tag for OrderedDict.
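
A minimal sketch of the "read only hint" idea. The tag number `HYPOTHETICAL_READONLY_TAG` is an arbitrary value picked for illustration and not checked against the IANA tag registry; `encode` is likewise a made-up helper that only handles small cases. The point is that the tag is written before the data and carries no length, so the encoder never has to buffer the tagged item.

```python
HYPOTHETICAL_READONLY_TAG = 60001  # made-up number, for illustration only


def cbor_head(major, n):
    """Encode the CBOR initial byte (and any extra bytes) for argument n."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(obj):
    """Encode unsigned ints, lists and tuples; tuples get the hint tag as a prefix."""
    if type(obj) is int and 0 <= obj < 1 << 64:
        return cbor_head(0, obj)
    if isinstance(obj, tuple):
        # Major type 6 (tag) followed by the same array encoding a list would get.
        return cbor_head(6, HYPOTHETICAL_READONLY_TAG) + encode(list(obj))
    if isinstance(obj, list):
        return cbor_head(4, len(obj)) + b"".join(encode(x) for x in obj)
    raise TypeError("type not handled in this sketch")


print(encode([1, 2]).hex())  # 820102
print(encode((1, 2)).hex())  # d9ea61820102
```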

Another advantage would be the ability to encode indeterminate-length data (this is basically impossible with msgpack), though I have no idea whether that is actually a relevant use case for micropython/pycopy.
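
For illustration, a sketch of what indeterminate-length encoding looks like in CBOR per RFC 7049: an array opened with 0x9f and closed with the 0xff "break" byte, so a generator can be serialized without knowing its length up front. The function names are made up, and only ints 0..23 are handled to keep it short.

```python
def cbor_small_uint(n):
    """Encode an int in 0..23, which fits in a single CBOR byte."""
    if not 0 <= n < 24:
        raise ValueError("only 0..23 handled in this sketch")
    return bytes([n])


def cbor_stream_array(items):
    """Yield the chunks of an indefinite-length CBOR array, one item at a time."""
    yield b"\x9f"                     # major type 4, indefinite length
    for item in items:
        yield cbor_small_uint(item)
    yield b"\xff"                     # "break" stop code


encoded = b"".join(cbor_stream_array(x for x in range(3)))
print(encoded.hex())  # 9f000102ff
```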

smurfix avatar May 10 '21 17:05 smurfix