robust way to serialize to json

Open aschmolck opened this issue 8 years ago • 2 comments

It is convenient in various scenarios to translate capnproto to and from JSON. The seemingly straightforward way to do so would be to call json.dumps(builder.to_dict()), which is basically what capnp-json.py in scripts does. Unfortunately, that doesn't really work for Data fields, because JSON strings need to be valid unicode and Data fields end up as bytes in the generated dictionary (see attached example code).

As far as I can tell, capnproto itself does not yet have an official JSON serialization (which would include a specification of how to encode Data fields as unicode). So rather than adding .to_json, something like to_dict(bytes=False) might work: it would serialize Data to e.g. base64-encoded unicode, and new_message/from_dict would then always base64-decode unicode for Data fields.

import os

import capnp

if __name__ == '__main__':
    with open('test.capnp', 'w') as fh:
        fh.write('''
@0xc49a5731242fa476;
struct TestStruct {
    uint @0 :UInt64;
    blob @1 :Data;
}
''')
    schema = capnp.load('test.capnp')
    with open('test_ok.out', 'wb') as fh:
        schema.TestStruct.new_message(
            blob=b'valid utf8:\0\1\2"',
            uint=123,
        ).write(fh)
    with open('test_fail.out', 'wb') as fh:
        schema.TestStruct.new_message(
            blob=b'valid utf8:\0\1\2 invalid: \xc3\x28"',
        ).write(fh)
    # Decode both messages to JSON; the second has invalid UTF-8 in its Data field.
    os.system('./capnp-json.py decode test.capnp TestStruct <test_ok.out ')
    os.system('./capnp-json.py decode test.capnp TestStruct <test_fail.out')
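
The base64 idea proposed above could look roughly like the following. This is only a sketch under stated assumptions: it works on a plain dict standing in for the output of to_dict(), the helper names bytes_to_base64 and base64_to_bytes are hypothetical and not part of pycapnp, and the set of Data field names is passed in by hand, whereas a real implementation would presumably take it from the schema. It assumes Python 3, where bytes and str are distinct types.

```python
import base64
import json


def bytes_to_base64(value):
    """Recursively replace bytes values with base64-encoded ASCII strings."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, dict):
        return {key: bytes_to_base64(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [bytes_to_base64(item) for item in value]
    return value


def base64_to_bytes(value, data_fields):
    """Decode base64 strings back to bytes for the named top-level fields.

    data_fields is a hypothetical, hand-written set of Data field names;
    nested structs are not handled in this sketch.
    """
    return {
        key: base64.b64decode(item) if key in data_fields else item
        for key, item in value.items()
    }


# Stand-in for the dict that to_dict() would return for TestStruct.
message = {"uint": 123, "blob": b"valid utf8:\0\1\2 invalid: \xc3\x28"}

text = json.dumps(bytes_to_base64(message), sort_keys=True)
restored = base64_to_bytes(json.loads(text), data_fields={"blob"})
```

Because base64 output is pure ASCII, json.dumps no longer depends on whether the blob happens to be valid UTF-8, and the round trip returns the original bytes.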

aschmolck, Jun 08 '16 10:06

capnproto (in git master) does have a JSON codec, though I'm not sure there's been a capnproto release since its addition (and I also don't think pycapnp wraps the functionality yet).

We canonically encode UInt64 and Int64 as strings of base-10 digits in ASCII, since JavaScript's numeric type only affords 53 bits of precision.

We canonically encode Data as an array of numbers in the range [0, 255].

That said, I believe it is possible on the C++ side to provide your own encoding logic if you wanted to (for instance) base64 Data instead of making it an array of byte values.
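
To make the canonical conventions described above concrete (64-bit integers as decimal strings, Data as an array of byte values), here is a small Python sketch that applies them to a plain dict. It does not call the C++ codec and is not a pycapnp API: the function name to_capnp_json_style is hypothetical, and the 64-bit fields are identified by a hand-written set of field names, which is a simplification of what a schema-aware implementation would do.

```python
import json


def to_capnp_json_style(value, int64_fields=frozenset(), key=None):
    """Mimic the conventions above on a plain dict (illustrative only).

    bytes become a list of ints in [0, 255]; ints stored under a key listed
    in int64_fields (an assumption supplied by the caller) become strings.
    """
    if isinstance(value, (bytes, bytearray)):
        return list(bytes(value))
    if isinstance(value, dict):
        return {k: to_capnp_json_style(v, int64_fields, k) for k, v in value.items()}
    if isinstance(value, list):
        return [to_capnp_json_style(v, int64_fields, key) for v in value]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and key in int64_fields:
        return str(value)
    return value


# Stand-in for TestStruct: a UInt64 beyond 53 bits and a non-UTF-8 blob.
message = {"uint": 2**63 + 1, "blob": b"\x00\x01\xc3\x28"}
encoded = json.dumps(to_capnp_json_style(message, int64_fields={"uint"}), sort_keys=True)
print(encoded)
# prints {"blob": [0, 1, 195, 40], "uint": "9223372036854775809"}
```

The byte-array form is more verbose than base64 but needs no extra decoding convention on the consumer side, which may be why it was chosen as the default.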

zarvox, Jun 08 '16 18:06

This sounds like a good idea, and I'm not opposed to structs having 'to_json/from_json' methods. Unfortunately I'm on vacation this week and I'm not sure when I'll be able to find time to work on this.

I'm always happy to review PRs though :)

jparyani, Jun 09 '16 00:06