capnpy
Add as_dict() method to struct
namedtuple has the method _asdict(), and pycapnp also has to_dict(). I think we should also add something equivalent to capnpy, which conveniently converts a struct into an OrderedDict.
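For reference, this is how the namedtuple method mentioned above behaves. Note that since Python 3.8 _asdict() returns a plain dict (an OrderedDict on 3.1 to 3.7), so code that specifically needs an OrderedDict has to wrap the result. The Point type here is just an illustrative namedtuple, not a capnpy struct.

```python
from collections import OrderedDict, namedtuple

# A plain namedtuple, used only as a reference point for the proposed API.
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)

d = p._asdict()          # dict on Python 3.8+, OrderedDict on 3.1-3.7
od = OrderedDict(d)      # explicit OrderedDict, keys in field declaration order
print(list(od.items()))  # [('x', 3), ('y', 4)]
```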
What happens if the struct contains an unnamed union, or if it is a nested struct?
My colleagues do sometimes find to_dict useful for simple, plain structs. Currently we add the methods via _extended.py.
I think that it's not so easy to design something which has a reasonable behavior w.r.t. all the possible combinations of capnproto features. Some random thoughts:
- unions: do we include all the fields, or just the one which is set? Do we also include a special "which" key?
- Void fields: do we include them or not?
- schema evolution: what do we do with fields which are not statically known at compile time? Just ignore them?
- AnyPointer: how do we deal with it?
- groups: are they rendered as nested dicts, or using dotted keys?
- NULL pointers: if we have a NULL Text, do we render it as None or ""?
- default Text values: same as above, but for the case where the field has a default value.
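To make the "groups" question concrete, here is a minimal sketch of the two candidate renderings of the same data, using plain Python dicts rather than capnpy objects. The flatten helper is hypothetical and not part of capnpy.

```python
def flatten(d, prefix=''):
    """Hypothetical helper: turn nested dicts into one level with dotted keys."""
    out = {}
    for key, value in d.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recurse into the sub-dict (e.g. a group), extending the prefix.
            out.update(flatten(value, full_key + '.'))
        else:
            out[full_key] = value
    return out

# The same data rendered with the group as a nested dict...
nested = {'position': {'x': 3, 'y': 4}, 'color': 'red'}
# ...and rendered with dotted keys.
dotted = flatten(nested)
print(dotted)  # {'position.x': 3, 'position.y': 4, 'color': 'red'}
```

Either form is easy to derive from the other, which is part of why no single choice is obviously right.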
I am sure that depending on the exact use case, you would need slightly different answers to the questions above. So, I am tempted to say that this feature should not be part of the capnpy core, at least for now.
It would be nice to have it as an external library or plugin: then, as @colinfang says, you can easily integrate it into your schema using *_extended.py.
Well...I wrote this without noticing your replies...
I haven't really used these advanced features of capnp yet. Will have a look tomorrow~
I've investigated the issue a bit more, here are my thoughts:
First of all, this issue is not about converting the whole capnp data structure into native Python types, but about "shallowly" converting a struct to a dict, so nested structs are deliberately left as they are and most fields don't need any special rendering. In general, I think such a deep conversion cannot and should not be done. For example:
from collections import namedtuple
Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])
o = Object(Dimension(10, 15, 20), 50)
o._asdict()
Will the nested Dimension be converted? No. But the user can always choose to do it themselves. Another example:
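A minimal sketch of what "the user can always choose to do it themselves" looks like for the namedtuple example: _asdict() leaves the nested Dimension untouched, and a few lines of user code make the conversion deep. The deep_asdict helper is hypothetical, written for this illustration, and only recurses into values that have an _asdict method (it does not descend into lists or dicts).

```python
from collections import namedtuple

Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])
o = Object(Dimension(10, 15, 20), 50)

shallow = o._asdict()
# The nested value is still a Dimension instance, not a dict.
print(type(shallow['dimension']).__name__)  # Dimension

def deep_asdict(value):
    """Hypothetical helper: recursively convert anything with _asdict()."""
    if hasattr(value, '_asdict'):
        return {k: deep_asdict(v) for k, v in value._asdict().items()}
    return value

deep = deep_asdict(o)
print(deep)  # {'dimension': {'x': 10, 'y': 15, 'z': 20}, 'weight': 50}
```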
from types import MappingProxyType
d = {'nested': {'a': 1}, 'b': 2}
m = MappingProxyType(d)
Will m['nested'] become a MappingProxyType? No. But the user can choose to do it with one more line. Also note that namedtuple._asdict() is indifferent to the type of a field: it can be a cffi pointer or anything else.
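The MappingProxyType example above can be checked directly: the proxy is shallow, and wrapping the nested level too is indeed about one extra expression. This is only a sketch of that "one more line"; it wraps dict values one level deep and nothing more.

```python
from types import MappingProxyType

d = {'nested': {'a': 1}, 'b': 2}
m = MappingProxyType(d)

# The proxy is shallow: the nested value is still a plain, mutable dict.
print(type(m['nested']).__name__)  # dict

# The "one more line": also wrap the nested dicts one level down.
m2 = MappingProxyType(
    {k: MappingProxyType(v) if isinstance(v, dict) else v for k, v in d.items()}
)
print(type(m2['nested']).__name__)  # mappingproxy
```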
Secondly, I don't see how *_extended.py can solve this issue easily. My schema has ~30 fields. Maybe I missed something, but I couldn't find an easy way to get or iterate over the field names. So, to write an equivalent method in *_extended.py, I have to list all the fields manually. This is unacceptable, given that I have already written a .capnp file containing all the relevant information.
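If the generated classes exposed their field names (for instance as a _fields tuple, in the spirit of namedtuple), a single generic method would cover a struct of any size, instead of listing ~30 fields by hand. The sketch below assumes exactly that: AsDictMixin and the Person stand-in are hypothetical plain-Python classes, not capnpy's generated code or an existing capnpy API.

```python
from collections import OrderedDict

class AsDictMixin(object):
    """Hypothetical mixin: build an OrderedDict from a `_fields` tuple.

    Assumes the class lists its field names in `_fields` and exposes each
    field as an attribute. This is NOT an existing capnpy API.
    """
    _fields = ()

    def _asdict(self):
        return OrderedDict((name, getattr(self, name)) for name in self._fields)

# Plain-Python stand-in for a generated struct class (not a real capnpy class).
class Person(AsDictMixin):
    _fields = ('name', 'age', 'email')

    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

p = Person('Bob', 42, 'bob@example.com')
print(list(p._asdict().items()))
# [('name', 'Bob'), ('age', 42), ('email', 'bob@example.com')]
```

The method body never mentions a specific field, so it would not need updating when the schema grows.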
Handling data that consists of (possibly nested) dicts/lists of primitive types should cover at least 90% of the usage of a serialization library (which is quite a conservative figure, I would say). I think an API as succinct as possible should be provided for such usage. In our app, the serialization layer has a fixed API, dict <-> bytes, while the serialization library behind it can be changed. We have tried quite a few libraries, and most can do the job in one or two lines.
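A minimal sketch of such a fixed dict <-> bytes layer, assuming a small codec interface with dumps/loads methods. The JsonCodec name is hypothetical, and JSON is only a stand-in backend chosen because it ships with Python; the point is that application code only ever sees dicts and bytes, so the library behind the codec can be swapped.

```python
import json

class JsonCodec(object):
    """Hypothetical backend for a fixed dict <-> bytes serialization layer.

    Callers only see dumps(dict) -> bytes and loads(bytes) -> dict, so the
    library used inside can be replaced without touching application code.
    """

    def dumps(self, data):
        return json.dumps(data, sort_keys=True).encode('utf-8')

    def loads(self, payload):
        return json.loads(payload.decode('utf-8'))

codec = JsonCodec()
record = {'name': 'Bob', 'tags': ['a', 'b'], 'score': 1.5}
payload = codec.dumps(record)
assert isinstance(payload, bytes)
assert codec.loads(payload) == record
```

Plugging a capnp-based backend into this interface is where a one-line struct-to-dict conversion would matter.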
Given the philosophy above, advanced fields that can normally be retrieved as attributes just work. For example, a group:
>>> mod = capnpy.load_schema('example_group')
>>> Point = mod.Point
>>> p = Point(position=(3, 4), color='red')
>>> p._fields
('position', 'color')
>>> p._asdict()
OrderedDict([('position', <Point.position: (x = 3, y = 4)>), ('color', b'red')])
A named union:
>>> mod = capnpy.load_schema('example_named_union')
>>> Person = mod.Person
>>> p = Person(name='Bob', job=Person.Job(employer='Capnpy corporation'))
>>> p._fields
('name', 'job')
>>> p._asdict()
OrderedDict([('name', b'Bob'), ('job', <Person.job: (employer = "Capnpy corporation")>)])
There might be some corner cases left to handle, most notably unnamed unions. Even with a few exceptions, I think this feature is still very useful. The user can always choose to further process the data as needed.
As for unnamed unions, I propose two possible solutions:
- Omit unnamed union fields in _fields and _asdict(). This is the simplest one.
- Include the currently set field of the union. This is arguably the more reasonable one, and closer to the definition of 'union'. For example:
@0x8ced518a09aa7ce3;
struct Shape {
area @0 :Float64;
union {
circle @1 :Float64; # radius
square @2 :Float64; # width
}
}
>>> s = Shape(area=20, circle=5)
>>> s._fields
# ('area', 'circle')
>>> s._asdict()
# OrderedDict([('area', 20.0), ('circle', 5.0)])
Note that no matter which one is chosen, the user can always choose to process it further.
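Both options for unnamed unions can be expressed in one shallow converter with a switch. The sketch below uses a plain-Python stand-in for the Shape struct: the _union_fields attribute and the which() method returning a field name are assumptions made for this illustration, and how capnpy itself reports the active union member may differ.

```python
from collections import OrderedDict

class Shape(object):
    """Plain-Python stand-in for a struct with an unnamed union.

    `_fields` lists the non-union fields, `_union_fields` the union members,
    and which() returns the name of the member that is set. All three are
    hypothetical names chosen for this sketch only.
    """
    _fields = ('area',)
    _union_fields = ('circle', 'square')

    def __init__(self, area, **union):
        if len(union) != 1 or not set(union) <= set(self._union_fields):
            raise TypeError('exactly one union member must be set')
        self.area = area
        (self._which, value), = union.items()
        setattr(self, self._which, value)

    def which(self):
        return self._which

def as_dict(struct, union_mode='current'):
    """Shallow struct -> OrderedDict.

    union_mode='omit'    -> option 1: leave unnamed-union members out.
    union_mode='current' -> option 2: include only the member that is set.
    """
    if union_mode not in ('omit', 'current'):
        raise ValueError('unknown union_mode: %r' % (union_mode,))
    result = OrderedDict((name, getattr(struct, name)) for name in struct._fields)
    if union_mode == 'current':
        tag = struct.which()
        result[tag] = getattr(struct, tag)
    return result

s = Shape(area=20.0, circle=5.0)
print(list(as_dict(s, 'omit').items()))     # [('area', 20.0)]
print(list(as_dict(s, 'current').items()))  # [('area', 20.0), ('circle', 5.0)]
```

With option 2 the key set depends on which member is active, so consumers that expect a fixed set of keys would need to handle that.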