capnpy

Add as_dict() method to struct

Open kawing-chiu opened this issue 7 years ago • 7 comments

Namedtuple has the method _asdict() and pycapnp also has to_dict(). I think we should add something equivalent to capnpy that readily converts a struct into an OrderedDict.

kawing-chiu avatar Nov 27 '17 07:11 kawing-chiu

What happens if the struct contains an unnamed union, or if it is a nested struct?

colinfang avatar Nov 28 '17 11:11 colinfang

My colleagues do sometimes find to_dict useful for simple, plain structs. Currently we add the methods via _extended.py.

colinfang avatar Nov 28 '17 11:11 colinfang

I think that it's not so easy to design something which has a reasonable behavior w.r.t. all the possible combinations of capnproto features. Some random thoughts:

  • unions: do we include all the fields, or just the one which is set? Do we also include a special which key?
  • Void fields? Do we include them or not?
  • Schema evolution: what do we do with fields which are not statically known at compile time? Just ignore them?
  • AnyPointer: how to deal with it?
  • groups: are they rendered as nested dicts, or using dotted keys?
  • NULL pointers: if we have a NULL Text, do we render it as None or ""?
  • default Text values: same as above, but in the case the field has a default value

I am sure that depending on the exact use case, you would need slightly different answers to the questions above. So, I am tempted to say that this feature should not be part of the capnpy core, at least for now. It would be nice to have it as an external library or plugin: then, as @colinfang says, you can easily integrate it into your schema using *_extended.py.

antocuni avatar Nov 28 '17 13:11 antocuni

Well...I wrote this without noticing your replies...

I haven't really used these advanced features of capnp yet. Will have a look tomorrow~

kawing-chiu avatar Nov 28 '17 13:11 kawing-chiu

I've investigated the issue a bit more, here are my thoughts:

First of all, this issue is not about converting the whole capnp data structure into native Python types, but about "shallowly" converting to a dict, so nested structs are certainly not considered and most fields don't need to be rendered. In general, I think a deep conversion of that kind cannot and should not be done. For example:

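from collections import namedtuple

Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])
o = Object(Dimension(10, 15, 20), 50)
o._asdict()  # shallow: the 'dimension' value is still a Dimension namedtuple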

Will the nested Dimension be converted? No. But users can always choose to do it themselves. Another example:

from types import MappingProxyType
d = {'nested': {'a': 1}, 'b': 2}
m = MappingProxyType(d)

Will m['nested'] become a MappingProxyType? No. But the user can choose to do it with one more line. Also note that namedtuple._asdict() is indifferent to the type of each field: it can be a cffi pointer or whatever.
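To illustrate the "user can do it himself" point, here is a minimal sketch of a deep conversion layered on top of the shallow one. The helper name deep_asdict is hypothetical and not part of any library; it only handles namedtuples, lists and tuples.

```python
from collections import namedtuple


def deep_asdict(value):
    """Recursively convert namedtuples (and lists/tuples of them) to plain dicts.

    Hypothetical helper for illustration only.
    """
    # A namedtuple is a tuple subclass that exposes a _fields attribute.
    if isinstance(value, tuple) and hasattr(value, "_fields"):
        return {name: deep_asdict(getattr(value, name)) for name in value._fields}
    # Ordinary sequences are converted element by element.
    if isinstance(value, (list, tuple)):
        return [deep_asdict(item) for item in value]
    # Everything else (numbers, strings, arbitrary objects) is left untouched.
    return value


Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])
o = Object(Dimension(10, 15, 20), 50)

shallow = o._asdict()   # nested Dimension stays a namedtuple
deep = deep_asdict(o)   # nested Dimension becomes a plain dict
```

The shallow version stays cheap and type-agnostic, while the deep version is an opt-in step whose rules the caller controls.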

Secondly, I don't see how *_extended.py can solve this issue easily. My schema has ~30 fields. Maybe I missed it, but I couldn't find a way to get or iterate over the field names easily. So to write equivalent methods in *_extended.py, I have to list all the fields manually. This is unacceptable, given that I have already written a .capnp file containing all the relevant information.

Handling data that consists of (possibly nested) dicts/lists of primitive types should cover at least 90% of the usage of a serialization library (which is quite a conservative figure, I would say). I think an API as succinct as possible should be provided for such usage. In our app, the serialization layer has a fixed API: dict <-> bytes, while the serialization lib can be changed. We have tried quite a few libs, and most can do the job in one or two lines.
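As a rough illustration of such a fixed dict <-> bytes layer, here is a sketch that uses the standard json module as a stand-in backend. The class name JsonSerializer and its dumps/loads methods are assumptions made for this example, not the API of our app or of any serialization library.

```python
import json


class JsonSerializer:
    """Stand-in backend: any object with the same two methods could replace it."""

    def dumps(self, data: dict) -> bytes:
        # dict -> bytes
        return json.dumps(data, sort_keys=True).encode("utf-8")

    def loads(self, raw: bytes) -> dict:
        # bytes -> dict
        return json.loads(raw.decode("utf-8"))


serializer = JsonSerializer()
payload = {"name": "Bob", "tags": ["a", "b"], "size": {"x": 10, "y": 15}}

raw = serializer.dumps(payload)     # bytes ready to be sent or stored
restored = serializer.loads(raw)    # back to nested dicts/lists
```

With a shallow dict conversion on structs, a capnp-based backend could plug into the same two-method interface.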

kawing-chiu avatar Nov 29 '17 03:11 kawing-chiu

Given the philosophy above, advanced fields that can normally be retrieved as attributes just work. For example, a group:

>>> mod = capnpy.load_schema('example_group')
>>> Point = mod.Point
>>> p = Point(position=(3, 4), color='red')
>>> p._fields
('position', 'color')
>>> p._asdict()
OrderedDict([('position', <Point.position: (x = 3, y = 4)>), ('color', b'red')])

A named union:

>>> mod = capnpy.load_schema('example_named_union')
>>> Person = mod.Person
>>> p = Person(name='Bob', job=Person.Job(employer='Capnpy corporation'))
>>> p._fields
('name', 'job')
>>> p._asdict()
OrderedDict([('name', b'Bob'), ('job', <Person.job: (employer = "Capnpy corporation")>)])

There might be some corner cases left to be handled, most notably unnamed unions. Even with a few exceptions, I think this feature is still very useful. The user can always choose to further process the data as needed.

kawing-chiu avatar Nov 29 '17 03:11 kawing-chiu

As for unnamed unions, I propose two possible solutions:

  • Omit unnamed union fields from _fields and _asdict(). This is the simplest one.

  • Include the currently set field in the union. This is arguably the more reasonable one, and closer to the definition of 'union'. For example:

@0x8ced518a09aa7ce3;
struct Shape {
  area @0 :Float64;
  union {
    circle @1 :Float64;      # radius
    square @2 :Float64;      # width
  }
}

>>> s = Shape(area=20, circle=5)
>>> s._fields
# ('area', 'circle')
>>> s._asdict()
# OrderedDict([('area', 20.0), ('circle', 5.0)])

Note that no matter which one is chosen, the user can always choose to process it further.
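The two options can be compared side by side with a small plain-Python sketch. FakeShape below is a hypothetical stand-in for the Shape struct, and the fields/asdict helpers and the include_union flag are invented for illustration; none of this is capnpy code.

```python
from collections import OrderedDict


class FakeShape:
    """Plain-Python stand-in for the Shape struct above; not a capnpy class."""

    plain_fields = ('area',)
    union_fields = ('circle', 'square')

    def __init__(self, area, **union_value):
        # Exactly one member of the unnamed union may be set.
        if len(union_value) != 1 or next(iter(union_value)) not in self.union_fields:
            raise ValueError('exactly one union field must be set')
        self.area = float(area)
        self.active, value = next(iter(union_value.items()))
        setattr(self, self.active, float(value))


def fields(struct, include_union=True):
    """Field names; optionally followed by the currently set union member."""
    names = list(struct.plain_fields)
    if include_union:
        names.append(struct.active)
    return tuple(names)


def asdict(struct, include_union=True):
    """Shallow conversion built on top of fields()."""
    return OrderedDict((name, getattr(struct, name))
                       for name in fields(struct, include_union))


s = FakeShape(area=20, circle=5)
option_1 = asdict(s, include_union=False)   # union members omitted
option_2 = asdict(s, include_union=True)    # only the set member included
```

Under the second option the key set depends on which member is set, so two Shape values can yield dicts with different keys.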

kawing-chiu avatar Nov 29 '17 06:11 kawing-chiu