
Explore alternatives to the XML-to-dict functionality

Open stan-janssen opened this issue 4 years ago • 2 comments

OpenLEADR currently parses XML using xmltodict and a custom postprocessor (normalize_dict). This works well enough that we get very usable dictionaries out of it, and we can feed those same dictionaries into the jinja templating engine to generate perfectly valid XML output.
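To make the current dict-based approach concrete, here is a rough stdlib-only stand-in for that pipeline. It uses xml.etree.ElementTree instead of xmltodict, and `normalize` is a hypothetical postprocessor that only converts camelCase keys to snake_case; it is not OpenLEADR's actual normalize_dict, and the `urn:example` namespace and element names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> Any:
    """Convert an element to nested dicts, in the spirit of xmltodict.

    Leaf elements become their stripped text; repeated child tags become
    lists. Attributes are ignored to keep the sketch short.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result: dict[str, Any] = {}
    for child in children:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            result[key] = [result[key], value]
    return result


def snake_case(name: str) -> str:
    """Hypothetical key normalization: 'eventID' -> 'event_id'."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def normalize(value: Any) -> Any:
    """Hypothetical postprocessor (not normalize_dict's actual logic)."""
    if isinstance(value, dict):
        return {snake_case(k): normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


xml = (
    '<payload xmlns="urn:example">'
    "<venID>ven123</venID>"
    "<eiEvent><eventID>e1</eventID></eiEvent>"
    "<eiEvent><eventID>e2</eventID></eiEvent>"
    "</payload>"
)
message = normalize(element_to_dict(ET.fromstring(xml)))
print(message)
```

The result is still a plain dict, which is exactly the limitation discussed below.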

There is even support for constructing the objects using predefined dataclasses, which makes message creation a little nicer.

There are, however, a few shortcomings that could be addressed:

  • The XML parsing can only return dicts, not objects.
  • There is a little too much thrashing going on to hammer the dicts into their preferred form. If we want to validate outgoing messages before sending them, we have to parse the jinja2-generated XML again, which seems inefficient.
  • Validating the XML signature means parsing the same XML twice: once with xmltodict and once with lxml.
  • We need a preprocessor (preflight_message) that validates object contents before XML serialization; this should maybe be handled inside the objects themselves.
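A minimal sketch of what moving validation into the objects could look like, assuming plain dataclasses and ElementTree (`EventDescriptor`, its fields, and the `urn:example` namespace are hypothetical). Invalid values are rejected in `__post_init__`, so any object that exists is already valid and can be serialized directly, without re-parsing generated XML. It does not address the double parse during signature validation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = "urn:example"  # hypothetical namespace, not the real OpenADR one


@dataclass
class EventDescriptor:
    """Hypothetical message object that validates itself on construction."""

    event_id: str
    priority: int = 0

    def __post_init__(self) -> None:
        # Checks that a separate preflight pass would otherwise perform
        if not self.event_id:
            raise ValueError("event_id must not be empty")
        if self.priority < 0:
            raise ValueError("priority must be non-negative")

    def to_element(self) -> ET.Element:
        # Serialize straight from the validated object: no template, no re-parse
        elem = ET.Element(f"{{{NS}}}eventDescriptor")
        ET.SubElement(elem, f"{{{NS}}}eventID").text = self.event_id
        ET.SubElement(elem, f"{{{NS}}}priority").text = str(self.priority)
        return elem


# Only gives the output a readable prefix instead of ns0
ET.register_namespace("ei", NS)
xml = ET.tostring(EventDescriptor("e1", priority=1).to_element(), encoding="unicode")
print(xml)
```

Constructing `EventDescriptor("e1", priority=-1)` raises a ValueError before any XML is produced.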

In short, I'd like to transition to actual objects instead of dicts, and I'd like to explore ways of automating that process.

The avenues I can see are:

  • Use the lxml.objectify API
  • Use something like pyxmpp's XSO model.
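Neither lxml.objectify nor XSO is reproduced here. As a rough, stdlib-only illustration of the declarative direction (class attributes declare how they map to child elements, and a base class derives parsing and serialization from those declarations), consider the sketch below. `ChildText`, `XMLObject` and `Interval` are hypothetical names; this is not XSO's API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ChildText:
    """Hypothetical declaration: an attribute maps to one child element's text."""

    tag: str
    parse: Callable[[str], Any] = str


class XMLObject:
    """Hypothetical base class deriving parsing/serialization from declarations."""

    TAG = ""

    def __init__(self, **values: Any) -> None:
        fields = self._fields()
        unknown = set(values) - set(fields)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        for name in fields:
            # Raises KeyError if a declared field was not supplied
            setattr(self, name, values[name])

    @classmethod
    def _fields(cls) -> Dict[str, ChildText]:
        # Walk base classes first so subclass declarations override them
        return {
            name: decl
            for klass in reversed(cls.__mro__)
            for name, decl in vars(klass).items()
            if isinstance(decl, ChildText)
        }

    @classmethod
    def from_element(cls, elem: ET.Element) -> "XMLObject":
        values = {}
        for name, decl in cls._fields().items():
            text = elem.findtext(decl.tag)
            if text is None:
                raise ValueError(f"missing <{decl.tag}> in <{elem.tag}>")
            values[name] = decl.parse(text)
        return cls(**values)

    def to_element(self) -> ET.Element:
        elem = ET.Element(self.TAG)
        for name, decl in self._fields().items():
            ET.SubElement(elem, decl.tag).text = str(getattr(self, name))
        return elem


class Interval(XMLObject):
    TAG = "interval"
    duration = ChildText("duration")
    value = ChildText("value", parse=float)


xml = ET.tostring(Interval(duration="PT1H", value=1.5).to_element(), encoding="unicode")
parsed = Interval.from_element(ET.fromstring(xml))
print(xml, parsed.duration, parsed.value)
```

The appeal of this style is that a message type is declared once and both directions follow from it; the catch, as noted below, is that a naive declaration mirrors the XML 1:1.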

The problems I see are:

  • I want to retain the simplified object form, which irons out the double-nesting of certain properties, the inconsistent encapsulation of item lists in a separate wrapper element, and the treatment of namespaces. A 1:1 conversion of the XML structure to an object structure is undesirable in my opinion.
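One hypothetical way to keep that simplified form is a post-processing heuristic over a 1:1 dict: collapse a child that repeats its parent's name, and turn a plural wrapper holding singular children into a list. The key names below are illustrative, and the plural-"s" rule is an assumed convention that real schemas will not always follow.

```python
from typing import Any, Optional


def simplify(value: Any, parent_key: Optional[str] = None) -> Any:
    """Heuristically collapse XML quirks in a 1:1 dict (hypothetical helper)."""
    if isinstance(value, list):
        return [simplify(item, parent_key) for item in value]
    if not isinstance(value, dict):
        return value
    if len(value) == 1:
        ((key, inner),) = value.items()
        # Double nesting, e.g. {'duration': {'duration': 'PT1H'}} -> 'PT1H'
        if key == parent_key:
            return simplify(inner, parent_key)
        # List wrapper, e.g. {'intervals': {'interval': [...]}} -> [...]
        if parent_key == key + "s":
            items = inner if isinstance(inner, list) else [inner]
            return [simplify(item, key) for item in items]
    return {key: simplify(val, key) for key, val in value.items()}


raw = {
    "duration": {"duration": "PT1H"},
    "intervals": {"interval": {"uid": "0", "value": "1.5"}},
}
simplified = simplify(raw)
print(simplified)  # {'duration': 'PT1H', 'intervals': [{'uid': '0', 'value': '1.5'}]}
```

Note that a single wrapped item still becomes a one-element list, so consumers see the same shape regardless of how many items the XML contained.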

stan-janssen, Nov 11 '20

I used to use PyXB to generate and validate XML messages directly from the XSD schema.

00javad00, Jul 05 '21

Have a look at the repository below; it contains various generated Python schemas to click around in. It uses xsdata to generate a Python schema from the XML schema, with various options.

https://github.com/sietse/compare-schema-generators

IIRC, when I tried this in December, I was disappointed by a few things in the generated schema, but I can only remember one:

  • a bunch of bare types get wrapped in a class, e.g. a 'name' that could just as easily be an alias for str or a NewType.
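For illustration, here is the difference between a wrapper class and the lighter alternatives mentioned above. The `Name` class is a hand-written approximation of such a wrapper, not actual xsdata output; `NameAlias` and `NameType` are hypothetical names.

```python
from dataclasses import dataclass
from typing import NewType


@dataclass
class Name:
    """Wrapper-class style: the string lives in a .value field."""

    value: str


# Plain alias: to mypy and at runtime this is just str
NameAlias = str

# NewType: a distinct type for mypy, but a plain str at runtime
NameType = NewType("NameType", str)

wrapped = Name(value="ven-1")
aliased: NameAlias = "ven-1"
typed = NameType("ven-1")

print(wrapped.value, aliased, typed)
```

With NewType, mypy will flag a bare `str` passed where `NameType` is expected, yet the runtime value needs no `.value` unwrapping.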

I'm pretty happy that there's a way to generate pydantic classes, though. Pydantic guarantees at runtime that the objects it creates conform to the schema. It's also checkable with mypy. More guarantees, more better.

More commentary later. Might try other schema generators, too.

sietse, May 03 '22