xsdata
Ability to capture namespace mappings on parse, and render xml with the same mappings
Motivation
I have a project that needs to ingest, process, and re-export XML documents that use XSI instances. We don't know up front which XSI schemas will be in use.
The problem is that the xsi:type properties come prefixed with the namespace tag (i.e. xsi:type="tns:ExtensionType"), but parsing/serializing loses the namespace mapping, and namespaces are tagged ns0, ns1, etc., which does not match tns.
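To make the mismatch concrete, here is a minimal sketch of the same kind of prefix loss. It uses the standard library's xml.etree.ElementTree rather than xsdata, and the urn:example:tns namespace is made up for illustration, so treat it as an analogy rather than a reproduction of xsdata's behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element prefix and the xsi:type value both use "tns".
SOURCE = (
    '<root xmlns:tns="urn:example:tns" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<tns:item xsi:type="tns:ExtensionType"/>'
    "</root>"
)

root = ET.fromstring(SOURCE)
output = ET.tostring(root, encoding="unicode")

# The element prefix is regenerated on output, but the xsi:type value is
# plain attribute text, so it still refers to the now-undeclared "tns" prefix.
print(output)
```

The serialized document declares a generated prefix for urn:example:tns while the attribute value keeps pointing at tns, which is the round-trip breakage described above.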
Proposed Solutions
There are 2 solutions I can think of that would allow round-tripping arbitrary XSI instances. The examples below assume the following class hierarchy.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExtensionPoint:
    """This class is intended to be extended by XSI instances"""
    xsi_type: str = field(metadata=dict(
        name="type",
        type="Attribute",
        namespace="http://www.w3.org/2001/XMLSchema-instance",
    ))
    extra_elements: List[object] = field(metadata=dict(
        type="Wildcard",
        namespace="##other",
    ))
    extra_attributes: Dict[str, str] = field(metadata=dict(
        type="Attributes",
        namespace="##other",
    ))

@dataclass
class MyXMLElement:
    extension: List[ExtensionPoint] = field(metadata=dict(
        type="Elements",
    ))
Change to XML Serializer/Deserializer API
This would add new parsing functions that return the namespace map alongside the parsed object, which avoids a backward-incompatible change to the existing functions. The namespace map could then be passed to the renderer to round-trip the XML document correctly.
parsed_obj, ns_map = xml_parser.from_string_with_namespaces(xml_string, MyXMLElement)
reformed_xml_string = xml_serializer.render(parsed_obj, ns_map=ns_map)
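As a rough sketch of what "return the namespace map alongside the parsed object" could look like, the following uses the standard library's ElementTree instead of xsdata. The function name parse_with_namespaces and the sample document are hypothetical, not part of any existing API.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def parse_with_namespaces(xml_string: str) -> Tuple[ET.Element, Dict[str, str]]:
    """Parse a document and return (root, prefix -> namespace URI map)."""
    ns_map: Dict[str, str] = {}
    root = None
    source = io.BytesIO(xml_string.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            # Keep the first declaration seen for each prefix.
            ns_map.setdefault(prefix, uri)
        elif root is None:
            # The first "start" event carries the root element.
            root = payload
    return root, ns_map


SAMPLE = (
    '<root xmlns:tns="urn:example:tns" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<tns:item xsi:type="tns:ExtensionType"/>'
    "</root>"
)

parsed_root, captured = parse_with_namespaces(SAMPLE)
print(captured)
```

Because the map is a return value, each call produces its own mapping and nothing is shared between parses.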
Change to XML Serializer/Deserializer Behaviour
This would allow the xml binding infrastructure to bind xml namespace information to the dataclasses, and consider this information when serializing the classes to xml.
@dataclass
class MyXMLDoc(MyXMLElement):
    no_namespace: Optional[str] = field(
        default=None,
        metadata=dict(
            name="xmlns",
            type="Attribute",
            namespace="##local",
        ),
    )
    namespaces: Dict[str, str] = field(
        default_factory=dict,
        metadata=dict(
            type="Attributes",
            namespace="##xmlns",  # ??? is there a meta-namespace for xmlns: prefixed namespace declarations?
        ),
    )
parsed_obj = xml_parser.from_string(xml_string, MyXMLDoc) # parsed_obj carries the xml namespace mapping directly
reformed_xml_string = xml_serializer.render(parsed_obj)
I'd be happy to work on an implementation of either of the above concepts, if @tefra could give an opinion on which approach to pursue?
Hi @twoolie
I thought I had replied to this 🤦, sorry about that.
The parser keeps a copy of the original namespace mapping.
obj = xml_parser.from_path(xml_fixture, Definitions)
result = xml_serializer.render(obj, ns_map=xml_parser.ns_map)
Hi @tefra, thanks for getting back to me. I had missed the existing ns_map property, that's very useful.
Having the map be a property of the parser makes me uncomfortable, as it would be mutated or replaced every time a new file (with potentially different ns mappings) is parsed. I'd still be happy to work on an alternative where the namespaces are returned with the parsed object if you are interested in accepting it.
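The concern can be illustrated with a toy parser that stores the last namespace map on itself. ToyParser is a hypothetical class built on the standard library, not xsdata's implementation; it only shows how state held on a reused parser gets replaced by the next parse.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict


class ToyParser:
    """Hypothetical parser that keeps the most recent namespace map as state."""

    def __init__(self) -> None:
        self.ns_map: Dict[str, str] = {}

    def from_string(self, xml_string: str) -> ET.Element:
        self.ns_map = {}  # replaced on every parse
        root = None
        source = io.BytesIO(xml_string.encode("utf-8"))
        for event, payload in ET.iterparse(source, events=("start-ns", "start")):
            if event == "start-ns":
                prefix, uri = payload
                self.ns_map[prefix] = uri
            elif root is None:
                root = payload
        return root


parser = ToyParser()
first = parser.from_string('<a:doc xmlns:a="urn:example:a"/>')
second = parser.from_string('<b:doc xmlns:b="urn:example:b"/>')

# The map for the first document is gone unless the caller copied it in time.
stale_map = parser.ns_map
print(stale_map)
```

A caller who renders the first object after parsing the second would be handed the wrong mapping, which is why returning the map with the parsed object is attractive.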
Makes sense, sure give it a try.
Thanks for reporting @twoolie https://xsdata.readthedocs.io/en/latest/data_binding/xml_parsing/#capture-namespace-prefixes