Ability to capture namespace mappings on parse, and render xml with the same mappings

Open twoolie opened this issue 1 year ago • 4 comments

Motivation

I have a project that needs to ingest, process, and re-export XML documents that use XSI instances. We don't know up front which XSI schemas will be in use. The problem is that xsi:type values come prefixed with a namespace prefix (e.g. xsi:type="tns:ExtensionType"), but parsing and then serializing loses the original prefix mapping: the namespaces are emitted with generated prefixes (ns0, ns1, etc.), which no longer match tns.
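
To make the failure mode concrete, here is a minimal sketch using only the Python standard library (not xsdata). It records the prefix declarations seen during parsing and uses them to resolve an xsi:type value. The helper name parse_with_prefixes, the sample document, and the urn:example:extensions namespace are made up for illustration, and the flat map ignores prefix scoping (a prefix redeclared deeper in the document would overwrite the earlier binding).

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def parse_with_prefixes(xml_text):
    """Hypothetical helper: return (root, ns_map) for an XML string.

    ns_map maps every declared prefix to its namespace URI.
    """
    ns_map = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # one event per xmlns declaration
            ns_map[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the document root
    return root, ns_map


# Made-up sample: tns is only referenced inside the xsi:type value.
SAMPLE = (
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:tns="urn:example:extensions">'
    '<extension xsi:type="tns:ExtensionType"/>'
    "</root>"
)

root, ns_map = parse_with_prefixes(SAMPLE)

# Resolve the prefixed xsi:type value against the captured mapping.
prefix, _, local = root.find("extension").get(XSI_TYPE).partition(":")
resolved = "{%s}%s" % (ns_map[prefix], local)

# Re-serializing without the captured map keeps the "tns:" text in the
# attribute value but drops the declaration that gave it meaning.
reserialized = ET.tostring(root, encoding="unicode")
```

The attribute value is opaque text to the serializer, so nothing ties tns back to its URI unless the mapping captured at parse time is carried through to rendering.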

Proposed Solutions

There are 2 solutions I can think of that would allow round-tripping arbitrary XSI instances. The examples below assume the following class hierarchy.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExtensionPoint:
    """This class is intended to be extended by XSI instances"""
    xsi_type: str = field(metadata=dict(
        name="type",
        type="Attribute",
        namespace="http://www.w3.org/2001/XMLSchema-instance",
    ))
    extra_elements: List[object] = field(metadata=dict(
        type="Wildcard",
        namespace="##other",
    ))
    extra_attributes: Dict[str, str] = field(metadata=dict(
        type="Attributes",
        namespace="##other",
    ))

@dataclass
class MyXMLElement:
    extension: List[ExtensionPoint] = field(metadata=dict(
        type="Elements",
    ))

Change to XML Serializer/Deserializer API

This would add new parsing functions that return the namespace map alongside the parsed object, which avoids a backward-incompatible change to the existing functions. The namespace map could then be passed to the renderer to correctly round-trip the XML document.

parsed_obj, ns_map = xml_parser.from_string_with_namespaces(xml_string, MyXMLElement)
reformed_xml_string = xml_serializer.render(parsed_obj, ns_map=ns_map)

Change to XML Serializer/Deserializer Behaviour

This would allow the xml binding infrastructure to bind xml namespace information to the dataclasses, and consider this information when serializing the classes to xml.

from typing import Optional

@dataclass
class MyXMLDoc(MyXMLElement):
    no_namespace: Optional[str] = field(
        default=None,
        metadata=dict(
            name="xmlns",
            type="Attribute",
            namespace="##local",
        ),
    )
    namespaces: Dict[str, str] = field(
        default_factory=dict,
        metadata=dict(
            type="Attributes",
            namespace="##xmlns",  # ??? is there a meta-namespace for xmlns: prefixed namespace declarations?
        ),
    )

parsed_obj = xml_parser.from_string(xml_string, MyXMLDoc)  # parsed_obj carries the xml namespace mapping directly
reformed_xml_string = xml_serializer.render(parsed_obj)

twoolie • Aug 29 '23 05:08

I'd be happy to work on an implementation of either of the above concepts, if @tefra could give an opinion on which approach to pursue?

twoolie • Sep 25 '23 05:09

Hi @twoolie

I thought I had replied to this 🤦, sorry about that.

The parser keeps a copy of the original namespace mapping.

obj = xml_parser.from_path(xml_fixture, Definitions)
result = xml_serializer.render(obj, ns_map=xml_parser.ns_map)

tefra • Sep 25 '23 06:09

Hi @tefra, thanks for getting back to me. I had missed the existing ns_map property, that's very useful.

Having the map be a property of the parser makes me uncomfortable, as it would be mutated or replaced every time a new file (with potentially different ns mappings) is parsed. I'd still be happy to work on an alternative where the namespaces are returned with the parsed object if you are interested in accepting it.
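
As a rough illustration of that concern, the sketch below uses a hypothetical StatefulParser built on the standard library. It is a stand-in for any parser that exposes the last-seen prefix map as an attribute, not xsdata's actual implementation, and the sample documents and URNs are made up. It shows a plain reference to the map going stale after a second parse, while a copy taken immediately after each parse stays correct.

```python
import io
import xml.etree.ElementTree as ET


class StatefulParser:
    """Hypothetical stand-in: keeps the prefix map of the last parsed document."""

    def __init__(self):
        self.ns_map = {}

    def parse(self, xml_text):
        self.ns_map.clear()  # the shared dict is mutated on every parse
        root = None
        source = io.BytesIO(xml_text.encode("utf-8"))
        for event, payload in ET.iterparse(source, events=("start", "start-ns")):
            if event == "start-ns":
                prefix, uri = payload
                self.ns_map[prefix] = uri
            elif root is None:
                root = payload
        return root


parser = StatefulParser()

first = parser.parse('<a xmlns:tns="urn:example:one"/>')
alias = parser.ns_map            # plain reference to the parser's state
first_map = dict(parser.ns_map)  # snapshot taken right after the parse

second = parser.parse('<b xmlns:tns="urn:example:two"/>')

# alias now describes the second document; first_map still describes the first.
```

Returning the map together with the parsed object, as proposed, would make the snapshot step unnecessary and remove the ordering dependency between parse and render calls.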

twoolie • Oct 19 '23 06:10

Makes sense, sure give it a try.

tefra • Nov 04 '23 08:11

Thanks for reporting @twoolie https://xsdata.readthedocs.io/en/latest/data_binding/xml_parsing/#capture-namespace-prefixes

tefra • Mar 11 '24 15:03