
dynamically lookup root element class

Open leo-b opened this issue 2 years ago • 7 comments

Currently the dataclass parser either uses a fixed root class type specified by the clazz parameter or it tries to guess the correct class.

Guessing doesn't work if the XML root element name differs from the xsi:type.

If the root element is not known in advance, a lookup function or an element-name to clazz mapping would be useful.

PR #766 adds that functionality to the dataclass XML parser.

leo-b avatar Mar 18 '23 11:03 leo-b

Hi @leo-b, can you please provide some examples? I am not sure I get it.

tefra avatar Mar 18 '23 16:03 tefra

I am using a REST API that fetches different types of messages as XML. Depending on the type of the message, the root element might be one of several possible XSI types.

As I don't know which type of message will arrive next, I don't know the name of the XML root element that will be parsed. Thus I cannot specify one fixed corresponding dataclass to the parser.

E.g. the message might be one of three different types:

  • class MessageTypeA, which corresponds to a message with an XML root element of <messagea>...</messagea>
  • class MessageTypeB, which corresponds to a message with an XML root element of <messageb>...</messageb>
  • class MessageTypeC, which corresponds to a message with an XML root element of <messagec>...</messagec>

Auto-detection of the root class doesn't work, as the XSI type name is not the same as the XML element name.

That's why I need either a possible root-element -> clazz mapping like

mapping = {
  '{http://mynamespace}messagea': mydataclasses.MessageTypeA,
  '{http://mynamespace}messageb': mydataclasses.MessageTypeB,
  '{http://mynamespace}messagec': mydataclasses.MessageTypeC,
}

... or a lookup function that does the translation:

As I already have a dataclass that defines the possible messages, I chose to use a lookup function that obtains the root element clazz from that dataclass:

@dataclass
class ContentType:
    class Meta:
        name = "contentType"
        namespace = "http://www.w3.org/2005/Atom"

    messagea: Optional[MessageTypeA] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )
    messageb: Optional[MessageTypeB] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )
    messagec: Optional[MessageTypeC] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )

This lookup function could be used:

def _get_feedcontent_class(qname, attrs, ns_map, parser):
    """Obtain the class of the feed content root element
    by inspecting the fields of the ContentType class."""
    container_class = mydataclasses.ContentType
    # Build (or fetch the cached) metadata for the container class.
    meta = parser.context.fetch(container_class, None)
    # Return the class of the first child field matching the root qname.
    for c in meta.find_children(qname):
        return c.clazz
    raise xsdata.exceptions.ParserError(
        f"Unable to obtain class for element {qname} from {container_class}")

Parsing of the XML messages could be done like this:

xsobj = xmlparser.from_bytes(response.content, mapping)
# or using the lookup function:
xsobj = xmlparser.from_bytes(response.content, _get_feedcontent_class)
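
For comparison, the element-name to clazz mapping can also be resolved before the parser is called at all, using only the standard library. This is a minimal sketch and not part of PR #766: `peek_root_qname`, `resolve_root_class` and `ROOT_CLASSES` are hypothetical names, and the two dataclasses are empty stand-ins for the generated ones.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical stand-ins for the generated dataclasses.
@dataclass
class MessageTypeA:
    pass


@dataclass
class MessageTypeB:
    pass


# Root element qname ('{namespace}local') -> dataclass to parse into.
ROOT_CLASSES = {
    "{http://mynamespace}messagea": MessageTypeA,
    "{http://mynamespace}messageb": MessageTypeB,
}


def peek_root_qname(xml_bytes: bytes) -> str:
    """Return the root tag in '{namespace}local' form from the first start event."""
    for _event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        return element.tag
    raise ValueError("document has no root element")


def resolve_root_class(xml_bytes: bytes, mapping: dict) -> type:
    """Look up the class registered for the document's root element."""
    qname = peek_root_qname(xml_bytes)
    try:
        return mapping[qname]
    except KeyError:
        raise LookupError(f"no class registered for root element {qname}") from None


sample = b'<messageb xmlns="http://mynamespace"><text>hi</text></messageb>'
clazz = resolve_root_class(sample, ROOT_CLASSES)
```

The resolved class could then be handed to the parser through the existing clazz argument, e.g. xmlparser.from_bytes(sample, clazz). The trade-off is that the start of the document is read twice.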

leo-b avatar Mar 18 '23 22:03 leo-b

Can you provide some examples of the XML responses? xsdata already supports xsi:types on root elements and sub-elements.

<?xml version="1.0" encoding="UTF-8"?>
<test:e xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="test:B">
  <c>1</c>
  <d>2002-04-15</d>
</test:e>

tefra avatar May 28 '23 16:05 tefra

Re-open if you think this needs to be discussed more.

tefra avatar Jul 16 '23 16:07 tefra

Sorry for the delay. I'd like to resume this old issue.

In my application I am parsing externally provided XML messages without knowing the root XML element in advance. Thus I cannot provide a clazz argument to the xsdata parser.

Currently, if no clazz argument is provided, the parser tries to guess a matching class from the current context. It checks the xsi:type attribute and also looks for a type matching the root element name in all imported modules containing dataclasses.

Unfortunately, in my case the root element name doesn't match a type and the root element doesn't contain an xsi:type attribute. So I need to point xsdata to the correct starting point.

Can you provide some examples of the xml responses

I have attached the (externally provided) XSD schema and some possible XML messages: sample.zip. The following snippet shows how to use my PR #766.

import xsdata
import dv
from xsdata.formats.dataclass.parsers import XmlParser
xmlparser = XmlParser()
# Read the sample message; the context manager closes the file handle.
with open("apistatus.xml", "rb") as f:
    xml = f.read()

elementmap = {
  '{http://www.brz.gv.at/datenverbund-unis}apistatus': dv.ApiStatus,
  '{http://www.brz.gv.at/datenverbund-unis}kantwort': dv.Kontostandantwort,
  '{http://www.brz.gv.at/datenverbund-unis}mantwort': dv.Matrikelpruefungantwort,
}

# raises xsdata.exceptions.ParserError: No class found matching root: {http://www.brz.gv.at/datenverbund-unis}apistatus
xmlparser.from_bytes(xml)

# using a lookup map:
xmlparser.from_bytes(xml, elementmap)

leo-b avatar Jan 17 '24 14:01 leo-b

Hey @leo-b, the issue here is the XSD and how it doesn't set the correct namespace for the root elements, correct?

I would rather avoid adding a hack like that in the codebase; the parser interface is very specific and clean. I wouldn't mind leaving a hook in place for people to extend the parser and override the default behavior with a custom finder.

tefra avatar Jan 17 '24 14:01 tefra

Hey @leo-b, the issue here is the XSD and how it doesn't set the correct namespace for the root elements, correct?

As far as I understand it, the namespace is set correctly in the XML and the XSD defines the type for the root element correctly. But the XML root element doesn't contain an xsi:type attribute that points to the type, and the XSD doesn't contain an xs:element definition for the possible root element names.

But anyways, I don't want to rely on some guessing by looking at all imported modules to find the correct root class. I'd prefer being able to provide some custom code for that job.

I wouldn't mind leaving a hook in place for people to extend the parser and override the default behavior, with your custom finder.

A custom hook would also be fine, of course.

leo-b avatar Jan 17 '24 15:01 leo-b

The API of the parser is pretty stable; feel free to extend it and add your custom logic.

tefra avatar Mar 09 '24 18:03 tefra

I'd like to revisit this. Is my understanding correct that all imported dataclass models are available for deserialization? What happens if my application uses 2 different XML APIs that have an identical element name but different structures?

Ideally parsing could take a Union (similar to pydantic) to restrict the target types and also have the type annotation work.
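
One way to picture the Union idea is to derive the root-element mapping from the Union's members, so that only those classes are candidates. This is a rough sketch, not an existing xsdata feature: `qname_of` and `root_map` are hypothetical helpers, the class names and namespaces are made up, and reading `Meta.name` / `Meta.namespace` directly is a simplification of how xsdata actually resolves element names.

```python
from dataclasses import dataclass
from typing import Dict, Union, get_args


# Two hypothetical APIs that both use a root element called "status".
@dataclass
class ApiOneStatus:
    class Meta:
        name = "status"
        namespace = "http://example.com/api-one"


@dataclass
class ApiOneReply:
    class Meta:
        name = "reply"
        namespace = "http://example.com/api-one"


@dataclass
class ApiTwoStatus:
    class Meta:
        name = "status"
        namespace = "http://example.com/api-one"


def qname_of(cls: type) -> str:
    """Build '{namespace}name' from the class's Meta, falling back to the class name."""
    meta = getattr(cls, "Meta", None)
    name = getattr(meta, "name", cls.__name__)
    namespace = getattr(meta, "namespace", None)
    return f"{{{namespace}}}{name}" if namespace else name


def root_map(allowed) -> Dict[str, type]:
    """Map root qnames to classes, considering only the members of the Union."""
    mapping: Dict[str, type] = {}
    # get_args() returns () for a plain class, so fall back to a one-item tuple.
    for cls in get_args(allowed) or (allowed,):
        qname = qname_of(cls)
        if qname in mapping:
            raise ValueError(f"ambiguous root element {qname}")
        mapping[qname] = cls
    return mapping


# ApiTwoStatus shares its qname with ApiOneStatus but is not in the Union,
# so it can never be selected here.
ApiOneMessage = Union[ApiOneStatus, ApiOneReply]
api_one_roots = root_map(ApiOneMessage)
```

A mapping built this way makes the name clash explicit: putting both status classes into the same Union raises ValueError instead of silently picking one.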

jordanhamill avatar May 30 '24 09:05 jordanhamill