Dynamically look up root element class
Currently the dataclass parser either uses a fixed root class type specified by the clazz parameter or it tries to guess the correct class.
Guessing doesn't work if the XML root element name differs from the xsi:type.
If the root element is not known in advance, a lookup function or an element-name to clazz mapping would be useful.
PR #766 adds that functionality to the dataclass XML parser.
Hi @leo-b, can you please provide some examples? I am not sure I get it.
I am using a REST API that fetches different types of messages as XML. Depending on the type of the message, the root element might be one of several possible XSI types.
As I don't know which type of message will arrive next, I don't know the name of the XML root element that will be parsed. Thus I cannot specify one fixed corresponding dataclass to the parser.
E.g. the message might be one of three different types:
- class MessageTypeA, which corresponds to a message with an XML root element of <messagea>...</messagea>
- class MessageTypeB, which corresponds to a message with an XML root element of <messageb>...</messageb>
- class MessageTypeC, which corresponds to a message with an XML root element of <messagec>...</messagec>
Auto-detection of the root class doesn't work as the XSI type name is not the same as the xml element name.
That's why I need either a possible root-element -> clazz mapping like
mapping = {
'{http://mynamespace}messagea': mydataclasses.MessageTypeA,
'{http://mynamespace}messageb': mydataclasses.MessageTypeB,
'{http://mynamespace}messagec': mydataclasses.MessageTypeC,
}
... or a lookup function that does the translation:
As I already have a dataclass that defines the possible messages, I chose to use a lookup function that obtains the root element clazz from that dataclass:
@dataclass
class ContentType:
    class Meta:
        name = "contentType"
        namespace = "http://www.w3.org/2005/Atom"

    messagea: Optional[MessageTypeA] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )
    messageb: Optional[MessageTypeB] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )
    messagec: Optional[MessageTypeC] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://mynamespace",
            "required": True,
        }
    )
This lookup function could be used:
def _get_feedcontent_class(qname, attrs, ns_map, parser):
    """Obtain the class of the feed content root element
    by inspecting the fields of the ContentType class."""
    container_class = mydataclasses.ContentType
    meta = parser.context.fetch(container_class, None)
    # Return the class of the first child field matching the root qname.
    for c in meta.find_children(qname):
        return c.clazz
    raise xsdata.exceptions.ParserError(
        f"Unable to obtain class for element {qname} from {container_class}")
Parsing of the XML messages could then be done like this:
xsobj = xmlparser.from_bytes(response.content, mapping)
# or using the lookup function:
xsobj = xmlparser.from_bytes(response.content, _get_feedcontent_class)
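Until something like the mapping argument exists, a similar effect might be achieved outside the parser by peeking at the root element and picking the class before parsing. The following is only a sketch using the standard library; `peek_root_qname`, `resolve_root_class`, and the empty stand-in classes are hypothetical names introduced for illustration and are not part of xsdata.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from io import BytesIO
from typing import Mapping


@dataclass
class MessageTypeA:  # hypothetical stand-in for a generated dataclass
    pass


@dataclass
class MessageTypeB:  # hypothetical stand-in for a generated dataclass
    pass


def peek_root_qname(xml: bytes) -> str:
    """Return the root element name in '{namespace}local' form."""
    # The first "start" event belongs to the root element; stop there.
    _, root = next(ET.iterparse(BytesIO(xml), events=("start",)))
    return root.tag


def resolve_root_class(xml: bytes, mapping: Mapping[str, type]) -> type:
    """Look up the class registered for the document's root element."""
    qname = peek_root_qname(xml)
    try:
        return mapping[qname]
    except KeyError:
        raise LookupError(f"No class registered for root element {qname}") from None


MAPPING = {
    "{http://mynamespace}messagea": MessageTypeA,
    "{http://mynamespace}messageb": MessageTypeB,
}

sample = b'<m:messageb xmlns:m="http://mynamespace"><m:id>1</m:id></m:messageb>'
clazz = resolve_root_class(sample, MAPPING)
```

The resolved class could then be handed to the parser through the existing clazz argument, e.g. xmlparser.from_bytes(response.content, clazz). The cost of this approach is that the start of the document is inspected twice.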
Can you provide some examples of the XML responses? xsdata already supports xsi:type on root elements and sub-elements.
<?xml version="1.0" encoding="UTF-8"?>
<test:e xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="test:B">
  <c>1</c>
  <d>2002-04-15</d>
</test:e>
Re-open if you think this needs to be discussed more
Sorry for the delay. I'd like to resume this old issue.
In my application I am parsing externally provided XML messages without knowing the root xml element in advance. Thus I cannot provide a clazz argument to the xsdata parser.
Currently, if no clazz argument is provided, the parser tries to guess a matching class from the current context. It checks the xsi:type attribute and also looks for a type matching the root element name in all imported modules containing dataclasses.
Unfortunately, in my case the root element name doesn't match a type, and the root element doesn't contain an xsi:type attribute.
So I need to point xsdata to the correct starting point.
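To make the failure mode concrete, here is a toy model of the lookup order described above: xsi:type first, then the root element name. It is not xsdata's actual code; `KNOWN_TYPES`, `guess_root_class`, the `urn:example` namespace, and the `ApiStatus` stand-in are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from io import BytesIO
from typing import Dict, Optional

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


class ApiStatus:  # hypothetical model class
    pass


# Hypothetical registry: qualified type name -> class.
KNOWN_TYPES: Dict[str, type] = {"{urn:example}ApiStatus": ApiStatus}


def guess_root_class(xml: bytes) -> Optional[type]:
    """Guess the root class from xsi:type, then from the element name."""
    ns_map: Dict[str, str] = {}
    root = None
    # Namespace declarations are reported before the element that carries them.
    for event, payload in ET.iterparse(BytesIO(xml), events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            ns_map[prefix] = uri
        else:
            root = payload
            break

    xsi_type = root.get(XSI_TYPE)
    if xsi_type is not None:
        # Resolve the prefixed name (e.g. "m:ApiStatus") to '{uri}local'.
        prefix, _, local = xsi_type.rpartition(":")
        uri = ns_map.get(prefix)
        qname = f"{{{uri}}}{local}" if uri else local
        if qname in KNOWN_TYPES:
            return KNOWN_TYPES[qname]

    # Fall back to a type whose name equals the root element name.
    return KNOWN_TYPES.get(root.tag)


with_type = (
    b'<m:apistatus xmlns:m="urn:example" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    b'xsi:type="m:ApiStatus"/>'
)
without_type = b'<m:apistatus xmlns:m="urn:example"/>'
```

In this model `guess_root_class(with_type)` finds the class, while `guess_root_class(without_type)` finds nothing, because the element is named apistatus and the type is named ApiStatus. That is the situation my messages are in.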
> Can you provide some examples of the xml responses
I have attached the (externally provided) XSD schema and some possible XML messages: sample.zip
The following snippet shows how to use my PR #766.
import xsdata
import dv
from xsdata.formats.dataclass.parsers import XmlParser
xmlparser = XmlParser()
with open("apistatus.xml", "rb") as f:
    xml = f.read()
elementmap = {
    '{http://www.brz.gv.at/datenverbund-unis}apistatus': dv.ApiStatus,
    '{http://www.brz.gv.at/datenverbund-unis}kantwort': dv.Kontostandantwort,
    '{http://www.brz.gv.at/datenverbund-unis}mantwort': dv.Matrikelpruefungantwort,
}
# raises xsdata.exceptions.ParserError: No class found matching root: {http://www.brz.gv.at/datenverbund-unis}apistatus
xmlparser.from_bytes(xml)
# using a lookup map:
xmlparser.from_bytes(xml, elementmap)
Hey @leo-b, the issue here is the XSD and how it doesn't set the correct namespace for the root elements, correct?
I would rather avoid adding a hack like that to the codebase; the parser interface is very specific and clean. I wouldn't mind leaving a hook in place for people to extend the parser and override the default behavior with their custom finder.
> Hey @leo-b, the issue here is the XSD and how it doesn't set the correct namespace for the root elements, correct?
As far as I understand it, the namespace is set correctly in the XML and the XSD defines the type for the root element correctly. But the XML root element doesn't contain an xsi:type attribute that points to the type, and the XSD doesn't contain an xs:element definition for the possible root element names.
In any case, I don't want to rely on guessing that looks through all imported modules to find the correct root class. I'd prefer to be able to provide custom code for that job.
> I wouldn't mind leaving a hook in place for people to extend the parser and override the default behavior, with your custom finder.
A custom hook would also be fine, of course.
The API of the parser is pretty stable; feel free to extend it and add your custom logic.
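One way to add such custom logic without touching parser internals could be a thin wrapper that picks the root class and then delegates to any parser exposing from_bytes(source, clazz). This is a sketch only: `RootDispatchingParser`, `find_root`, and `_EchoParser` are hypothetical names, and `_EchoParser` exists solely so the example runs without xsdata installed.

```python
import xml.etree.ElementTree as ET
from io import BytesIO
from typing import Any, Callable, Dict

# A finder receives the root qname and its attributes and returns a class.
RootFinder = Callable[[str, Dict[str, str]], type]


class RootDispatchingParser:
    """Hypothetical wrapper: choose the root class, then delegate parsing."""

    def __init__(self, parser: Any, find_root: RootFinder) -> None:
        self.parser = parser
        self.find_root = find_root

    def from_bytes(self, source: bytes) -> Any:
        # The first "start" event belongs to the root element.
        _, root = next(ET.iterparse(BytesIO(source), events=("start",)))
        clazz = self.find_root(root.tag, dict(root.attrib))
        return self.parser.from_bytes(source, clazz)


class _EchoParser:
    """Stand-in for a real parser; it just returns the class it was given."""

    def from_bytes(self, source: bytes, clazz: type) -> Any:
        return clazz


class MessageTypeA:  # hypothetical model class
    pass


def find_root(qname: str, attrs: Dict[str, str]) -> type:
    return {"{http://mynamespace}messagea": MessageTypeA}[qname]


parser = RootDispatchingParser(_EchoParser(), find_root)
result = parser.from_bytes(b'<messagea xmlns="http://mynamespace"/>')
```

With xsdata, the stand-in would be replaced by an XmlParser instance. Composition is used here instead of subclassing so that the sketch does not depend on any particular internal method of the parser.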
I'd like to revisit this. Is my understanding correct that all imported dataclass models are available for deserialization? What happens if my application uses two different XML APIs that have an identical element name but different structures?
Ideally parsing could take a Union (similar to pydantic) to restrict the target types and also have the type annotation work.
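A rough sketch of what Union-restricted selection could look like, done outside the parser with the standard library. It assumes each model carries a Meta class with a name (and optionally a namespace), following the convention of the generated dataclasses; `select_root_class` and the three model classes are hypothetical.

```python
import xml.etree.ElementTree as ET
from io import BytesIO
from typing import Union, get_args


class StatusA:  # hypothetical model for API A
    class Meta:
        name = "status"


class ErrorA:  # hypothetical model for API A
    class Meta:
        name = "error"


class StatusB:  # hypothetical model for API B: same element name, other structure
    class Meta:
        name = "status"


def select_root_class(xml: bytes, allowed) -> type:
    """Pick the class matching the root element, considering only `allowed`."""
    # A single-member Union collapses to the class itself, for which
    # get_args() returns an empty tuple, so fall back to that class.
    candidates = get_args(allowed) or (allowed,)
    _, root = next(ET.iterparse(BytesIO(xml), events=("start",)))
    for clazz in candidates:
        meta = clazz.Meta
        namespace = getattr(meta, "namespace", None)
        qname = f"{{{namespace}}}{meta.name}" if namespace else meta.name
        if root.tag == qname:
            return clazz
    raise LookupError(f"Root element {root.tag} is not among the allowed types")


doc = b"<status><code>0</code></status>"
chosen = select_root_class(doc, Union[StatusA, ErrorA])
```

Because only the classes named in the Union are considered, StatusB is never a candidate when parsing a response from API A, even though it is imported and shares the element name.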