pydantic-xml
pydantic-xml copied to clipboard
Cannot serialize non-generic model with generic model inside
You currently cannot serialize a non-generic model with a generic model inside of it.
While to_xml()
doesn't work, model_dump_json()
still works, so it doesn't seem to be a inherent problem with serialization, rather a limitation of the current xml serialization implementation.
Example model:
from pydantic_xml import BaseXmlObject, element
from Typing import Generic, TypeVar
T = TypeVar("T")
class GenericModel(BaseXmlModel, Generic[T]):
generic: T_0 = element("GenericElement")
class NonGenericWrapper(BaseXmlModel):
wrapped_generic: GenericModel = element("WrappedGeneric")
Testing model_dump_json()
and to_xml()
>>> NonGenericWrapper(wrapped_generic=GenericModel[int](generic=10)).model_dump_json()
'{"wrapped_generic":{"generic":10}}'
>>> NonGenericWrapper(wrapped_generic=GenericModel[int](generic=10)).to_xml()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/.local/lib/python3.10/site-packages/pydantic_xml/model.py", line 399, in to_xml
return etree.tostring(self.to_xml_tree(skip_empty=skip_empty), **kwargs)
File "/root/.local/lib/python3.10/site-packages/pydantic_xml/model.py", line 380, in to_xml_tree
self.__xml_serializer__.serialize(
File "/root/.local/lib/python3.10/site-packages/pydantic_xml/serializers/factories/model.py", line 170, in serialize
field_serializer.serialize(
File "/root/.local/lib/python3.10/site-packages/pydantic_xml/serializers/factories/model.py", line 366, in serialize
assert self._model.__xml_serializer__ is not None, f"model {self._model.__name__} is partially initialized"
AssertionError: model GenericModel is partially initialized
I think this limitation occurs because the serialization was changed to no longer accept the Any
type, which in many cases can be replaced with raw fields, but in many cases it shouldn't have to be. The reason I can't make the NonGenericWrapper
generic is because in my use case the NonGenericWrapper.wrapped_generic
field is actually a sequence of elements each of which can have a separate generic instantiation, thus requiring generic variadics (which might still not work because the model is quite complex).
I think the choice to reject serialization of Any
should be reconsidered, at least if we want parity with core pydantic serialization. Although I realize that the removal of Any
serialization was probably done for a good reason and might be complicated to implement.
@ajoino Hi,
Due to xml complexity the model xml serializer/deserializer is built during model definition, not model instantiation that is why you get the error: model GenericModel is partially initialized
. So any generic model arguments should be defined before the first instantiation.
The problem with Any
is that it is not obvious which type will be behind it. It could be primitive type (and that ok), but it also could be model, collection or raw type and that brings us back to the first problem: the serializer is build during model definition, but we don't know the exact type. That's why raw elements should be used instead of it.
Considering your problem, do you know the wrapped_generic
sequence allowed types beforehand? Maybe Union
can solve your problem:
>>> from pydantic_xml import BaseXmlModel, element
>>> from typing import Generic, TypeVar, List, Union
>>>
>>> T = TypeVar("T")
>>>
>>> class GenericModel(BaseXmlModel, Generic[T]):
... generic: T = element("GenericElement")
...
>>> class NonGenericWrapper(BaseXmlModel):
... wrapped_generic: List[Union[
... GenericModel[int],
... GenericModel[float],
... GenericModel[str],
... ]] = element("WrappedGeneric")
...
>>> wrp = NonGenericWrapper(
... wrapped_generic=[
... GenericModel[int](generic=10),
... GenericModel[float](generic=10.5),
... GenericModel[str](generic='text'),
... ]
... )
>>> print(wrp.to_xml(pretty_print=True).decode())
<NonGenericWrapper>
<WrappedGeneric>
<GenericElement>10</GenericElement>
</WrappedGeneric>
<WrappedGeneric>
<GenericElement>10.5</GenericElement>
</WrappedGeneric>
<WrappedGeneric>
<GenericElement>text</GenericElement>
</WrappedGeneric>
</NonGenericWrapper>
I've slept on this and I think I was perhaps a bit hasty suggesting that the current serialization scheme should be revisited, and I should have instead tried to approach this differently. I have created a union type in Python that covers the xsd:anySimpleType
type, but I'm still struggling a bit with the xsd:anyType
and the dreaded xsd:any
. The problem with these two is that if I have an element like this
<xsd:complexType name="ContainsAnyStuff">
<xsd:sequence>
<xsd:element name="ThisIsProblematic" type="xsd:anyType" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:any namespace="##any"/>
</xsd:complexType>
I don't even know what tag will be used for the xsd:any
elements, so pydantic-xml cannot model these element currently AFAIK.
As you've said, the ThisIsProblematic
element can probably we well covered by a combination of raw elements and type unions.
Regarding xsd:any
elements, to model such elements, we need some sort of escape hatch that will capture all elements of an xml document that are not explicitly part of the pydantic-xml model, here's one possible API:
class CaptureUnassignedElements(BaseXmlModel, allow_arbitrary_types=True, capture_unassigned_elements=True):
test: int = element("Test")
input_str = """
<CaptureUnassignedElements>
<test>23</test>
<abc123>test</abc123>
<empty/>
</CaptureUnassignedElements>
"""
cua_test = CaptureUnassignedElements.from_xml(input_str)
print(cua_test.test) # prints 23
print(cua_test.any_) # print [lxml.etree._Element(...), lxml.etree._Element(...)] or something similar
Essentially, by giving a model a specific flag at construction it will capture all elements that aren't in the model description and make the accessable through a special attribute, name TBD.
In the proposal above it's a list, but it could perhaps be an etree.Element
/etree._Element
that contains the captured elements.
It can be further restricted by requiring that the xsd:any
elements come last in a xsd:sequence
to make it easier for the parser, that is at least how I've seen it used the majority of the time.
What do you think about this proposal?
@ajoino Hi,
I am not sure defining any_
phantom field is a very intuitive approach. Furthermore user can define any_
field in a model by himself which could lead to some non-obvious errors. Maybe defining a special method to handle unbound xml entities will be more intuitive.
What do you think about this:
class CaptureUnassignedElements(BaseXmlMode):
test: int = element()
@unbound_handler()
def handle_unbound(self, text: Optional[str], attrs: Optional[Dict[str, str]], elements: List[ElementT]) -> None:
"""
:param text: element text if unbound
:param attrs: unbound element attributes
:param elements: unbound sub-elements
"""
assert elements[0].tag == 'abc123'
assert elements[0].text == 'test'
assert elements[1].tag == 'empty'