pydantic-xml
Modeling mappings as child elements?
Imagine I have the following xml:
<article>
    <title>Hello</title>
    <metadata>
        <md_key_1>text_content_1</md_key_1>
        <md_key_2>text_content_2</md_key_2>
        ...
    </metadata>
</article>
That is, the metadata consists of a dynamic number of elements with dynamic tags and no attributes, each of which contains just text.
Ideally, I would want this to map to the Python model
class Article:
    title: str
    metadata: Dict[str, str]
Is there a way to achieve that with pydantic-xml? The closest I got so far was making metadata a raw field, but then working from the Python side gets a little annoying: how do I construct an instance of Article when metadata is an ET.Element? I could add a constructor class method, but then I'd have to remember that for this specific model only, I shouldn't use the regular constructor.
The other approach I expected to work was setting metadata=Field(exclude=True) and implementing a @computed_element for serialization and a @field_validator for deserialization. Unfortunately, the @field_validator approach doesn't work:
from typing import Any, Dict, Optional

from pydantic import Field, field_validator
from pydantic_xml import BaseXmlModel


class Article(BaseXmlModel, tag="article"):
    title: str
    metadata: Dict[str, str] = Field(exclude=True)

    @field_validator('metadata', mode='before')
    @classmethod
    def decode_content(cls, value: Any) -> Optional[Dict[str, str]]:
        print(value)
        assert False  # fail on purpose to show what the validator receives


if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    Article.from_xml(TEST_INPUT)
prints
[line -1]: Assertion failed, [type=assertion_error, input_value={}, input_type=dict]
i.e. the validator receives an empty dict, not anything that could reconstruct the inner fields.
Is there a currently supported approach that I'm missing?
Thanks!
@zygi Hi,
Right now there is no way to model an element with dynamic tags. The workaround I see is the following:
from typing import Any

from lxml import etree
from pydantic import model_validator
from pydantic_xml import BaseXmlModel, element


class Article(BaseXmlModel, tag="article", arbitrary_types_allowed=True):
    title: str = element()
    # The <metadata> element is kept as a raw lxml element.
    metadata_raw: etree._Element = element(tag='metadata', default=None)

    @property
    def metadata(self) -> dict[str, str]:
        # Expose the raw element as a plain dict: child tag -> child text.
        if self.metadata_raw is None:
            return {}
        return {el.tag: el.text for el in self.metadata_raw}

    @model_validator(mode='before')
    @classmethod
    def set_metadata_raw(cls, data: dict) -> dict:
        # Accept a `metadata` dict in the constructor and convert it
        # into the raw <metadata> element.
        if metadata := data.pop('metadata', None):
            data['metadata_raw'] = metadata_raw = etree.Element('metadata')
            for tag, text in metadata.items():
                sub = etree.SubElement(metadata_raw, tag)
                sub.text = text

        return data


if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    # Deserialize from XML, then read the metadata as a dict.
    article = Article.from_xml(TEST_INPUT)
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())

    # Construct from Python using a plain dict.
    article = Article(title='Hello', metadata={'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'})
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())
output:
title='Hello' metadata_raw=<Element metadata at 0x1057376c0>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
title='Hello' metadata_raw=<Element metadata at 0x105811b80>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata><md_key_1>text_content_1</md_key_1><md_key_2>text_content_2</md_key_2></metadata></article>
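The workaround boils down to two conversions: raw element to dict, and dict back to element. As a rough sketch of just that part, here is a version using only the standard library's xml.etree.ElementTree instead of lxml. The helper names metadata_to_dict and dict_to_metadata are hypothetical and not part of pydantic-xml, and mapping an empty element to an empty string is an assumption made here so the values stay str.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def metadata_to_dict(metadata_el: ET.Element) -> Dict[str, str]:
    """Map each child tag to its text (hypothetical helper).

    An empty child such as <md_key_3/> has text None, so it is mapped
    to "" here (an assumption). If a tag repeats, the last one wins.
    """
    return {child.tag: child.text or "" for child in metadata_el}


def dict_to_metadata(metadata: Dict[str, str], tag: str = "metadata") -> ET.Element:
    """Build an element with one text-only child per dict key (hypothetical helper)."""
    root = ET.Element(tag)
    for key, text in metadata.items():
        ET.SubElement(root, key).text = text
    return root


article = ET.fromstring(
    "<article><title>Hello</title><metadata>"
    "<md_key_1>text_content_1</md_key_1>"
    "<md_key_2>text_content_2</md_key_2>"
    "<md_key_3/>"
    "</metadata></article>"
)
metadata = metadata_to_dict(article.find("metadata"))
rebuilt = dict_to_metadata(metadata)
print(metadata)
```

Note that the dict keys become element names on the way back, so they need to be valid XML names for the result to be well-formed.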