pydantic-xml
Mixed Content?
I'm not sure if this is a feature request, documentation request or a user question.
I have some XML like this:
<foo>
first text
<bar>second text</bar>
third text
</foo>
How can I model this? The ordering of the text is significant. So I basically need something like:
class Bar(BaseXmlModel):
    body: str

class Foo(BaseXmlModel):
    body: list[str | Bar]
But of course that doesn't work. What can I do?
@npmccallum Hi,
"first text" and "second text" can be extracted like this:
from pydantic_xml import BaseXmlModel

class Bar(BaseXmlModel, tag='bar'):
    text: str

class Foo(BaseXmlModel, tag='foo'):
    text: str
    bar: Bar

foo = Foo.from_xml(xml)
assert foo.text == '\n first text\n '
assert foo.bar.text == 'second text'
Unfortunately, element tails are not supported yet. The simplest way to extract "third text" right now is to use a raw element:
from lxml.etree import _Element as Element
from pydantic_xml import BaseXmlModel, element

class Foo(BaseXmlModel, tag='foo', arbitrary_types_allowed=True):
    text: str
    bar: Element = element('bar')

    @property
    def bar_text(self):
        return self.bar.text

    @bar_text.setter
    def bar_text(self, text: str):
        self.bar.text = text

    @property
    def bar_tail(self):
        return self.bar.tail

    @bar_tail.setter
    def bar_tail(self, tail: str):
        self.bar.tail = tail

foo = Foo.from_xml(xml)
assert foo.text == '\n first text\n '
assert foo.bar_text == 'second text'
assert foo.bar_tail == '\n third text\n'
@dapper91 Thanks for the quick response. My real use case is significantly more complex than the simple one I gave. I have dozens of child tags that are interspersed with text. So I really need something like list[str | TypeOne | TypeTwo ... TypeN]. Do you know how difficult this might be to implement?
@npmccallum I think it is possible to add support for element tails. The problem is that in XML parsers (etree, lxml) the tail text belongs to the sub-element it follows, not to the root element. In your example the tail will be bound to Bar, not to Foo.
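To make the text/tail split concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of pydantic-xml. The sample document is the one from the question with whitespace removed so the values are easy to read; that compaction is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

# Sample from the question, written on one line (assumption for readability).
xml = "<foo>first text<bar>second text</bar>third text</foo>"

foo = ET.fromstring(xml)
bar = foo.find("bar")

# Text before the first child is stored on the parent element.
print(repr(foo.text))
# Text inside <bar> is stored on <bar> itself.
print(repr(bar.text))
# Text after </bar> is stored as the tail of <bar>, not on <foo>.
print(repr(bar.tail))
# The root element has no tail of its own here.
print(repr(foo.tail))
```

lxml exposes the same text and tail attributes on its elements, which is why the tail ends up on the child model.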
So the models will be described like this:
from pydantic_xml import BaseXmlModel

# tail() is the proposed field helper discussed here; it is not supported yet.
class Bar(BaseXmlModel, tag='bar'):
    text: str
    tail: str = tail()

class Foo(BaseXmlModel, tag='foo'):
    text: str
    bars: list[Bar]

foo = Foo.from_xml(xml)
assert foo.text == '\n first text\n '
assert foo.bars[0].text == 'second text'
assert foo.bars[0].tail == '\n third text\n'
assert foo.bars[1].text == 'fourth text'
assert foo.bars[1].tail == '\n fifth text\n'
# and so on
Will that be helpful?
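For the case with many different child tags, the parent's text plus each child's tail can be stitched back into one ordered sequence. The following is only a rough sketch with the standard library's xml.etree.ElementTree, not a pydantic-xml feature; the helper name mixed_content and the extra <baz/> tag are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Union


def mixed_content(parent: ET.Element) -> list[Union[str, ET.Element]]:
    """Return the parent's text chunks and child elements in document order."""
    items: list[Union[str, ET.Element]] = []
    # Text that precedes the first child lives on the parent.
    if parent.text:
        items.append(parent.text)
    for child in parent:
        items.append(child)
        # Text that follows a child lives in that child's tail.
        if child.tail:
            items.append(child.tail)
    return items


# Hypothetical sample: <baz/> is added to show a second child type.
xml = "<foo>first text<bar>second text</bar>third text<baz/>fourth text</foo>"
items = mixed_content(ET.fromstring(xml))

# Compact view of the ordered sequence: strings as-is, elements by tag name.
summary = [i if isinstance(i, str) else f"<{i.tag}>" for i in items]
print(summary)
```

Each element in the resulting list could then be dispatched to the matching model by its tag, which keeps the ordering that list[str | TypeOne | ...] was meant to express.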