
`Any` no longer a valid field type in v2

Open Jacob-Flasheye opened this issue 1 year ago • 8 comments

(Disclaimer: This is my work account and I'm posting this on behalf of my $work.)

Hi.

I'm working on upgrading our code to v2, and the only pydantic_xml-related snag I've hit is that `Any` is no longer a valid field type. I think I can work around this by using generics, but I'm curious why support for it has been removed. Could you explain why `Any` is no longer a valid field type, and whether there are any plans to make it valid again?

Jacob-Flasheye, Aug 22 '23 14:08

@Jacob-Flasheye Hi.

Could you please provide some examples of how you use `Any`-typed fields? The reason support was removed is that an `Any` annotation cannot be analyzed while the model serializer is being built, so the library cannot choose the correct serializer type (scalar, collection, mapping, ...).
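To make the build-time argument concrete, here is a minimal stdlib sketch of choosing a serializer kind from a field annotation. `serializer_kind` is a hypothetical function for illustration only, not pydantic-xml's internals: typed containers can be classified up front, while `Any` leaves nothing to inspect.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin


def serializer_kind(annotation: Any) -> str:
    """Hypothetical build-time classifier; not pydantic-xml's actual code."""
    if annotation is Any:
        # Nothing to inspect: the value may turn out to be 1, [1, 2, 3], {"a": 1}, ...
        return "undecidable"
    origin = get_origin(annotation)
    if origin in (list, tuple, set, frozenset):
        return "collection"
    if origin is dict:
        return "mapping"
    if origin is Union:
        # Optional[X] is Union[X, None]: classify the non-None members
        kinds = {serializer_kind(a) for a in get_args(annotation) if a is not type(None)}
        return kinds.pop() if len(kinds) == 1 else "undecidable"
    if origin is None and isinstance(annotation, type):
        if issubclass(annotation, (list, tuple, set, frozenset)):
            return "collection"
        if issubclass(annotation, dict):
            return "mapping"
        return "scalar"
    return "undecidable"


print(serializer_kind(List[int]))       # collection
print(serializer_kind(Dict[str, str]))  # mapping
print(serializer_kind(Optional[int]))   # scalar
print(serializer_kind(Any))             # undecidable
```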

In v1, `Any`-typed fields were interpreted as scalar ones (pydantic 1 did that and the library relied on it), and serialization raised an exception if the value wasn't a scalar, which led to some unexpected behavior. For example:

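from typing import Any

from pydantic_xml import BaseXmlModel, element

class Model(BaseXmlModel):
    field: Any = element()  # v1: treated as a scalar

Model(field=1).to_xml()  # Ok
Model(field=[1, 2, 3]).to_xml()  # Error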

whereas

from typing import List

from pydantic_xml import BaseXmlModel, element

class Model(BaseXmlModel):
    field: List[int] = element()

Model(field=[1, 2, 3]).to_xml()  # Ok

dapper91, Aug 22 '23 19:08

Thanks for the quick reply, appreciate your work :pray:

The problem at hand is generating pydantic_xml models from xml schema files, where I don't control the files. Some of the types in the schema contain my worst enemies: xs:any and xs:anyAttribute. Here is perhaps the most pathological example:

<xs:complexType name="AnyHolder">
	<xs:sequence>
		<xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded"/> 
	</xs:sequence>
	<xs:anyAttribute processContents="lax"/>
</xs:complexType>

In v1 I could do as you show in your first example: serialization errors out, but I could still instantiate the model and do the serialization by hand. The problem is that now I cannot instantiate the object at all. I totally agree with you that the current behaviour is better in 99% of cases.

What I was thinking (without knowing any of the pydantic internals) is that serialization of an `Any` field could be special-cased to first look up the runtime type of the value and then serialize it accordingly. But reading your reply, I take it that such special-casing or deferral is impossible.
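A rough sketch of that deferral idea, using only the standard library's ElementTree: the serializer kind is picked from the value's runtime type instead of the annotation. `serialize_any` is hypothetical and not how pydantic-xml works; rendering a dict as attributes is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET
from typing import Any


def serialize_any(parent: ET.Element, tag: str, value: Any) -> None:
    """Hypothetical runtime dispatch: pick the serializer from the value's type."""
    if isinstance(value, dict):
        # mapping -> a single element carrying the items as attributes
        ET.SubElement(parent, tag, {str(k): str(v) for k, v in value.items()})
    elif isinstance(value, (list, tuple)):
        # collection -> one element per item, handled recursively
        for item in value:
            serialize_any(parent, tag, item)
    else:
        # anything else -> a scalar rendered as text
        ET.SubElement(parent, tag).text = str(value)


for value in (1, [1, 2, 3]):
    root = ET.Element("Model")
    serialize_any(root, "field", value)
    print(ET.tostring(root, encoding="unicode"))
# <Model><field>1</field></Model>
# <Model><field>1</field><field>2</field><field>3</field></Model>
```

The catch is visible in the output: the same field yields differently shaped XML depending on the value, so parsing that XML back cannot tell which shape was intended.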

I'm currently leaning towards representing the model as

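from typing import Generic, TypeVar
from pydantic_xml import BaseXmlModel, attr, element
ElemT, AttrT = TypeVar("ElemT"), TypeVar("AttrT")
class AnyHolder(BaseXmlModel, Generic[ElemT, AttrT]):  # I will admit I haven't looked up the specifics of generic models in pydantic v2
    any_elem: list[ElemT] = element()
    any_attr: AttrT = attr()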

but technically that model is wrong (ElemT needs to be a TypeVarTuple, which would require support first from pydantic and then here, neither of which is guaranteed), and I'm not sure I can avoid special-casing the code that uses the generated models.

One solution that would work well here (I think) is the suggestion to support raw xml in #14. I think that would cover xs:any but it might still not deal with xs:anyAttribute. Also, I'm not sure how technically feasible that is.

Sorry for the ranting nature of this post, I'm not frustrated with pydantic_xml (I love it!), but I am a bit frustrated with the xml schema we're working with... Please tell me if you need any more information!

Jacob-Flasheye, Aug 23 '23 06:08

@Jacob-Flasheye Hi,

Thanks for the thorough answer!

I am working on raw XML fields right now, but I am not sure they will solve your pain with xs:any. You could define your model like this:

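from xml.etree import ElementTree as etree  # assumption: a raw element type; the final API may differ
from pydantic_xml import BaseXmlModel, element
class AnyHolder(BaseXmlModel, arbitrary_types_allowed=True):
    any_elem: list[etree.Element] = element()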

but it will not fit your schema: element() matches a fixed tag (here `any_elem`), whereas the sequence elements may have any tag.

Speaking of xs:anyAttribute, I am not sure the `any_attr: AttrT = attr()` definition is correct. As far as I know, xs:anyAttribute means the element may have any number of attributes with any names, whereas your model allows only one attribute. So it seems to me this definition is more accurate:

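from typing import Dict
from pydantic_xml import BaseXmlModel
class AnyHolder(BaseXmlModel):
    any_attrs: Dict[str, str]  # an un-marked Dict field maps to the element's attributes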
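To see why a `Dict[str, str]` is the right shape for xs:anyAttribute, here is a small stdlib sketch. `split_attrs` is a hypothetical helper, and the declared `id` attribute is made up to show a complexType that also names some of its attributes explicitly.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple


def split_attrs(elem: ET.Element, declared: Set[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate declared attributes from the open-ended xs:anyAttribute ones."""
    known = {k: v for k, v in elem.attrib.items() if k in declared}
    # Any number of extra attributes with arbitrary names: a dict fits naturally.
    # Namespaced attributes appear with "{namespace-uri}name" keys.
    extra = {k: v for k, v in elem.attrib.items() if k not in declared}
    return known, extra


known, extra = split_attrs(ET.fromstring('<AnyHolder id="7" a="1" b="x"/>'), {"id"})
print(known)  # {'id': '7'}
print(extra)  # {'a': '1', 'b': 'x'}
```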

That said, raw XML could still help you. Is AnyHolder a root element in your schema? Here is what I am getting at: it seems to me AnyHolder is weakly defined, and it is not possible to define a model for it. Maybe you could use raw XML instead of the model itself:

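from xml.etree import ElementTree as etree  # assumption: a raw element type; the final API may differ

from pydantic_xml import BaseXmlModel, computed_element, element

class AnyHolder(BaseXmlModel, tag='AnyHolder'):
    ...

class OuterModel(BaseXmlModel, arbitrary_types_allowed=True):
    any_holder_raw: etree.Element = element(tag='AnyHolder', exclude=True)

    @computed_element
    def any_holder(self) -> AnyHolder:
        # manual parsing of self.any_holder_raw goes here
        raise NotImplementedError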

Will that help you to deal with AnyHolder?
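The "manual parsing" step could look something like this stdlib sketch, assuming the raw field holds an `xml.etree.ElementTree.Element` (lxml elements expose the same `.attrib` and child iteration). `ParsedAnyHolder` and `parse_any_holder` are hypothetical names: the idea is simply to keep the arbitrary children as-is and collect every attribute into a dict.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedAnyHolder:
    """Hypothetical plain container for a weakly defined AnyHolder."""
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[ET.Element] = field(default_factory=list)


def parse_any_holder(raw: ET.Element) -> ParsedAnyHolder:
    # xs:any with maxOccurs="unbounded": keep every child, whatever its tag;
    # xs:anyAttribute: keep every attribute, whatever its name.
    return ParsedAnyHolder(attrs=dict(raw.attrib), children=list(raw))


raw = ET.fromstring('<AnyHolder lang="en"><foo>1</foo><bar/></AnyHolder>')
holder = parse_any_holder(raw)
print(holder.attrs)                              # {'lang': 'en'}
print([child.tag for child in holder.children])  # ['foo', 'bar']
```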

dapper91, Aug 23 '23 15:08

Wow, I didn't know anyAttribute also meant any number of attributes, thanks for informing me.

AnyHolder was really just the first example I could find but your suggestions still hold true for most other cases (in fact, AnyHolder is not referenced anywhere else in the schema...). But I agree that your posted solution would most likely help if I need to deal with AnyHolder or similar elements!

My current solution looks like this:

  • Anything in the schema with type="xs:anyType"/"xs:anySimpleType" and the like gets a T_i = TypeVar("T_i") type. That enables me to instantiate the models, and everything works as it did before! (This is actually what caused my problems, not the xs:any elements, but I get confused by XML schema.)
  • Ignore xs:any and xs:anyAttribute fields until they cause a problem. If they do, I will try hard to make things work, and in the worst case I might comment on this issue again with my findings for further discussion.
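The TypeVar approach from the first bullet might look roughly like this inside a code generator. This is a sketch under assumptions: `annotate`, `XSD_BUILTINS` and `ANY_TYPES` are hypothetical, and the built-in type map is deliberately tiny.

```python
from typing import Dict, List, Tuple, TypeVar

# Hypothetical, deliberately tiny mapping of XSD built-ins to Python types.
XSD_BUILTINS = {"xs:string": str, "xs:int": int, "xs:boolean": bool, "xs:double": float}
ANY_TYPES = {"xs:anyType", "xs:anySimpleType"}


def annotate(fields: List[Tuple[str, str]]) -> Tuple[Dict[str, object], List[TypeVar]]:
    """Map (field name, XSD type) pairs to annotations, one fresh TypeVar per any-typed field."""
    annotations: Dict[str, object] = {}
    type_vars: List[TypeVar] = []
    for name, xsd_type in fields:
        if xsd_type in ANY_TYPES:
            tv = TypeVar(f"T_{len(type_vars)}")
            type_vars.append(tv)
            annotations[name] = tv
        else:
            annotations[name] = XSD_BUILTINS[xsd_type]
    return annotations, type_vars


ann, tvs = annotate([("id", "xs:int"), ("payload", "xs:anyType"), ("note", "xs:anySimpleType")])
print(ann)  # {'id': <class 'int'>, 'payload': ~T_0, 'note': ~T_1}
```

Presumably the collected TypeVars would then be listed in the generated class's `Generic[...]` base, so that code using the models can parametrize them with concrete types.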

I think this issue can be closed, it seems you are aware of the problems I've mentioned and you've carefully given suggestions on what I can do. Now it's up to me to use those suggestions!

Jacob-Flasheye, Aug 24 '23 07:08

There's one more thing I hadn't thought of until I started generating xmlschema elements just now: they can have any name. IIUC, pydantic_xml currently requires you to specify the exact name of each element. Would it be possible to add some mechanism that captures all unassigned elements, irrespective of their name? I don't know exactly how that would work internally, but it could be enabled through a class argument, with the elements accessed through some name like any_:

from pydantic_xml import BaseXmlModel, element

# capture_unassigned_elements is a proposed, hypothetical option
class CaptureUnassignedElements(BaseXmlModel, arbitrary_types_allowed=True, capture_unassigned_elements=True):
    test: int = element(tag="test")

input_str = """
    <CaptureUnassignedElements>
        <test>23</test>
        <abc123>test</abc123>
        <empty/>
    </CaptureUnassignedElements>
"""

cua_test = CaptureUnassignedElements.from_xml(input_str)

print(cua_test.test)  # prints 23
print(cua_test.any_)  # prints [lxml.etree._Element(...), lxml.etree._Element(...)] or something similar

I'm not sure whether this is good design or how hard it would be to implement, but it does reflect a real use case caused by XML schema, so I think it should at least be considered.
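For reference, here is a stdlib sketch of what such an option would have to do at parse time: route children with declared tags to their fields and collect the rest into something like `any_`. `split_children` is hypothetical, and treating a repeated declared tag as unassigned is just one possible policy.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple


def split_children(root: ET.Element, declared: Set[str]) -> Tuple[Dict[str, ET.Element], List[ET.Element]]:
    """Route children with declared tags to fields; collect everything else."""
    assigned: Dict[str, ET.Element] = {}
    unassigned: List[ET.Element] = []
    for child in root:
        # First occurrence of a declared tag is assigned; repeats and unknown tags are not.
        if child.tag in declared and child.tag not in assigned:
            assigned[child.tag] = child
        else:
            unassigned.append(child)
    return assigned, unassigned


input_str = """
    <CaptureUnassignedElements>
        <test>23</test>
        <abc123>test</abc123>
        <empty/>
    </CaptureUnassignedElements>
"""
assigned, any_ = split_children(ET.fromstring(input_str.strip()), {"test"})
print(int(assigned["test"].text))  # 23
print([e.tag for e in any_])       # ['abc123', 'empty']
```<test>2</test></r>"), {"test"})
assert a2["test"].text == "1" and [e.text for e in u2] == ["2"]
</test>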

Jacob-Flasheye, Dec 18 '23 14:12

I'm bumping into this as well. I understand it frankly goes against the nature and benefit of these strongly defined models, but it is a use case I nevertheless have to support.

The schema definition for the element is:

<xs:element name="jobInfo" maxOccurs="1" minOccurs="0">
    <xs:annotation>
        <xs:documentation> This is arbitrary information that can be added to the job description by
            the UWS implementation. </xs:documentation>
    </xs:annotation>
    <xs:complexType>
        <xs:sequence>
            <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

which, in practice, might look like:

<uws:jobInfo>
  <any>
    <xml>
      <thatyouwant />
    </xml>
  </any>
</uws:jobInfo>

which I'm struggling to support.
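One way to cope with `jobInfo` is to give up on modelling its content and keep each child as raw XML text. This stdlib sketch assumes the `uws` prefix is bound to `http://www.ivoa.net/xml/UWS/v1.0` (the snippet above omits the declaration); `job_info_children` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the uws prefix is bound to the UWS 1.x namespace URI below;
# the snippet in this thread omits the xmlns declaration.
UWS_NS = "http://www.ivoa.net/xml/UWS/v1.0"

doc = f"""<uws:jobInfo xmlns:uws="{UWS_NS}">
  <any>
    <xml>
      <thatyouwant />
    </xml>
  </any>
</uws:jobInfo>"""


def job_info_children(xml: str) -> List[str]:
    """Return each child of jobInfo as raw XML text, without modelling it."""
    root = ET.fromstring(xml)
    if root.tag != f"{{{UWS_NS}}}jobInfo":
        raise ValueError(f"unexpected root element {root.tag!r}")
    # tostring() also emits the element's tail text, hence the strip().
    return [ET.tostring(child, encoding="unicode").strip() for child in root]


for raw in job_info_children(doc):
    print(raw)
```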

I've documented some more of this in the issue tracker for the project here: https://github.com/spacetelescope/vo-models/issues/18

jwfraustro, Mar 22 '24 14:03

@Jacob-Flasheye I also need to generate models from XSD files (which you described in https://github.com/dapper91/pydantic-xml/issues/100#issuecomment-1689387730) so would be interested in collaborating here.

@dapper91 would this be something that you would be willing to integrate? I know that XSD can be thorny, but that feature would be a natural fit for pydantic-xml.

edit: Could we maybe partially support https://docs.pydantic.dev/latest/concepts/models/#dynamic-model-creation in pydantic-xml, so that we could use e.g. https://xmlschema.readthedocs.io/en/latest/usage.html#meta-schemas-and-xsd-sources for dynamic model creation? This task alone would not need support for serialization (e.g. https://github.com/dapper91/pydantic-xml/issues/92), so it might be feasible?
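To illustrate the idea of building models from a schema at runtime, here is a stand-in that uses the standard library's `dataclasses.make_dataclass`, not pydantic's `create_model` or any pydantic-xml API. It reads named `xs:complexType` definitions with ElementTree and builds one class per type. `classes_from_xsd` and the type map are hypothetical simplifications, and the type names are assumed to use the `xs` prefix.

```python
import xml.etree.ElementTree as ET
from dataclasses import fields, make_dataclass

XS = "{http://www.w3.org/2001/XMLSchema}"
# Simplification: only a few built-ins, and type names assumed to use the "xs" prefix.
XSD_TO_PY = {"xs:string": str, "xs:int": int, "xs:boolean": bool}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Job">
    <xs:sequence>
      <xs:element name="jobId" type="xs:string"/>
      <xs:element name="priority" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def classes_from_xsd(xsd: str) -> dict:
    """Build one dataclass per named xs:complexType, with a field per xs:element."""
    classes = {}
    for ctype in ET.fromstring(xsd).iter(f"{XS}complexType"):
        spec = [(el.get("name"), XSD_TO_PY[el.get("type")]) for el in ctype.iter(f"{XS}element")]
        classes[ctype.get("name")] = make_dataclass(ctype.get("name"), spec)
    return classes


Job = classes_from_xsd(SCHEMA)["Job"]
print([(f.name, f.type) for f in fields(Job)])  # [('jobId', <class 'str'>), ('priority', <class 'int'>)]
print(Job(jobId="42", priority=1))              # Job(jobId='42', priority=1)
```

A real generator would also need to handle anonymous and nested types, attributes, namespaces and the xs:any wildcards discussed above.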

fleimgruber, Mar 26 '24 14:03

I added experimental support for dynamic model creation in 2.10.0.

dapper91, May 08 '24 19:05