quick-xml
Recognize and process some special XML attributes
All names starting with xml (case-insensitive) are reserved by the XML standard, and some of them have special meaning. quick-xml could process some of them:
- xml:lang -- https://www.w3.org/TR/xml11/#sec-lang-tag (meta information about the natural language of texts, stacked like namespace definitions)
- xml:space -- https://www.w3.org/TR/xml11/#sec-white-space (related: #285)
- xsi:nil -- map <element xsi:nil="true"/> to None if deserialized to Option
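To make "stacked like namespace definitions" concrete for xml:lang, here is a minimal sketch using only the standard library. The `LangStack` type and its methods are hypothetical names invented for illustration; nothing here is quick-xml API. It assumes the caller pushes a frame on every start tag and pops it on the matching end tag.

```rust
/// Hypothetical tracker for the `xml:lang` value currently in scope.
/// This is an illustration only, not part of quick-xml.
#[derive(Debug, Default)]
pub struct LangStack {
    // One entry per open element; `None` means the element
    // did not declare `xml:lang` itself.
    frames: Vec<Option<String>>,
}

impl LangStack {
    pub fn new() -> Self {
        Self::default()
    }

    /// Call on a start tag, passing the element's own `xml:lang` value, if any.
    pub fn push(&mut self, declared: Option<&str>) {
        self.frames.push(declared.map(str::to_owned));
    }

    /// Call on the matching end tag.
    pub fn pop(&mut self) {
        self.frames.pop();
    }

    /// Language in effect: the nearest enclosing declaration wins.
    /// An empty value (`xml:lang=""`) cancels any inherited language.
    pub fn current(&self) -> Option<&str> {
        self.frames
            .iter()
            .rev()
            .find_map(|frame| frame.as_deref())
            .filter(|lang| !lang.is_empty())
    }
}
```

A child element without its own xml:lang inherits the parent's value, and popping a frame restores whatever was in effect before, which is the same scoping behaviour namespace declarations have.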
It's only the concrete prefixes xml and xmlns that are globally defined: https://www.w3.org/TR/xml-names/#xmlReserved
The NamespaceResolver doesn't seem to have any entries by default. This seems incorrect by my reading of https://www.w3.org/TR/xml-names11/#xmlReserved. Shouldn't the xml namespace be definitionally mapped to http://www.w3.org/XML/1998/namespace?
It also looks like the xml namespace should not be overridable.
FWIW, xmlns also has a similar definitional mapping. Given how xmlns is handled now in quick-xml, it probably doesn't need to be handled the same way. Still, I am wondering whether handling both the same way would make dealing with reserved namespaces like these more consistent.
Shouldn't the xml namespace be definitionally mapped to http://www.w3.org/XML/1998/namespace?
Yes, it should.
It also looks like the xml namespace should not be overrideable.
It seems that overriding it is technically possible, but such an XML document is incorrect: it still seems to be well-formed and valid, but it is not namespace-well-formed, because the override violates a namespace constraint.
Well-formed docs have to have a root element conforming to the description at https://www.w3.org/TR/xml11/#NT-element. This section says the following:
This specification does not constrain the application semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.
This seems to indicate that attr names with xml at the front are reserved. AFAICT, this includes the namespace part of the attr name. Would this mean that having an attr like xmlfjdkslafjdsl=3 makes the document not well-formed? Given that the doc has to be well-formed to be valid, wouldn't that also make the doc invalid?
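For reference, the pattern quoted from the spec is easy to test mechanically. The function below is a small sketch with a hypothetical name (`is_reserved_name`); it only detects names that begin with a case-insensitive match to "xml" and takes no position on whether such a name should be treated as an error.

```rust
/// Returns true if `name` begins with a match to
/// (('X'|'x')('M'|'m')('L'|'l')), i.e. "xml" in any letter case.
/// Hypothetical helper for illustration; not part of quick-xml.
pub fn is_reserved_name(name: &str) -> bool {
    // `get(..3)` yields `None` for names shorter than three bytes or when
    // byte 3 is not a character boundary, so this never panics.
    name.get(..3)
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case("xml"))
}
```

Under this check both xml:lang and xmlfjdkslafjdsl count as reserved names, while a name such as axml does not.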
Either way, I think this can be broken into two pieces.
1. Initialize the namespace resolver so that xml is already resolvable.
2. If overriding the xml namespace should be blocked, do that also.
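The two pieces can be sketched together as follows. This is a simplified, flat (non-scoped) resolver written against the standard library only; `Resolver`, `BindError`, and the `allow_xml_override` flag are hypothetical names, and it is not how quick-xml's NamespaceResolver is implemented. The rules encoded are the ones from the Namespaces in XML section linked above: xml may be redeclared only with its own namespace name, xmlns may not be declared at all, and no other prefix may be bound to either reserved namespace name.

```rust
use std::collections::HashMap;

pub const XML_NS: &str = "http://www.w3.org/XML/1998/namespace";
pub const XMLNS_NS: &str = "http://www.w3.org/2000/xmlns/";

#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    /// Tried to rebind `xml` to another URI, or to declare `xmlns`.
    ReservedPrefix,
    /// Tried to bind some other prefix to a reserved namespace name.
    ReservedNamespace,
}

/// Hypothetical flat resolver used only to illustrate the idea.
pub struct Resolver {
    bindings: HashMap<String, String>,
    // Assumed opt-out flag: when true, the reserved-name checks are skipped.
    allow_xml_override: bool,
}

impl Resolver {
    /// Piece 1: `xml` and `xmlns` are resolvable from the start.
    pub fn new(allow_xml_override: bool) -> Self {
        let mut bindings = HashMap::new();
        bindings.insert("xml".to_owned(), XML_NS.to_owned());
        bindings.insert("xmlns".to_owned(), XMLNS_NS.to_owned());
        Self { bindings, allow_xml_override }
    }

    /// Piece 2: reject bindings that would override the reserved names.
    pub fn bind(&mut self, prefix: &str, uri: &str) -> Result<(), BindError> {
        if !self.allow_xml_override {
            if prefix == "xmlns" || (prefix == "xml" && uri != XML_NS) {
                return Err(BindError::ReservedPrefix);
            }
            if prefix != "xml" && (uri == XML_NS || uri == XMLNS_NS) {
                return Err(BindError::ReservedNamespace);
            }
        }
        self.bindings.insert(prefix.to_owned(), uri.to_owned());
        Ok(())
    }

    pub fn resolve(&self, prefix: &str) -> Option<&str> {
        self.bindings.get(prefix).map(String::as_str)
    }
}
```

With `Resolver::new(false)`, resolving xml works before any declaration has been seen, and `bind("xml", "urn:other")` returns `Err(BindError::ReservedPrefix)`. Passing `true` models the opt-in override flag floated for piece 2.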
I think we agree that 1 is worth doing now. I have an idea for that and will open a PR, which I will link to this issue.
For 2, my opinion is that we should probably not allow overriding the xml namespace by default, but maybe provide a flag that allows it. What do you think of that?
1. Agreed, please make a PR.
2. I think we should check what other popular XML libraries do, and handle it in a similar way.
I linked a PR to resolve the reserved namespaces.
Would this include things like xsi:type as well?
In order to include it as an attribute in an element, I used #[serde(rename = "@xsi:type")], which causes it to serialize correctly, e.g. <MyElement xsi:type="MyType">.
However, deserialization of the same blob fails with "missing field `@xsi:type`".
Looking more closely at the issue and the PR, I think my issue is unrelated. I'll add a ticket.