Cannot detect different root elements properly
A major limitation of serde-xml-rs right now seems to be that there is no way of parsing or restricting the root element’s name.
As a practical example where this would be needed, consider the sitemap.xml format. Per the protocol and specification, a sitemap.xml file may be either a proper sitemap file according to the sitemap XSD, or a siteindex file (basically a list of links to gzipped sitemaps for very large sites) according to the siteindex XSD. Parsing either one is fine; xsd-parser can generate valid code for them. But since the root element’s name cannot be specified, there’s no way of determining whether we have <sitemapindex> or <urlset>.
Especially since xsd-parser generates #[serde(default)] attributes for the inner list fields, and serde-xml-rs silently ignores unknown elements with no way to change this behavior, parsing one kind of document with the wrong schema does not produce an error, just an empty data structure. Removing the default attribute does produce errors, but that still requires me to parse the XML twice.
Even though I can pass an EventReader from xml-rs into serde-xml-rs, I cannot use that EventReader to detect the root element name manually, as doing so advances the EventReader and makes it unusable for serde-xml-rs.
There are three solutions here:
- Allow using enums for the root element to distinguish between different root element names. This would be backwards-incompatible, as currently using enums here allows you to choose between different child element types within the root.
- Add a config option to the parser (or other configuration parts) making it only recognize a certain (set of) root element names. This feels a bit like a hack, and only really solves my specific problem (while still requiring two parses), but it’s backwards-compatible.
- Do not ignore unknown elements, but throw an error instead. The current behavior seems like an oversight -- XML is considered a strict format, your Rust code is effectively an XSD, and XSD doesn’t allow unknown elements. I’d like a parser option to change this behavior, and (as a breaking change) to make erroring out the default, as silently ignoring elements seems like a footgun.
So I’d prefer (1), but that would break all current users, which is not very nice. Maybe the root element should be treated entirely differently from how it is currently treated.
(Note that since in my case, the XML Namespace is identical in both sitemap and siteindex, having a configuration option for only allowing certain xmlns declarations would not help with my problem, but might with others. I’m definitely for a similar parser option to limit allowed xmlns declarations.)
Thanks for reporting this.
For the moment, I'll probably explore option 1.
I just now stumbled over this problem (if I understood this correctly), where I had two possible responses, either
<?xml version="1.0" encoding="utf-8"?><result>...</result>
or
<?xml version="1.0" encoding="utf-8"?><error>...</error>.
As a workaround I did some prenormalization by inserting a common root element into the text:
<?xml version="1.0" encoding="utf-8"?><root><result>...</result></root>
and <?xml version="1.0" encoding="utf-8"?><root><error>...</error></root>.
Now it was possible to deserialize it with the existing means using an enum.
It's not beautiful, especially since removing and inserting the <?xml version="1.0" encoding="utf-8"?> is a bit cumbersome, but it works for now.
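For anyone who wants to try the same prenormalization, here is a rough std-only sketch of the wrapping step. The function name `wrap_in_root` and the wrapper element name `root` are just illustrative choices, and it assumes the XML declaration, if present, is the first thing in the input; it does no real XML parsing.

```rust
/// Wraps the whole document body in a synthetic `<root>` element, keeping a
/// leading `<?xml ... ?>` declaration (if present) in front of the wrapper.
/// Hypothetical helper: plain string handling only, no XML validation.
fn wrap_in_root(xml: &str) -> String {
    let trimmed = xml.trim_start();
    let (decl, body) = match trimmed.strip_prefix("<?xml") {
        // Only treat it as the XML declaration if whitespace follows `<?xml`
        // (so that e.g. `<?xml-stylesheet ...?>` is not mistaken for it).
        Some(rest) if rest.starts_with(char::is_whitespace) => match rest.find("?>") {
            Some(end) => trimmed.split_at("<?xml".len() + end + "?>".len()),
            // Unterminated declaration: leave the input untouched inside the wrapper.
            None => ("", trimmed),
        },
        _ => ("", trimmed),
    };
    format!("{}<root>{}</root>", decl, body)
}
```

The wrapped string can then be handed to the deserializer with a wrapper struct or enum for `root`, as described above.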
I am also encountering this issue, as I can receive two similar payloads and want to distinguish between them.
Since I only have two possibilities, I will do some matching on the input string to differentiate, but supporting enums sure would be nice.
@punkstarman do you have an idea of how to achieve this? If I can find the time, I'd gladly submit a PR.
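In case it helps others in the meantime, the string matching could look roughly like the sketch below. It is std-only and not a real XML parser: `root_element_name`, `Payload` and `classify` are made-up names, the `result`/`error` element names are borrowed from the example earlier in this thread, and a DOCTYPE with an internal subset is not handled.

```rust
/// Returns the name of the first element in the document without parsing it.
/// Skips a BOM, the XML declaration, processing instructions, comments and a
/// simple DOCTYPE. A namespace prefix, if any, is returned as part of the name.
fn root_element_name(xml: &str) -> Option<&str> {
    let mut rest = xml.trim_start_matches('\u{feff}').trim_start();
    loop {
        if let Some(after) = rest.strip_prefix("<?") {
            // XML declaration or processing instruction: skip past `?>`.
            rest = &after[after.find("?>")? + 2..];
        } else if let Some(after) = rest.strip_prefix("<!--") {
            // Comment: skip past `-->`.
            rest = &after[after.find("-->")? + 3..];
        } else if let Some(after) = rest.strip_prefix("<!") {
            // DOCTYPE (assumed to have no `[...]` internal subset).
            rest = &after[after.find('>')? + 1..];
        } else {
            break;
        }
        rest = rest.trim_start();
    }
    let after = rest.strip_prefix('<')?;
    // The name ends at the first whitespace, `>` or `/`.
    let end = after.find(|c: char| c.is_whitespace() || c == '>' || c == '/')?;
    let name = &after[..end];
    if name.is_empty() { None } else { Some(name) }
}

/// Hypothetical payload kinds, chosen by root element name.
#[derive(Debug, PartialEq)]
enum Payload {
    Success,
    Failure,
}

/// Decides which type to deserialize into before handing the string to serde.
fn classify(xml: &str) -> Option<Payload> {
    match root_element_name(xml)? {
        "result" => Some(Payload::Success),
        "error" => Some(Payload::Failure),
        _ => None,
    }
}
```

Because this only inspects the string and never touches a reader, the same input can still be passed to the deserializer afterwards, at the cost of scanning the prolog twice.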
@charles-huet-ixxi,
Thanks for the offer of a PR. I'm open to suggestions on how best to do this.
As I mentioned in a previous comment, I am currently leaning towards @kleinesfilmroellchen's option 1:
Allow using enums for the root element to distinguish between different root element names. This would be backwards-incompatible, as currently using enums here allows you to choose between different child element types within the root.