quick-xml
Add documentation for mapping from XML to Rust used by deserializer
This is my vision of the further evolution of the serde integration in this crate. Some parts of it are debatable or may even be impossible to implement; this is the first iteration of what I would like to see. For now I'm opening this draft PR to:
- share my vision
- invite discussion
- create a roadmap of necessary fixes (marked with `FIXME` in doctests)
You can build the rustdoc documentation by running
`cargo doc --features serialize --open`
in the crate root and then navigating to the `quick_xml::de` module.
Below is the (approximately) rendered version of this proposal:
Mapping XML to Rust types
Type names are never considered when deserializing, so you can name your types as you wish. Other general rules:
- a `struct` field name can be represented in XML only as an attribute name or an element name.
- an `enum` variant name can be represented in XML only as an attribute name or an element name.
- the unit struct, the unit type `()`, and a unit enum variant can be deserialized from any valid XML content:
  - attribute and element names
  - attribute and element values
  - text or CDATA content
- when deserializing, attribute names have precedence over element names. So if your XML has both an attribute and an element with the same name, the Rust field/variant will be deserialized from the attribute.
NOTE: examples marked with `FIXME:` do not work yet; any PRs that fix them are welcome! The message after the marker is the test failure message. Also, all such tests are marked with an `ignore` option, although they compile. This is intentional, because rustdoc marks such blocks with an exclamation mark, unlike `no_run` blocks.
| To parse all these XMLs... | ...use this Rust type |
|---|---|
The root tag name does not matter.
|
All these structs can be used to deserialize from the specified XML, depending on the amount of information that you want to get:
A structure where each XML attribute or child element is mapped to a field. Each attribute or element name becomes the name of a field. The name of the struct itself does not matter.
|
Optional XML attributes/elements that you want to capture.
The root tag name does not matter.
|
A structure with an optional field.
When the XML attribute or element is present, type |
| Text content, CDATA content |
Text content and CDATA are mapped to any Rust type that can be deserialized
from a string, for example,
|
An XML with different root tag names.
|
An enum where each variant has the name of a possible root tag. The name of the enum itself does not matter. All these types can be used to deserialize from the specified XML, depending on the amount of information that you want to get:
You should have variants for all possible tag names in your enum or have
an |
|
|
The names of the enum, the struct, and the struct field do not matter.
|
A sequence with a strict order, possibly with mixed content
(text and tags).
|
All elements are mapped to a heterogeneous sequential type: a tuple or a named tuple.
Each element of the tuple should be able to be deserialized from the nested
element content (
|
A sequence with a non-strict order, possibly with mixed content
(text and tags).
|
A homogeneous sequence of elements with a fixed or dynamic size.
|
A sequence with a strict order, possibly with mixed content
(text and tags), inside another element.
|
A structure where all child elements are mapped to one field which has
a heterogeneous sequential type: a tuple or a named tuple. Each element of the
tuple should be able to be deserialized from the nested element content
(
|
A sequence with a non-strict order, possibly with mixed content
(text and tags), inside another element.
|
A structure where all child elements are mapped to one field which has
a homogeneous sequential type: an array-like container. A container type
|
I'm not sure if my approach is wrong or if the following scenario should be added to the considerations. The comments within the code block outline what works vs what would be preferred in my case. It would be nice if the need for the "InnerNested" struct could be removed entirely since, in my case, it creates the need for an extra step to access the Vec containing the "InnerNestedDetail" structs which diverges from the standard in the XML.
```rust
use serde::{Deserialize, Serialize};
use quick_xml::de::{from_str, DeError};

#[derive(Debug, Deserialize, Serialize)]
struct InnerNestedDetail {
    #[serde(rename = "modification_date", default)]
    modification_date: String, // String just for example
    #[serde(rename = "version", default)]
    version: f32,
    #[serde(rename = "description", default)]
    description: String,
}

// Prefer to not need this struct
#[derive(Debug, Deserialize, Serialize)]
struct InnerNested {
    // not sure how this attribute's resulting functionality can be implemented
    // in the inner nested field of the OuterTag struct
    #[serde(rename = "Inner_Nested_Detail")]
    details: Vec<InnerNestedDetail>,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct OuterTag {
    identifier: String,
    version: f32,
    // Prefer to have this field as Vec<InnerNestedDetail> and remove the
    // InnerNested struct altogether. The Inner_Nested_Detail cannot be accessed
    // without knowing its outer tag, InnerNested, first, but this adds an extra
    // step for accessing the data when the structs are populated
    #[serde(rename = "Inner_Nested")]
    inner_nested: InnerNested,
}

fn parse_xml(xml_string: &str) -> Result<OuterTag, DeError> {
    let test: OuterTag = from_str(xml_string)?;
    Ok(test)
}

fn main() {
    let xml_string = "<Outer_Tag>
        <identifier>Some Identifier</identifier>
        <version>2.0</version>
        <Inner_Nested>
            <Inner_Nested_Detail>
                <modification_date>2022-04-20</modification_date>
                <version>1.0</version>
                <description>Initial version.</description>
            </Inner_Nested_Detail>
            <Inner_Nested_Detail>
                <modification_date>2022-05-20</modification_date>
                <version>2.0</version>
                <description>Modified version.</description>
            </Inner_Nested_Detail>
        </Inner_Nested>
    </Outer_Tag>";
    let result = parse_xml(xml_string).unwrap();
    println!("{:#?}", result);
    // In order to access the inner nested details vec, an extra step is necessary
    println!("\nInner Nested Details Vec Current Access:\n{:#?}", result.inner_nested.details);
    // Preferred vec access
    //println!("\nInner Nested Details Vec Preferred Access:\n{:#?}", result.inner_nested);
}
```
Because the inner_nested field represents the Inner_Nested tag of your XML, it is not possible to simply skip it in a trivial mapping. But you can always write a simple wrapper, used with #[serde(with)], that unpacks the sequence from the container: see https://github.com/tafia/quick-xml/issues/365#issuecomment-1120253466
You can also look at https://lib.rs/crates/serde-query if you are looking for a more generic solution. I think I'll mention both alternatives in the final version of the doc.
@Mingun Is this PR still relevant?
Technically I'll open a new PR, because I cannot change this one: it comes from another repository and GitHub doesn't allow me to change it.