serde example in README doesn't work

Open turboladen opened this issue 4 years ago • 1 comment

I'm having trouble parsing an XML file (I get "duplicate field" errors). The serde example in the README seemed to do something similar, so I copied that code into an integration test, but I can't get that to pass either. I stripped out some of the XML to simplify it, and I'm currently at this:

use quick_xml::de::from_str;
use serde::Deserialize;

#[derive(Debug, Deserialize, PartialEq)]
struct Link {
    rel: String,
    href: String,
    sizes: Option<String>,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
    En,
    Fr,
    De,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Head {
    title: String,
    #[serde(rename = "link", default)]
    links: Vec<Link>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Script {
    src: String,
    integrity: String,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Body {
    #[serde(rename = "script", default)]
    scripts: Vec<Script>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Html {
    lang: Option<String>,
    head: Head,
    body: Body,
}

#[test]
fn crates_io() {
    let xml = r#"<!DOCTYPE html>
        <html lang="en">
          <head>
            <title>crates.io: Rust Package Registry</title>

            <link rel="manifest" href="/manifest.webmanifest">
            <link rel="apple-touch-icon" href="/cargo-835dd6a18132048a52ac569f2615b59d.png" sizes="227x227">
          </head>
          <body>
            <noscript>
                <div id="main">
                    <div class='noscript'>
                        This site requires JavaScript to be enabled.
                    </div>
                </div>
            </noscript>

            <script src="/assets/vendor-bfe89101b20262535de5a5ccdc276965.js" integrity="sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==" ></script>
            <script src="/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js" integrity="sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==" ></script>

          </body>
        </html>
"#;
    let html: Html = from_str(xml).unwrap();
    dbg!(&html); // borrow so `html` is still usable in the assertion below
    assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
    panic!("Just wanna see the dbg...");
}

This fails with:

thread 'crates_io' panicked at 'called `Result::unwrap()` on an `Err` value: Xml(EndEventMismatch { expected: "link", found: "head" })', tests/meow_tests.rs:70:22

Using Rust 1.43.1, quick-xml 0.18.1.

turboladen, May 14 '20 05:05

I believe the example is wrong: it should use valid XML/XHTML, not HTML.

The Serde implementation has check_end_names set to true on the Reader: https://github.com/tafia/quick-xml/blob/303003f94ce4114fc8c4e4d146d171b3f2cad2b7/src/de/mod.rs#L159

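The parser therefore expects the <link> tag to be closed by a matching </link> tag and returns an EndEventMismatch error when it finds </head> instead; the unwrap() in the test is what turns that error into a panic.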
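
To illustrate why an unclosed <link> produces exactly that error, here is a toy sketch of an end-name check. It is not quick-xml's implementation: `verify_end_names` and `EndMismatch` are hypothetical names made up for this example, and the scanner is deliberately naive (it assumes no `>` inside attribute values or comments).

```rust
/// Toy error type shaped like the reported `EndEventMismatch`.
/// Hypothetical: not a quick-xml type.
#[derive(Debug, PartialEq)]
struct EndMismatch {
    expected: String,
    found: String,
}

/// Scans the tags in `input` and checks that every end tag closes the
/// most recently opened element. Hypothetical helper for illustration only.
fn verify_end_names(input: &str) -> Result<(), EndMismatch> {
    let mut open: Vec<String> = Vec::new(); // stack of currently open element names
    let mut rest = input;

    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(i) => i,
            None => break, // unterminated tag: ignored in this sketch
        };
        let tag = after[..end].trim();
        rest = &after[end + 1..];

        if tag.starts_with('!') || tag.starts_with('?') {
            continue; // doctype, comment, or processing instruction
        }

        if let Some(name) = tag.strip_prefix('/') {
            // End tag: it must match the innermost open element.
            let name = name.trim();
            let expected = open.pop().unwrap_or_default();
            if expected != name {
                return Err(EndMismatch {
                    expected,
                    found: name.to_string(),
                });
            }
        } else if !tag.ends_with('/') {
            // Start tag that is not self-closing: remember its name.
            let name = tag.split_whitespace().next().unwrap_or("");
            open.push(name.to_string());
        }
        // Self-closing tags such as <link ... /> open and close at once.
    }
    Ok(())
}
```

With HTML-style input such as `<head><link rel="manifest" href="/m"></head>`, the sketch reports expected "link", found "head", mirroring the error in the report. Writing the tag as `<link rel="manifest" href="/m"/>` (or adding `</link>`) makes the check pass, which is the XHTML-style fix suggested above.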

As for the "duplicate field" error, it's hard to say without more information, but if you're trying to deserialize a sequence of elements that have different namespaces, it could be #212.

Otherwise it could be https://github.com/RReverser/serde-xml-rs/issues/55. Despite that issue being on the serde-xml-rs repo, it's a general limitation of Serde's design so it applies to quick-xml's implementation as well.

Edit: Here's an issue for the Serde limitation from this repo: #177. There is also an issue on the Serde repo that mentions quick-xml: https://github.com/serde-rs/serde/issues/1725

blankname, May 15 '20 22:05