treexml-rs
UTF-8 BOM character at start of xml document treated as unexpected character
Pointing treexml at a UTF-8 XML document that starts with a BOM (byte order mark) character causes this error:
Error: 1:1 Unexpected characters outside the root element:
If I remove the BOM character, it works.
I replicated the issue by adding the following test to tests/read.rs:
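#[test]
fn no_xml_tag_utf8bom() {
    // The BOM is the single code point U+FEFF, which UTF-8 encodes as the
    // bytes EF BB BF. A raw string (r#"..."#) would not process the escape,
    // so a regular string literal is used here.
    let doc_raw = "\u{FEFF}
<root>
    <child></child>
</root>
";

    let doc = Document::parse(doc_raw.as_bytes()).unwrap();

    assert_eq!(doc.version, XmlVersion::Version10);
    assert_eq!(doc.encoding, "UTF-8".to_owned());
}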
See the Wikipedia article on the byte order mark for more information on the UTF-8 BOM (including the correct byte sequence, EF BB BF).
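Until this is handled inside the parser, a possible workaround is to strip the BOM before handing the bytes to treexml, since removing it makes parsing succeed. The following is only a minimal sketch: `strip_utf8_bom` and `UTF8_BOM` are hypothetical names of my own, not part of treexml, and it assumes the whole document is already in memory as a byte slice.

```rust
/// UTF-8 encoding of U+FEFF (the byte order mark).
const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of treexml): returns `input` without a
/// leading UTF-8 BOM, or `input` unchanged if it does not start with one.
fn strip_utf8_bom(input: &[u8]) -> &[u8] {
    input.strip_prefix(UTF8_BOM).unwrap_or(input)
}
```

With the test above, this would be used as `Document::parse(strip_utf8_bom(doc_raw.as_bytes()))`. Only a BOM at the very start is removed; any later occurrence of U+FEFF is left alone.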