treexml-rs

UTF-8 BOM at start of XML document treated as unexpected character

Open: compenguy opened this issue 8 years ago (0 comments)

Pointing treexml at a UTF-8 XML document that starts with a byte order mark (BOM) causes this error:

    Error: 1:1 Unexpected characters outside the root element:

If I remove the BOM, the document parses without error.
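Since removing the BOM by hand makes the document parse, one possible caller-side workaround is to drop those three bytes before the input reaches the parser. This is a minimal sketch of that idea, assuming the input is already held in memory; `strip_utf8_bom` and `UTF8_BOM` are hypothetical names, not part of treexml.

```rust
/// UTF-8 encoding of U+FEFF, the byte order mark (BOM).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of treexml): returns `input` without a
/// leading UTF-8 BOM. Input that does not start with a BOM is returned unchanged.
fn strip_utf8_bom(input: &[u8]) -> &[u8] {
    if input.starts_with(&UTF8_BOM) {
        &input[UTF8_BOM.len()..]
    } else {
        input
    }
}

fn main() {
    let with_bom = "\u{FEFF}<root><child></child></root>";
    let cleaned = strip_utf8_bom(with_bom.as_bytes());
    // After stripping, `<` is the very first byte the parser would see.
    println!("{}", std::str::from_utf8(cleaned).unwrap());
}
```

Because `&[u8]` implements `std::io::Read`, the stripped slice could be handed to the parser the same way the test below passes `doc_raw.as_bytes()`, e.g. `Document::parse(strip_utf8_bom(bytes))`. Input coming from a file would need to be read into a `Vec<u8>` first.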

I replicated the issue by adding the following test to tests/read.rs:

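        #[test]
        fn no_xml_tag_utf8bom() {
            // Assumes `Document` and `XmlVersion` from treexml are in scope,
            // as for the other tests in tests/read.rs.

            // A normal (non-raw) string, so that `\u{FEFF}` is processed as an escape.
            // U+FEFF is the byte order mark, which UTF-8 encodes as EF BB BF.
            let doc_raw = "\u{FEFF}
            <root>
                <child></child>
            </root>
            ";

            // Sanity check: the input really starts with the three BOM bytes.
            assert!(doc_raw.as_bytes().starts_with(b"\xEF\xBB\xBF"));

            let doc = Document::parse(doc_raw.as_bytes()).unwrap();

            assert_eq!(doc.version, XmlVersion::Version10);
            assert_eq!(doc.encoding, "UTF-8".to_owned());
        }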
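How the BOM is written in a Rust string literal matters. Inside a raw string such as `r#"..."#` escapes are not processed at all, and in a normal string `\u{..}` names a Unicode code point rather than a byte, so `\u{EF}` becomes U+00EF (`ï`), which UTF-8 encodes as two bytes. The sketch below compares the three spellings; the function names are illustrative only.

```rust
/// UTF-8 encoding of U+FEFF, the byte order mark (BOM).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Raw string: no escape processing, so these are 18 ASCII characters as typed.
fn raw_string_spelling() -> &'static [u8] {
    r#"\u{EF}\u{BB}\u{BF}"#.as_bytes()
}

/// Normal string: three code points U+00EF, U+00BB, U+00BF,
/// each of which UTF-8 encodes as two bytes (six bytes in total).
fn code_point_spelling() -> &'static [u8] {
    "\u{EF}\u{BB}\u{BF}".as_bytes()
}

/// The single code point U+FEFF, which is what a BOM actually is.
fn bom_spelling() -> &'static [u8] {
    "\u{FEFF}".as_bytes()
}

fn main() {
    println!("raw string:  {:02X?}", raw_string_spelling());
    println!("code points: {:02X?}", code_point_spelling());
    println!("U+FEFF:      {:02X?}", bom_spelling());
    println!("U+FEFF is the BOM: {}", bom_spelling() == &UTF8_BOM[..]);
}
```

Only the `\u{FEFF}` spelling yields the bytes EF BB BF that a BOM-prefixed file actually starts with.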

See the Wikipedia article on the byte order mark for more information on the UTF-8 BOM. In UTF-8 the BOM is the code point U+FEFF, encoded as the three bytes EF BB BF.

compenguy · Dec 01 '17 17:12