
CQ chokes when xml declaration is missing encoding attribute

Open · asinning opened this issue 10 years ago · 2 comments

When CsQuery tries to parse the XML below through the implicit string conversion

CQ dom = xml;

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>

I get the following error:

System.NullReferenceException: Object reference not set to an instance of an object.
Result StackTrace:
   at CsQuery.HtmlParser.ElementFactory.Parse(Stream inputStream, Encoding encoding)
   at CsQuery.HtmlParser.ElementFactory.Create(Stream html, Encoding streamEncoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ.CreateNew(CQ target, Stream html, Encoding encoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ..ctor(String html, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ.op_Implicit(String html)

I can eliminate the error by adding an encoding attribute to the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

Thanks!

asinning · Oct 16 '14 16:10

To be honest, I haven't spent much time making CsQuery work as a general-purpose XML parser. It may work for some XML (such as XHTML), but because XHTML is only a subset of XML, it may not handle arbitrary XML correctly in all cases.

jamietre · Oct 21 '14 20:10

I've written the following wrapper to work around the problem. It could stand to be made more robust.

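// Requires: using System; using CsQuery;
private CQ GetCQ(string xml)
{
    // Leading whitespace would hide the declaration, so trim it first.
    xml = xml.TrimStart();

    // Match a real declaration ("<?xml" + whitespace), not e.g. "<?xml-stylesheet".
    if (xml.StartsWith("<?xml", StringComparison.Ordinal)
        && xml.Length > 5 && char.IsWhiteSpace(xml[5]))
    {
        // Find the end of the declaration; skip patching if it is unterminated
        // (Substring(0, -1) would otherwise throw).
        int end = xml.IndexOf("?>", StringComparison.Ordinal);
        if (end > 0)
        {
            var declaration = xml.Substring(0, end);

            // CsQuery fails on a declaration without encoding, so default to UTF-8.
            if (declaration.IndexOf("encoding", StringComparison.Ordinal) == -1)
            {
                xml = declaration + " encoding=\"UTF-8\"" + xml.Substring(end);
            }
        }
    }
    return new CQ(xml);
}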

asinning · Oct 22 '14 12:10
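A standalone sketch of the same workaround, using a regular expression instead of manual index arithmetic so the patching logic can be checked without CsQuery. The names XmlDeclarationFixer and EnsureEncoding are hypothetical, and defaulting to UTF-8 is an assumption carried over from the wrapper above; the helper only touches a declaration at the start of the string and returns everything else unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of CsQuery): adds encoding="UTF-8" to an XML
// declaration that lacks an encoding attribute.
public static class XmlDeclarationFixer
{
    // Group 1: optional leading whitespace, "<?xml", and its attributes.
    // Group 2: the closing "?>" plus any whitespace just before it.
    // Requiring whitespace after "xml" skips PIs such as <?xml-stylesheet ...?>.
    private static readonly Regex Declaration =
        new Regex(@"^(\s*<\?xml\s[^?]*?)(\s*\?>)");

    private static readonly Regex EncodingAttribute =
        new Regex(@"\bencoding\s*=");

    public static string EnsureEncoding(string xml)
    {
        if (xml == null) throw new ArgumentNullException(nameof(xml));

        Match match = Declaration.Match(xml);
        if (!match.Success) return xml;                  // no declaration: nothing to patch

        string head = match.Groups[1].Value;
        if (EncodingAttribute.IsMatch(head)) return xml; // encoding already declared

        // Assumption: UTF-8 is an acceptable default for the caller's input.
        return head + " encoding=\"UTF-8\"" + match.Groups[2].Value
             + xml.Substring(match.Length);
    }
}
```

At the call site this keeps the implicit conversion from the original report: CQ dom = XmlDeclarationFixer.EnsureEncoding(xml);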