CsQuery
CQ chokes when xml declaration is missing encoding attribute
When CsQuery tries to parse the XML below (held in the string variable xml) using
CQ dom = xml;
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
I get the following error:
System.NullReferenceException: Object reference not set to an instance of an object.
Result StackTrace:
at CsQuery.HtmlParser.ElementFactory.Parse(Stream inputStream, Encoding encoding)
at CsQuery.HtmlParser.ElementFactory.Create(Stream html, Encoding streamEncoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
at CsQuery.CQ.CreateNew(CQ target, Stream html, Encoding encoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
at CsQuery.CQ..ctor(String html, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
at CsQuery.CQ.op_Implicit(String html)
I can eliminate the error by changing the xml declaration to include an encoding attribute:
<?xml version="1.0" encoding="UTF-8"?>
Thanks!
In all honesty, I haven't spent much time trying to make CsQuery work as a general-purpose XML parser. It may work for some XML, XHTML in particular, but since XHTML is only a subset of XML, it may not handle generic XML properly in all cases.
I've written the following wrapper to fix the problem. It could stand to be made more robust.
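// Requires: using System; using CsQuery;
private CQ GetCQ(string xml)
{
    // Trim leading whitespace so the declaration check below is reliable.
    xml = xml.TrimStart();
    if (xml.StartsWith("<?xml", StringComparison.Ordinal))
    {
        // Locate the end of the declaration once and reuse the index.
        int end = xml.IndexOf("?>", StringComparison.Ordinal);
        if (end > 0)
        {
            var declaration = xml.Substring(0, end);
            if (declaration.IndexOf("encoding", StringComparison.Ordinal) == -1)
            {
                // Append a default encoding to work around the NullReferenceException.
                declaration = declaration + " encoding=\"UTF-8\"";
                xml = declaration + xml.Substring(end);
            }
        }
    }
    return new CQ(xml);
}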
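One way the wrapper might be made more robust is sketched below. It is only an illustration, not part of CsQuery: the class name XmlDeclarationFixer and the method EnsureEncoding are hypothetical, and the default of UTF-8 is an assumption carried over from the workaround above. The sketch inserts the encoding directly after the version attribute, because the XML 1.0 grammar requires encoding to come before standalone, so appending it at the end of a declaration that already has standalone="yes" would produce an invalid declaration. It does not trim leading whitespace or a byte order mark.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of CsQuery.
public static class XmlDeclarationFixer
{
    // "head" captures '<?xml version="..."' at the start of the input;
    // "rest" captures everything up to and including the closing '?>'.
    private static readonly Regex Declaration = new Regex(
        @"^(?<head>\s*<\?xml\s+version\s*=\s*(?:""[^""]*""|'[^']*'))(?<rest>[^?]*\?>)",
        RegexOptions.Compiled);

    // Returns xml with an encoding attribute added to its declaration if one is missing.
    // The default encoding value is an assumption; pass the real one if it is known.
    public static string EnsureEncoding(string xml, string encoding = "UTF-8")
    {
        if (xml == null) throw new ArgumentNullException(nameof(xml));

        Match m = Declaration.Match(xml);
        if (!m.Success) return xml; // no XML declaration, leave the input untouched

        string rest = m.Groups["rest"].Value;
        if (Regex.IsMatch(rest, @"\bencoding\s*=")) return xml; // already declared

        // Insert after version and before any standalone attribute.
        return m.Groups["head"].Value
            + " encoding=\"" + encoding + "\""
            + rest
            + xml.Substring(m.Length);
    }
}
```

With this in place, the original call could presumably become CQ dom = XmlDeclarationFixer.EnsureEncoding(xml); which still relies on the implicit string conversion shown in the report.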