&nbsp; and other escape sequences are saved incorrectly when using XHTML mode
1. Description
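// Requires: System.Globalization, System.IO, System.Text, HtmlAgilityPack, Microsoft.VisualStudio.TestTools.UnitTesting
[TestMethod]
public void OptionOutputAsXmlBugTest()
{
    // Input containing named HTML character references.
    string html = @"Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End";
    HtmlDocument htmlDocument = new HtmlDocument
    {
        OptionOutputAsXml = true,
    };
    htmlDocument.LoadHtml(html);
    StringWriter stringWriter = new StringWriter(new StringBuilder(html.Length + 1000), CultureInfo.InvariantCulture);
    htmlDocument.Save(stringWriter);
    // Expected: the references are written out unchanged after the XML declaration.
    Assert.AreEqual("<?xml version=\"1.0\" encoding=\"utf-8\"?>Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End", stringWriter.ToString());
    // Actual: <?xml version="1.0" encoding="utf-8"?>Start|&amp;nbsp;|&lt;|&gt;|&amp;|&amp;euro;|&amp;pound;|&quot;|&amp;apos;|End
}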
As you can see, &nbsp; is saved as &amp;nbsp;. The same goes for other HTML escape sequences, but not all 🤪
- HAP version: 1.11.42
- NET version: .NET 6.0.1
Hello @Mertsch ,
This is expected, since this is how a &nbsp; is escaped in XML: https://www.freeformatter.com/xml-escape.html
There is indeed a change we could make, as discussed here: https://github.com/zzzprojects/html-agility-pack/issues/456 but speaking purely in XML terms, that is the right behavior.
Best Regards,
Jon
Hello @JonathanMagnan Thank you very much for your explanation and time.
I do understand now that HTML & characters need to be escaped for XML. But as your link suggests, shouldn't the output be
Start|&amp;nbsp;|&amp;lt;|&amp;gt;|&amp;amp;|&amp;euro;|&amp;pound;|&amp;quot;|&amp;apos;|End
by &amp;ing every & in the text?!
I do not fully understand the linked issue #456. It seems there is a "backwards compatible" flag which specifically keeps &nbsp;, but if this is about XML escaping ... why only some &s?
Hello @Mertsch ,
My bad, I just saw the part about the &nbsp; in your initial post.
OptionOutputAsXml is currently very confusing. I will look at it more deeply.
Best Regards,
Jon
I do not have this problem if I use HtmlDocument.BackwardCompatibility = false.
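A minimal sketch of that workaround, for anyone who wants to try it. It assumes that BackwardCompatibility can be set on the HtmlDocument instance alongside OptionOutputAsXml, and the input string is an assumed reconstruction of the one in the original post. The class name is hypothetical. No output is claimed here; run it and compare against the "Actual" result reported above.

```csharp
using System;
using System.Globalization;
using System.IO;
using HtmlAgilityPack;

// Hypothetical console harness; only the HtmlDocument members come from the thread.
public static class BackwardCompatibilitySketch
{
    public static void Main()
    {
        // Assumed input: the named character references from the original test.
        string html = "Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End";

        HtmlDocument htmlDocument = new HtmlDocument
        {
            OptionOutputAsXml = true,
            // The workaround mentioned in the comment above (assumed to be an instance member).
            BackwardCompatibility = false,
        };
        htmlDocument.LoadHtml(html);

        // Save to a string and print it so the escaping can be inspected by eye.
        using StringWriter stringWriter = new StringWriter(CultureInfo.InvariantCulture);
        htmlDocument.Save(stringWriter);
        Console.WriteLine(stringWriter.ToString());
    }
}
```

Whether this changes the handling of every reference or only some of them (such as &nbsp;) is not settled in this thread, so treat the flag as something to test rather than a confirmed fix.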
I have chosen to go with https://github.com/AngleSharp/AngleSharp and this issue is no longer relevant to me. If you want to close it, feel free to do so.
Hello @Mertsch ,
We will close this issue in that case. AngleSharp is a great library, so I certainly understand your choice.
Best Regards,
Jon