Sanitizing xml with Body Tag

Open Ghyath-Serhal opened this issue 2 years ago • 1 comments

I am using HtmlSanitizer to sanitize the XML data below, which contains a body tag.

<?xml version="1.0" encoding="utf-8"?>
<Tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  <body>this is the body</body>
</Tag1>

I have added tag1, tag2, tag3 and body to the AllowedTags property. I am getting the result below. As you can see, the body tag is removed and only its text content is kept.

<tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  this is the body
</tag1>

Ghyath-Serhal avatar Oct 11 '23 12:10 Ghyath-Serhal

HtmlSanitizer is only intended to sanitize HTML. When a fragment is passed to the Sanitize() method, it is wrapped in a body element before it is parsed by AngleSharp's HTML parser. The parser then drops the additional body tag in the fragment. I currently don't see a way around this. See https://github.com/mganss/HtmlSanitizer/blob/28bdf0e0a1a143735a6be7858a38eaea772fcfef/src/HtmlSanitizer/HtmlSanitizer.cs#L386 You can experiment with the SanitizeDom() overload that takes an IHtmlDocument, but you'd need to coerce AngleSharp into keeping the body element somehow.

In theory, you could also work with the AngleSharp.Xml package, but HtmlSanitizer makes extensive use of AngleSharp's IHtmlDocument interface, so adding XML support would probably be hard.
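
To illustrate the general idea of filtering XML with an XML parser rather than an HTML parser, here is a minimal sketch in Python using only the standard library. It is not HtmlSanitizer or AngleSharp code and does not reflect how either library works internally. The names `ALLOWED_TAGS`, `local_name` and `sanitize_xml` are hypothetical, and the `<script>` element in the sample was added purely to show a disallowed tag being dropped. Because an XML parser gives `body` no special meaning, the element survives like any other allowed tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlist mirroring the tags named in the question.
ALLOWED_TAGS = {"Tag1", "tag2", "tag3", "body"}

# Keep the default namespace unprefixed when the tree is serialized again.
ET.register_namespace("", "urn:swift:saa:xsd:saa.2.0")


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def _prune(element, allowed_tags):
    """Remove disallowed child elements (with their subtrees), recursively."""
    for child in list(element):  # copy, because we mutate while iterating
        if local_name(child.tag) in allowed_tags:
            _prune(child, allowed_tags)
        else:
            element.remove(child)


def sanitize_xml(xml_bytes, allowed_tags=ALLOWED_TAGS):
    """Return the document as a string with only allowlisted elements kept.

    Assumptions of this sketch: attributes are not filtered, and the input
    is not hostile at the parser level. The Python docs warn that xml.etree
    is not hardened against maliciously constructed XML.
    """
    root = ET.fromstring(xml_bytes)
    if local_name(root.tag) not in allowed_tags:
        raise ValueError("root element is not in the allowlist")
    _prune(root, allowed_tags)
    return ET.tostring(root, encoding="unicode")


# Sample from the question, plus one disallowed element for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<Tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  <script>alert(1)</script>
  <body>this is the body</body>
</Tag1>"""

print(sanitize_xml(SAMPLE))
```

Note that this sketch matches tag names case-sensitively, as XML requires, so `Tag1` keeps its original casing instead of being lowercased the way the HTML parser does in the output above.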

I'm interested to hear what your use case is. Where's the XSS vector in your scenario?

mganss avatar Oct 12 '23 11:10 mganss