HtmlSanitizer
Sanitizer does not remove comment but converts it to plain html
I use the latest version from NuGet (not a beta version). When sanitizing the attached HTML, it does not remove the comment between the script tags; for some reason the comment is converted to plain HTML instead.
What is your configuration? HTML comment syntax used inside a script element does not create an HTML comment; it becomes part of the script's text.
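To illustrate the point about comment syntax inside a script element, here is a minimal sketch. It uses Python's standard-library html.parser as a stand-in, not HtmlSanitizer or its underlying parser, so it only demonstrates the general HTML parsing rule. The class name NodeLogger, the function classify, and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class NodeLogger(HTMLParser):
    """Hypothetical helper: records real comments and script text separately."""

    def __init__(self):
        super().__init__()
        self.comments = []      # data of genuine HTML comment nodes
        self.script_text = []   # raw text found inside <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


def classify(html):
    """Return (list of comment bodies, concatenated script text)."""
    parser = NodeLogger()
    parser.feed(html)
    parser.close()
    return parser.comments, "".join(parser.script_text)


# Illustrative input: one comment outside the script, one "comment" inside it.
sample = "<!-- real comment --><script><!-- alert(1); //--></script>"
comments, script_text = classify(sample)
print(comments)     # only the comment outside the script element
print(script_text)  # the <!-- ... --> inside the script is plain script text
```

If a sanitizer or converter later emits the script element's text without the surrounding script tag, that text can show up as visible content, which may be what is happening here.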
What do you mean by configuration? I don't understand that question.
The HTML is coming from an e-mail that is sent to us from a customer. We convert that e-mail to PDF but sanitize it before doing so.
Sorry, I should have been clearer. By configuration I mean how you have initialized the HtmlSanitizer object, which elements you have allowed in AllowedTags, etc.
This is the code --> https://github.com/Sicos1977/ChromiumHtmlToPdf/blob/master/ChromiumHtmlToPdfLib/Helpers/DocumentHelper.cs
It starts at line 189, and these are the settings.
Sorry for the Dutch comments.
A minus sign means: first remove everything, then add the rows below the sign. An asterisk (*) means: use the default settings, then add the lines after it to those defaults.
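The minus/asterisk rule can be sketched roughly as follows. This is a Python illustration of the rule as described in this comment, not the actual C# code in DocumentHelper.cs; the function name resolve_allowed and the assumption that the marker sits on the first line of the settings are hypothetical.

```python
def resolve_allowed(defaults, setting_lines):
    """Hypothetical sketch of the '-' / '*' settings rule.

    defaults      -- the sanitizer's default entries (e.g. default allowed tags)
    setting_lines -- lines from the settings; the first non-empty line is
                     assumed to be the marker, the rest are entries to add
    """
    lines = [line.strip() for line in setting_lines if line.strip()]
    if not lines:
        # No settings given: assume the defaults are kept unchanged.
        return set(defaults)

    marker, entries = lines[0], lines[1:]
    if marker == "-":
        allowed = set()            # remove everything first
    elif marker == "*":
        allowed = set(defaults)    # start from the default settings
    else:
        raise ValueError(f"expected '-' or '*' as first line, got {marker!r}")

    allowed.update(entries)        # then add the rows below the marker
    return allowed


# Illustrative values only; the real defaults come from HtmlSanitizer.
only_listed = resolve_allowed({"a", "p"}, ["-", "div", "span"])
defaults_plus = resolve_allowed({"a", "p"}, ["*", "div"])
```

With these illustrative inputs, the minus form yields only the listed rows, and the asterisk form yields the defaults plus the listed rows.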
I can't reproduce. AFAICT you are using HtmlSanitizer in the default configuration (default allowed tags, attributes, etc.). In that configuration, the script tag is disallowed and should be removed (including its content).
Can you provide a minimal example that shows the issue?
Sorry for the late response. I got sidetracked by other things, so I will have to look into this again.