java-html-sanitizer
Allow comments to be preserved in HTML
It looks like the current implementation of the HTML sanitizer removes all comments, including conditional-comment hints for MSIE. Is there any reason why an option to keep them isn't present?
Most processors for HTML documents have options to preserve comments, and in the code I see that the HtmlLexer does mark comments as a node, but the sanitizer ditches these as being unimportant.
We process templates, and use comments to mark out placeholder segments.
I'd expect an option to preserve HTML comments in the output.
Could you use tags as placeholders?
My current solution uses JSoup to replace comments with id'd tags, runs owasp-html-sanitizer, and then restores the comments.
However, restoring the comments with JSoup doesn't work: it could lead to <commenttag id=1 /><html><head></head>...</html>,
which gets mangled by the JSoup parser. So I use string replacement to get the comments back in their proper locations.
So the immediate problem is solved, but not pretty.
However, I still believe optionally keeping comments is a needed thing for a general-purpose sanitizer. mso- styling needs to be in comments.
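For anyone following along, the placeholder round trip described above can be sketched with plain string handling. This is only a rough illustration under stated assumptions, not the code used above: `CommentPreserver` and `sanitizeKeepingComments` are hypothetical names, the sanitizer is passed in as a function rather than tied to a particular library, and the placeholder is a purely alphanumeric token (instead of an id'd tag) on the assumption that the sanitizer leaves plain letters and digits in text untouched. Comments are found with a regex, so comment-like text inside scripts, styles, or attribute values would be matched too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of java-html-sanitizer or JSoup. */
final class CommentPreserver {
    // Matches one HTML comment, non-greedy, across line breaks.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /**
     * Swaps each comment for a token, runs the supplied sanitizer,
     * then swaps the tokens back for the original comments.
     */
    static String sanitizeKeepingComments(String html, UnaryOperator<String> sanitizer) {
        // Random alphanumeric prefix so input text cannot easily forge a token.
        String prefix = "cmt" + UUID.randomUUID().toString().replace("-", "") + "n";

        // Step 1: pull the comments out, leaving numbered tokens behind.
        List<String> comments = new ArrayList<>();
        StringBuilder stripped = new StringBuilder();
        Matcher m = COMMENT.matcher(html);
        int last = 0;
        while (m.find()) {
            stripped.append(html, last, m.start());
            stripped.append(prefix).append(comments.size()).append('x');
            comments.add(m.group());
            last = m.end();
        }
        stripped.append(html, last, html.length());

        // Step 2: sanitize the comment-free markup.
        String clean = sanitizer.apply(stripped.toString());

        // Step 3: put the comments back where the tokens ended up.
        Pattern token = Pattern.compile(Pattern.quote(prefix) + "(\\d{1,9})x");
        StringBuilder out = new StringBuilder();
        Matcher t = token.matcher(clean);
        last = 0;
        while (t.find()) {
            out.append(clean, last, t.start());
            int index = Integer.parseInt(t.group(1));
            // Unknown indexes are dropped rather than echoed back.
            out.append(index < comments.size() ? comments.get(index) : "");
            last = t.end();
        }
        out.append(clean, last, clean.length());
        return out.toString();
    }
}
```

With java-html-sanitizer, the `sanitizer` argument could presumably be `policy::sanitize` for an existing `PolicyFactory` named `policy`. Note that the restored comments bypass the sanitizer entirely, so this only seems reasonable when the comments come from trusted templates.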
Hi, sorry for resurrecting an old post, but I need to do something similar. Could you elaborate on how to do this with the pre- and postprocessors?
I have an implementation of this at the moment with a test trying to get comments from this HTML snippet:
<!-- comment 1 -->
<p>
<!-- comment 2-->Hello, <b><!-- comment 3-->World!</b>
</p>
But I do not see any of the comments being processed.
Here is my policy definition:
Sanitizers.FORMATTING
    .and(Sanitizers.STYLES)
    .and(Sanitizers.BLOCKS)
    .and(new HtmlPolicyBuilder()
        .allowElements("html", "head", "body", "title", "style", "img")
        .allowTextIn("style")
        .allowUrlProtocols("https")
        .allowCommonInlineFormattingElements()
        .allowAttributes("src", "alt", "height", "width").onElements("img")
        .allowAttributes("class").globally()
        .withPreprocessor(
            (HtmlStreamEventReceiver r) -> {
                // not doing anything right now: every event is forwarded unchanged
                return new HtmlStreamEventReceiverWrapper(r) {
                    @Override
                    public void openTag(String elementName, List<String> attrs) {
                        this.underlying.openTag(elementName, attrs);
                    }
                    @Override
                    public void text(String text) {
                        this.underlying.text(text);
                    }
                };
            }
        )
        .withPostprocessor((HtmlStreamEventReceiver r) -> {
            // not doing anything right now: every event is forwarded unchanged
            return new HtmlStreamEventReceiverWrapper(r) {
                @Override
                public void openTag(String elementName, List<String> attrs) {
                    this.underlying.openTag(elementName, attrs);
                }
                @Override
                public void text(String text) {
                    this.underlying.text(text);
                }
            };
        })
        .toFactory());
Thanks for your help.
It would be useful to allow at least the conditional comments such as <!--[if IE 8]>, etc.
+1 This is a reasonable request.
-- Jim Manico @Manicode
On Feb 6, 2021, at 2:26 AM, DavesMan [email protected] wrote:
It would be useful to allow at least the conditional comments such as <!--[if IE 8]>, etc.
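If only conditional comments were to be kept, one way to tell them apart from ordinary comments is a strict pattern match. The following is a rough sketch under stated assumptions, not something the library provides: `ConditionalComments` and `isConditional` are hypothetical names, and only the "downlevel-hidden" form (`<!--[if ...]> ... <![endif]-->`) is recognized.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of java-html-sanitizer. */
final class ConditionalComments {
    // <!--[if EXPR]> body <![endif]-->, where the body may span lines
    // but may not itself contain "-->" (that would end the comment early).
    private static final Pattern DOWNLEVEL_HIDDEN = Pattern.compile(
        "<!--\\[if [^\\]]+\\]>(?:(?!-->).)*<!\\[endif\\]-->", Pattern.DOTALL);

    /** Returns true only if the whole string is one conditional comment. */
    static boolean isConditional(String comment) {
        return DOWNLEVEL_HIDDEN.matcher(comment).matches();
    }
}
```

The markup inside a conditional comment is still markup that a legacy browser may render, so a check like this only classifies the comment; it does not make its contents safe.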