java-html-sanitizer
java-html-sanitizer copied to clipboard
SVG path markup bug and Changing default escapingMode on the HtmlStreamRenderer
The problem I'm trying to solve
I was trying to use this tool for the stripping all dangerous markup. One thing I'm trying to do though is allow the user to input custom {{ icon }} which I then use Handlebars to replace with something else. So the workflow is:
- User input
- Convert icon to svg markup using handlebars
- Pass the full input through the sanitizer
- Output to page.
Bug 1?
However I noticed that when allowingElement svg and path to the Policy the markup for paths break.
The sanitizer turns an svg and its path like this:
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" class="someclass">
<path id="bounds" opacity="0" d="M0 0h24v24H0z"/>
<path d="[svg content1]"/>
<path d="[svg content2]"/>
</svg>
into this:
<path d="M 100 100 L 300 100 L 200 300 z"
fill="red" stroke="blue" stroke-width="3" ></path>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" class="htc-icon htc-icon--checkmark_circle test">
--
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" class="someclass">
<path id="bounds" opacity="0" d="M0 0h24v24H0z">
<path d="[svg content1]">
<path d="[svg content2]"></path>
</path>
</path>
</svg>
However, that isn't valid. Note how the paths get nested.
My code
I'm using the SlashdotPolicy as an example and made some changes:
new HtmlPolicyBuilder()
.allowStandardUrlProtocols()
.allowCommonInlineFormattingElements()
// Allow title="..." on any element.
.allowAttributes("title").globally()
// Allow href="..." on <a> elements.
.allowAttributes("href").onElements("a")
// Defeat link spammers.
.requireRelNofollowOnLinks()
// Allow lang= with an alphabetic value on any element.
.allowAttributes("lang").matching(Pattern.compile("[a-zA-Z]{2,20}"))
.globally()
// The align attribute on <p> elements can have any value below.
.allowAttributes("align")
.matching(true, "center", "left", "right", "justify", "char")
.onElements("p")
// These elements are allowed.
.allowElements(
"a", "p", "div", "i", "b", "em", "blockquote", "tt", "strong",
"br", "ul", "ol", "li")
// Allow the SVGs we're using
.allowElements("svg")
// Allow the svg Attributes
.allowAttributes("xmlns", "class", "width", "height", "viewbox")
.onElements("svg")
.allowElements("path")
// Allow the path Attributes
.allowAttributes("id", "opacity", "d").onElements("path")
.toFactory();
Bug 2
So because of how this was happening I swapped the order of my workflow:
- User input
- Pass the full input through the sanitizer
- Convert icon to svg markup using handlebars
- Output to page.
When i tried to sanitize first and convert the handlebars to my icons, the markup would change:
"{{" ---> "{<!-- -->{"
when looking at the tests, it looks like it is intentional. (https://github.com/OWASP/java-html-sanitizer/blob/34425c15adb7bd32609cefc01e8f8d4534a3cf34/src/test/java/org/owasp/html/EncodingTest.java#L231) That's ok, but it would be nice to change the default HtmlStreamRenderer.escapingMode of text.
My current solution
In my case I am using the following workflow:
- User input
- Pass the full input through the sanitizer
- Run a replace all on the sanitizer output
-
output.toString().replaceAll("\\{\\<!-- +--\\>\\{", "{{");
- Convert icon to svg markup using handlebars
- Output to page.
This isn't really ideal. But it will work for now.
I was trying to use this tool for the stripping all dangerous markup. One thing I'm trying to do though is allow the user to input custom {{ icon }} which I then use Handlebars to replace with something else.
I'm sure Mike will jump in when he has time, but my take is this is not the purpose of the tool. The HTML sanitizer is only meant to accept snippets of HTML markup for sanitization from tools like TinyMCE or from web features that allows users to include some HTML tags in content. This tool is not intended to sanitize per-rendered template markup that is not standard HTML. The HTML sanitizer also takes a fairly aggressive stance on parsing which focused primarily on safety. When it sees things that are not standard markup it delints and cleans up. It also has a fairly aggressive and safe perspective on handling HTML comments.
The SVG issue looks like a potential legit bug....
Anyhow, please give Mike some time to respond, he'll do so as soon as he can.
Aloha,
Jim
On 8/15/17 6:08 PM, dpena wrote:
The problem I'm trying to solve
I was trying to use this tool for the stripping all dangerous markup. One thing I'm trying to do though is allow the user to input custom {{ icon }} which I then use Handlebars to replace with something else. So the workflow is:
User input
Convert icon to svg markup using handlebars
Pass the full input through the sanitizer
Output to page.
Bug 1?
However I noticed that when allowingElement svg and path to the Policy the markup for paths break.
The sanitizer turns an svg and its path like this:
|<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" class="someclass">
| into this:
|
However, that isn't valid. Note how the paths get nested.
My code
I'm using the SlashdotPolicy as an example and made some changes:
|new HtmlPolicyBuilder() .allowStandardUrlProtocols() .allowCommonInlineFormattingElements() // Allow title="..." on any element. .allowAttributes("title").globally() // Allow href="..." on elements. .allowAttributes("href").onElements("a") // Defeat link spammers. .requireRelNofollowOnLinks() // Allow lang= with an alphabetic value on any element. .allowAttributes("lang").matching(Pattern.compile("[a-zA-Z]{2,20}")) .globally() // The align attribute on
elements can have any value below. .allowAttributes("align") .matching(true, "center", "left", "right", "justify", "char") .onElements("p") // These elements are allowed. .allowElements( "a", "p", "div", "i", "b", "em", "blockquote", "tt", "strong", "br", "ul", "ol", "li") // Allow the SVGs we're using .allowElements("svg") // Allow the svg Attributes .allowAttributes("xmlns", "class", "width", "height", "viewbox") .onElements("svg") .allowElements("path") // Allow the path Attributes .allowAttributes("id", "opacity", "d").onElements("path") .toFactory(); |
Bug 2
So because of how this was happening I swapped the order of my workflow:
- User input
- Pass the full input through the sanitizer
- Convert icon to svg markup using handlebars
- Output to page.
When i tried to sanitize first and convert the handlebars to my icons, the markup would change:
"{{" ---> "{{"
when looking at the tests, it looks like it is intentional. (https://github.com/OWASP/java-html-sanitizer/blob/34425c15adb7bd32609cefc01e8f8d4534a3cf34/src/test/java/org/owasp/html/EncodingTest.java#L231) That's ok, but it would be nice to change the default HtmlStreamRenderer.escapingMode of text.
My current solution
In my case I am using the following workflow:
- User input
- Pass the full input through the sanitizer
- Run a replace all on the sanitizer output
- output.toString().replaceAll("{<!-- +-->{", "{{");
- Convert icon to svg markup using handlebars
- Output to page.
This isn't really ideal. But it will work for now.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OWASP/java-html-sanitizer/issues/122, or mute the thread https://github.com/notifications/unsubscribe-auth/AAgcCaVv-HGVsB22P2eZf8kgkSollKPDks5sYhbFgaJpZM4O4LTO.
-- Jim Manico Manicode Security https://www.manicode.com
Right now, there's no handling of foreign XML contexts.
I think the simplest fix would be to add SVG to the table of element and attribute definitions.
I think MathML in HTML has been deprecated (see support), so I think I can
- add elements in https://www.w3.org/TR/SVG/eltindex.html to containment-checks and regenerate the tables.
- change the element stack to allow svg elements when an
<svg>
element is on the stack - change the renderer to remap tag names to element names when in
an SVG context.
Tag Name Element Name altglyph altGlyph ... ...
Hopefully this should fix bug 1 and obviate bug 2.
@mikesamuel Hi. I've ran into the same problem (bug 1) that @dpena , so I assume, that you didn't regenerate the tables. I've tried doing it myself - I added a new element to the containment-checks table and ran rebuild.sh, but didn't notice any change. Can you provide with a step by step guide for regenerating the tables, so people can add custom markup?
@wookie41, I haven't run that in a while. What did you try? I'd probably start by adding SVG elements like svg
, path
, etc. to
https://github.com/OWASP/java-html-sanitizer/blob/0c04e4be08a25ebc396784dce72c65a2b8f6665f/empiricism/html-containment.html#L45-L68
@mikesamuel
I added the elements to html-containment.html
and ran rebuild.sh
. It said that the scripts has finished successfully and that it copied the generated class to ../src/main/java/org/owasp/html/HtmlElementTablesCanned.java
, but I didn't see the elements that I added in it.
@mikesamuel up?
I recently attempted to use this to sanitize svgs. I also ran into bug 1
.
Hi - I have the same issue for sanitizing SVG - and I took some deeper look into bug 1
. For me it looks like empty-element-syntax (e.g. <polygon ... />
) is not correctly processed by this code:
https://github.com/OWASP/java-html-sanitizer/blob/main/src/main/java/org/owasp/html/HtmlSanitizer.java#L186
If the TAGEND-token content at this point is />
and not only >
, then a new flag needs to be set: directlyCloseTagAfterOpenTag = true
.
Then after https://github.com/OWASP/java-html-sanitizer/blob/main/src/main/java/org/owasp/html/HtmlSanitizer.java#L194 an if-block is needed checking the flag - if true
call directly receiver.closeTag(...)
and reset the flag.
This way empty-element-syntax always leads to openTag
and closeTag
called - otherwise only openTag
is called and this TagBalancing logic will call closeTag
at incorrect point.