java-html-sanitizer
Discrepancies of results when sanitizing allowed tags
While trying to use .allowElements(), I noticed some surprising results: different tags yield different kinds of output.
For example:
1. A policy with new HtmlPolicyBuilder().allowElements("a").toFactory().sanitize("<a>") returns nothing;
2. A policy with new HtmlPolicyBuilder().allowElements("div").toFactory().sanitize("<div>") returns <div></div>.
I expected 1. not to be empty, and I also expected the two results to be consistent...
Am I missing something, or am I doing something wrong?
With the following (trimmed) code, I tested many elements to expose their behaviour.
import java.util.Arrays;
import javax.swing.text.html.HTML;
import org.owasp.html.PolicyFactory;
import org.owasp.html.HtmlPolicyBuilder;
// [...]
String[] HTML_ELEMENTS = Arrays
    .stream(HTML.getAllTags())
    .map(Object::toString)
    .toArray(String[]::new);
// [...]
for (String element : HTML_ELEMENTS) {
    // Allow only this element, then sanitize a bare opening tag for it.
    String sanitized = new HtmlPolicyBuilder()
        .allowElements(element)
        .toFactory()
        .sanitize("<" + element + ">");
    System.out.println(element + ": " + sanitized);
}
And I got the following list, where several other tags (font, img, input and span) behave like a:
a:
address: <address></address>
applet: <applet></applet>
area: <area />
b: <b></b>
base: <base />
basefont: <basefont />
big: <big></big>
blockquote: <blockquote></blockquote>
body: <body></body>
br: <br />
caption: <caption></caption>
center: <center></center>
cite: <cite></cite>
code: <code></code>
dd: <dd></dd>
dfn: <dfn></dfn>
dir: <dir></dir>
div: <div></div>
dl: <dl></dl>
dt: <dt></dt>
em: <em></em>
font:
form: <form></form>
frame: <frame></frame>
frameset: <frameset></frameset>
h1: <h1></h1>
h2: <h2></h2>
h3: <h3></h3>
h4: <h4></h4>
h5: <h5></h5>
h6: <h6></h6>
head: <head></head>
hr: <hr />
html: <html></html>
i: <i></i>
img:
input:
isindex: <isindex />
kbd: <kbd></kbd>
li: <li></li>
link: <link />
map: <map></map>
menu: <menu></menu>
meta: <meta />
nobr: <nobr></nobr>
noframes: <noframes></noframes>
object: <object></object>
ol: <ol></ol>
option: <option></option>
p: <p></p>
param: <param />
pre: <pre></pre>
samp: <samp></samp>
script: <script></script>
select: <select></select>
small: <small></small>
span:
strike: <strike></strike>
s: <s></s>
strong: <strong></strong>
style: <style></style>
sub: <sub></sub>
sup: <sup></sup>
table: <table></table>
td: <td></td>
textarea: <textarea></textarea>
th: <th></th>
title: <title></title>
tr: <tr></tr>
tt: <tt></tt>
u: <u></u>
ul: <ul></ul>
var: <var></var>
I noticed this with the span element too: it won't get sanitized if given a style attribute 🤔
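It's expected: a, font, img, input and span are in the builder's default skip-if-empty set, so they are dropped when they end up with no attributes. See: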
https://github.com/OWASP/java-html-sanitizer/blob/032d11b8931442a026d12a3b44176652e631a8a1/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L165
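If you do want such elements kept even when they have no attributes, something along these lines may work. This is only a sketch: it assumes the OWASP artifact is on the classpath and relies on HtmlPolicyBuilder.allowWithoutAttributes(...). The class and method names (SkipIfEmptyDemo, sanitizeDefault, sanitizeKeepingEmpty) are made up for illustration.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Hypothetical demo class; not part of the library.
class SkipIfEmptyDemo {

    /** Sanitizes a bare "<element>" with a policy that only allows that element. */
    static String sanitizeDefault(String element) {
        PolicyFactory policy = new HtmlPolicyBuilder()
            .allowElements(element)
            .toFactory();
        return policy.sanitize("<" + element + ">");
    }

    /** Same policy, but the element is also allowed to appear without attributes. */
    static String sanitizeKeepingEmpty(String element) {
        PolicyFactory policy = new HtmlPolicyBuilder()
            .allowElements(element)
            .allowWithoutAttributes(element)
            .toFactory();
        return policy.sanitize("<" + element + ">");
    }

    public static void main(String[] args) {
        // The five elements that came out empty in the listing above.
        for (String element : new String[] {"a", "font", "img", "input", "span"}) {
            System.out.println(element
                + ": default=[" + sanitizeDefault(element) + "]"
                + " keepEmpty=[" + sanitizeKeepingEmpty(element) + "]");
        }
    }
}
```

The other route is to leave the default alone and make sure the element carries an attribute the policy allows, which matches the span-with-style observation above.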