java-html-sanitizer

Discrepancies of results when sanitizing allowed tags

Open Pamplemousse opened this issue 3 years ago • 2 comments

While trying to use .allowElements(), I noticed some surprising results: different tags yield different kinds of output.

For example:

  1. A policy with new HtmlPolicyBuilder().allowElements("a").toFactory().sanitize("<a>") returns nothing;
  2. A policy with new HtmlPolicyBuilder().allowElements("div").toFactory().sanitize("<div>") returns <div></div>.

I expected the result of 1. not to be empty, and I also expected the two results to be consistent with each other.

Am I missing something, or doing something wrong?


With the following (trimmed) code, I tested many elements to expose their behaviour.

import java.util.Arrays;
import javax.swing.text.html.HTML;
import org.owasp.html.PolicyFactory;
import org.owasp.html.HtmlPolicyBuilder;

// [...]

String[] HTML_ELEMENTS = Arrays
    .stream(HTML.getAllTags())
    .map(Object::toString)
    .toArray(String[]::new);

// [...]

// For each tag, build a policy that allows only that tag,
// then sanitize a bare opening tag and print what comes back.
for (String element : HTML_ELEMENTS) {
    String sanitized = new HtmlPolicyBuilder()
        .allowElements(element)
        .toFactory()
        .sanitize("<" + element + ">");
    System.out.println(element + ": " + sanitized);
}

I got the following list, in which several other tags (font, img, input, span) behave like a:

    a:
    address: <address></address>
    applet: <applet></applet>
    area: <area />
    b: <b></b>
    base: <base />
    basefont: <basefont />
    big: <big></big>
    blockquote: <blockquote></blockquote>
    body: <body></body>
    br: <br />
    caption: <caption></caption>
    center: <center></center>
    cite: <cite></cite>
    code: <code></code>
    dd: <dd></dd>
    dfn: <dfn></dfn>
    dir: <dir></dir>
    div: <div></div>
    dl: <dl></dl>
    dt: <dt></dt>
    em: <em></em>
    font:
    form: <form></form>
    frame: <frame></frame>
    frameset: <frameset></frameset>
    h1: <h1></h1>
    h2: <h2></h2>
    h3: <h3></h3>
    h4: <h4></h4>
    h5: <h5></h5>
    h6: <h6></h6>
    head: <head></head>
    hr: <hr />
    html: <html></html>
    i: <i></i>
    img:
    input:
    isindex: <isindex />
    kbd: <kbd></kbd>
    li: <li></li>
    link: <link />
    map: <map></map>
    menu: <menu></menu>
    meta: <meta />
    nobr: <nobr></nobr>
    noframes: <noframes></noframes>
    object: <object></object>
    ol: <ol></ol>
    option: <option></option>
    p: <p></p>
    param: <param />
    pre: <pre></pre>
    samp: <samp></samp>
    script: <script></script>
    select: <select></select>
    small: <small></small>
    span:
    strike: <strike></strike>
    s: <s></s>
    strong: <strong></strong>
    style: <style></style>
    sub: <sub></sub>
    sup: <sup></sup>
    table: <table></table>
    td: <td></td>
    textarea: <textarea></textarea>
    th: <th></th>
    title: <title></title>
    tr: <tr></tr>
    tt: <tt></tt>
    u: <u></u>
    ul: <ul></ul>
    var: <var></var>

Pamplemousse avatar Apr 12 '22 15:04 Pamplemousse

I noticed the same thing with the span element: it is not stripped from the output if it is given a style attribute 🤔

anttinym avatar Feb 03 '23 13:02 anttinym

It's expected. By default, some elements are skipped when they end up with no attributes:

https://github.com/OWASP/java-html-sanitizer/blob/032d11b8931442a026d12a3b44176652e631a8a1/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L165
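
To make the pattern in the list above easier to see, here is a small toy model in plain Java. It is not the library's code: the class name `SkipIfEmptyModel`, the method `sanitizeBareTag`, and both sets are made up for illustration, and the contents of the sets are simply read off the output posted in this issue. The assumption is that an allowed element from a "skip if empty" set is dropped when it has no attributes, while every other allowed element is emitted and closed.

```java
import java.util.HashSet;
import java.util.Set;

public class SkipIfEmptyModel {
    // Hypothetical set: the five tags that produced empty output in the list above.
    public static final Set<String> SKIP_IF_EMPTY =
        Set.of("a", "font", "img", "input", "span");

    // Hypothetical set: tags printed in the "<x />" form in the list above,
    // plus img and input, which are void elements as well.
    public static final Set<String> VOID_ELEMENTS =
        Set.of("area", "base", "basefont", "br", "hr", "img",
               "input", "isindex", "link", "meta", "param");

    // Models sanitizing a bare "<element>" (no attributes) under a policy
    // that allows exactly that element.
    public static String sanitizeBareTag(String element, Set<String> skipIfEmpty) {
        if (skipIfEmpty.contains(element)) {
            return ""; // dropped: nothing about the bare tag is worth keeping
        }
        if (VOID_ELEMENTS.contains(element)) {
            return "<" + element + " />";
        }
        return "<" + element + "></" + element + ">";
    }

    public static void main(String[] args) {
        System.out.println("a: " + sanitizeBareTag("a", SKIP_IF_EMPTY));     // a:
        System.out.println("div: " + sanitizeBareTag("div", SKIP_IF_EMPTY)); // div: <div></div>
        System.out.println("br: " + sanitizeBareTag("br", SKIP_IF_EMPTY));   // br: <br />

        // Taking "a" out of the set models opting that element out of the rule.
        Set<String> relaxed = new HashSet<>(SKIP_IF_EMPTY);
        relaxed.remove("a");
        System.out.println("a (relaxed): " + sanitizeBareTag("a", relaxed)); // a (relaxed): <a></a>
    }
}
```

In the library itself, as far as I know, the opt-out is done on the builder with `allowWithoutAttributes`, for example `new HtmlPolicyBuilder().allowElements("a").allowWithoutAttributes("a")`. I have not re-run the experiment above with that call, so treat it as a pointer rather than a verified fix.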

subbudvk avatar Feb 29 '24 07:02 subbudvk