java-html-sanitizer

Sanitize is removing elements / not following browser behavior

Open rafaeljcg opened this issue 5 years ago • 1 comment

When the input HTML is invalid, the sanitizer removes elements instead of recovering from the error the way browsers do.

Using the following policy:

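import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Allow <p> elements, and the class attribute on <p> only.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("p")
    .allowAttributes("class").onElements("p")
    .toFactory();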

and input:

<p>foo</p> <p class="test" "="">bar</p> <p>baz</p>

I get this output:

<p>foo</p> <p class="test"></p>

Expected output:

<p>foo</p> <p class="test">bar</p> <p>baz</p>

Is this behavior expected?

rafaeljcg, Jan 10 '20 17:01

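In HtmlSanitizer, the "=" token in the tag body gets tagBodyToken.type QSTRING, and that is not a problem in itself. Because of the " that follows it, however, the end value of the next tagBodyToken becomes the full length of your input string, so processing ends there and everything after that point is dropped.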

When your input string is

<p>foo</p> <p class="test" "=">bar</p> <p>baz</p>

the result is

<p>foo</p> <p class="test">bar</p> <p>baz</p>
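
To illustrate the explanation above, here is a minimal, self-contained sketch of how a quoted-string token with no closing quote can end up spanning the rest of the input. It is not the library's actual lexer code: the class QuotedTokenSketch and the method quotedTokenEnd are hypothetical names, and the "run to the end of the input when no closing quote is found" rule is an assumption based on the behavior described in this thread.

```java
public class QuotedTokenSketch {

    /**
     * Hypothetical helper: returns the exclusive end offset of a quoted-string
     * token whose opening quote is at index start. If no matching closing
     * quote exists, the token is assumed to run to the end of the input.
     */
    public static int quotedTokenEnd(String input, int start) {
        char quote = input.charAt(start);
        int close = input.indexOf(quote, start + 1);
        return close < 0 ? input.length() : close + 1;
    }

    public static void main(String[] args) {
        String failing = "<p>foo</p> <p class=\"test\" \"=\"\">bar</p> <p>baz</p>";
        String working = "<p>foo</p> <p class=\"test\" \"=\">bar</p> <p>baz</p>";

        // Locate the "=" token in the tag body of the second <p>.
        int first = failing.indexOf("\"=\"");
        int firstEnd = quotedTokenEnd(failing, first);

        // In the failing input, the character right after "=" is another quote.
        // It has no partner, so the token it opens swallows the rest of the input.
        int strayEnd = quotedTokenEnd(failing, firstEnd);
        System.out.println("stray quote token ends at " + strayEnd
                + ", input length is " + failing.length());

        // In the working input, "=" is followed directly by '>', so the tag closes.
        int workingEnd = quotedTokenEnd(working, working.indexOf("\"=\""));
        System.out.println("char after \"=\" in working input: "
                + working.charAt(workingEnd));
    }
}
```

Under these assumptions, the unmatched quote in the first input produces a token that ends at the input length, which is consistent with the output being cut off after the second opening p tag.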

yangbongsoo, May 19 '20 06:05