java-html-sanitizer
Sanitize is removing elements / not following browser behavior
The sanitizer removes elements that follow invalid HTML instead of recovering the way a browser does.
Using the following policy:
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("p")
    .allowAttributes("class")
    .onElements("p")
    .toFactory();
and input:
<p>foo</p> <p class="test" "="">bar</p> <p>baz</p>
I get this output:
<p>foo</p> <p class="test"></p>
Expected output:
<p>foo</p> <p class="test">bar</p> <p>baz</p>
Is this behavior expected?
In HtmlSanitizer, the "=" is read as a tagBodyToken of type QSTRING, which is not a problem in itself.
The " that follows it, however, opens a quoted string that is never closed, so that tagBodyToken's end value is the length of your input string.
Processing therefore ends at that point.
When your input string is instead
<p>foo</p> <p class="test" "=">bar</p> <p>baz</p>
the result is
<p>foo</p> <p class="test">bar</p> <p>baz</p>
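The difference between the two inputs can be illustrated with a small standalone sketch. This is not the library's lexer: `QuotedTokenSketch` and `quotedStringEnd` are hypothetical names, and the only assumption is the rule described above, that a quoted token with no closing quote extends to the end of the input.

```java
public class QuotedTokenSketch {
    // Hypothetical helper, not part of java-html-sanitizer.
    // Returns the offset just past the quoted string that opens at `start`,
    // or input.length() when no matching closing quote exists.
    static int quotedStringEnd(String input, int start) {
        char quote = input.charAt(start);
        int close = input.indexOf(quote, start + 1);
        return close >= 0 ? close + 1 : input.length();
    }

    public static void main(String[] args) {
        // Second <p> of the failing input: "=" followed by a stray quote.
        String broken = "<p class=\"test\" \"=\"\">bar</p> <p>baz</p>";
        // Same element with the stray quote removed.
        String balanced = "<p class=\"test\" \"=\">bar</p> <p>baz</p>";

        int afterEquals = quotedStringEnd(broken, 16);         // the "=" token
        int afterStray = quotedStringEnd(broken, afterEquals);  // the unclosed quote

        System.out.println("\"=\" token ends at " + afterEquals);
        System.out.println("stray quote token ends at " + afterStray
                + " of " + broken.length());
        System.out.println("balanced input: next char is '"
                + balanced.charAt(quotedStringEnd(balanced, 16)) + "'");
    }
}
```

In the failing input the second token's end equals the input length, so nothing after the stray quote is left to be parsed as text or tags. In the balanced input the character after the "=" token is the > that closes the tag.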