java-html-sanitizer

Is the <plaintext> element required?

Open • Sam2243 opened this issue 4 years ago • 1 comment

Hi,

I have the text "<img src=x onerror=window.open('http://evil.test.com/');/>"

and the following policy:

	    PolicyFactory policy = new HtmlPolicyBuilder()
	        .allowElements("a")
	        .allowUrlProtocols("https", "http")
	        .allowAttributes("href").onElements("a")
	        .toFactory();

When I sanitize it, I don't get any output. However, when I add <plaintext> to the text, it does get sanitized, like the following:

	    String text = "<plaintext><img src=x onerror=window.open('http://evil.test.com/');/>";

Why do I need to put the <plaintext> element here? Are there any alternatives to it?

Appreciate your help.

Thanks

Sam2243 • Dec 20 '20 22:12

In your code snippet, you are not whitelisting the img element or the attributes src and onerror, so yes, it is expected that all markup gets removed from your input. That's exactly the goal of sanitizing.

> Although when I add <plaintext> to the text, it does get sanitized.

What do you mean when you say "get sanitized" here? What is the output? I would expect the input to be returned essentially unmodified.

Adding <plaintext> will mark everything after that start tag (including things like </body>) as non-markup text (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext). The sanitizer will ignore everything after that start tag, and if you tried to place such a string into a document, you would probably break the document. I doubt <plaintext> is something you want.

li-a • Jan 13 '21 21:01
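
On the question of alternatives: if the goal is to display the string literally rather than have the disallowed markup dropped (an assumption about what is wanted here), one option is to escape the HTML-significant characters before output instead of prepending <plaintext>. The sketch below uses only the JDK; EscapeForDisplay and escapeHtml are hypothetical names chosen for illustration and are not part of java-html-sanitizer.

```java
class EscapeForDisplay {

    /**
     * Replaces the five characters that are significant in HTML text and
     * attribute values with character references, so a browser renders the
     * input as literal text instead of parsing it as markup.
     */
    public static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input from the question above.
        String text = "<img src=x onerror=window.open('http://evil.test.com/');/>";
        // Prints: &lt;img src=x onerror=window.open(&#39;http://evil.test.com/&#39;);/&gt;
        System.out.println(escapeHtml(text));
    }
}
```

Escaping and sanitizing serve different purposes: escaping keeps every character but renders it inert, while the policy in the question removes anything it does not explicitly allow. Which one fits depends on whether the input is supposed to contain any markup at all.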