java-html-sanitizer

Issue with replacement in URL in anchor tag href attribute with HTML sanitization

Open • jrjena136 opened this issue 4 years ago • 3 comments

I have a URL like the one below in an HTML anchor tag:

<a href="https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ">ZZZZ</a>

When I apply HTML sanitization, why is "&num" in this value replaced by "#"? The output HTML looks like this, and the URL has become invalid:

<a href="https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq#_yyy=ZZZZZ">ZZZZ</a>

I have used OWASP in my project. How can I avoid this change?

Any thoughts or suggestions would be appreciated.
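The reported output is consistent with some step in the pipeline decoding "&num" as a character reference even though no ';' follows it (in HTML5, "&num;" is the named character reference for '#'). The following is a hypothetical, JDK-only sketch of such a lenient decoder; the class name, the method name and the five-entry entity table are assumptions for illustration, not code from OWASP java-html-sanitizer or AntiSamy.

```java
import java.util.Map;

// Hypothetical illustration only: a lenient decoder that treats a known entity
// name as a character reference even when no ';' follows it.
// This is NOT code from java-html-sanitizer or AntiSamy.
public class LenientDecodeDemo {
    // Tiny, illustrative subset of HTML named character references.
    private static final Map<String, String> ENTITIES =
            Map.of("amp", "&", "num", "#", "lt", "<", "gt", ">", "quot", "\"");

    static String decodeLeniently(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                // Look for any known entity name directly after the '&'.
                String matched = null;
                for (String name : ENTITIES.keySet()) {
                    if (s.startsWith(name, i + 1)) {
                        matched = name;
                        break;
                    }
                }
                if (matched != null) {
                    out.append(ENTITIES.get(matched));
                    i += 1 + matched.length();
                    // The trailing ';' is optional here, which is the lenient part.
                    if (i < s.length() && s.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String href = "https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ";
        // prints https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq#_yyy=ZZZZZ
        System.out.println(decodeLeniently(href));
    }
}
```

Run as is, it prints the broken URL from the report. Whether the reporter's setup actually contains a step like this is not established in the thread.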

jrjena136, Jan 14 '21 13:01

Hello, I am a sanitizer user. Could you share your sanitizer policy?

yangbongsoo, Jan 17 '21 12:01

We have used OWASP with an AntiSamy policy as well; we have the antisamy.xml file.

jrjena136, Jan 18 '21 06:01

This code:

import org.owasp.html.Sanitizers;

String out = Sanitizers.LINKS.sanitize(
    "<a href=\"https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ\">ZZZZ</a>");

Produces:

<a href="https://xxx.com/qwert/ab_cdefmnp.php?pf&#61;ppp_qqq&amp;num_yyy&#61;ZZZZZ" rel="nofollow">ZZZZ</a>

Note that the "&num" has become "&amp;num", and this is correct. On the other hand, if the input had contained "...qqq&num;_yyy", the additional ';' would have led to the entity being recognised as a '#', and that would also have been correct given the input.
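The semicolon rule described in the reply can be modelled with a toy decoder that replaces a named reference only when its name is terminated by ';'. This is a simplified, JDK-only sketch; the class name, method name and small entity table are illustrative assumptions, and it does not reproduce the full HTML5 character-reference rules or this library's own decoder.

```java
import java.util.Map;

// Toy model of the semicolon rule: "&name;" is decoded only when the name is
// followed by ';' and is in the (illustrative) table; anything else stays literal.
public class StrictDecodeDemo {
    private static final Map<String, String> ENTITIES =
            Map.of("amp", "&", "num", "#", "lt", "<", "gt", ">", "quot", "\"");

    static String decodeStrictly(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                int semi = s.indexOf(';', i + 1);
                if (semi > i + 1) {
                    String replacement = ENTITIES.get(s.substring(i + 1, semi));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1; // skip past the terminating ';'
                        continue;
                    }
                }
            }
            out.append(c); // not a recognised reference: keep the character as is
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // No ';' after "num": left as literal text -> pf=ppp_qqq&num_yyy=ZZZZZ
        System.out.println(decodeStrictly("pf=ppp_qqq&num_yyy=ZZZZZ"));
        // ';' present: "&num;" is recognised as '#' -> pf=ppp_qqq#_yyy=ZZZZZ
        System.out.println(decodeStrictly("pf=ppp_qqq&num;_yyy=ZZZZZ"));
        // Escaped ampersand, as in the sanitizer's output -> pf=ppp_qqq&num_yyy=ZZZZZ
        System.out.println(decodeStrictly("pf=ppp_qqq&amp;num_yyy=ZZZZZ"));
    }
}
```

The third call shows why the escaped form in the sanitizer's output is safe: "&amp;" decodes to '&', and the following "num_yyy" is left as plain text.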

Please provide a minimal reproducible example of the code you believe is producing incorrect output.

simon-greatrix, Feb 06 '21 15:02