AntiSamy stripping < and all the characters after it.
We have a requirement to use the < symbol in text that we pass to AntiSamy, but the AntiSamy scan operation removes the text that follows the < symbol. I have noticed that the scan does not encode the < symbol as &lt;, as it does for other symbols, which might be the reason for the issue.
Is there a way to preserve the text after < while still removing any script tag that might cause a problem?
I have tried adding the below content to antisamy.xml, but I still see the text getting truncated after <:
Please refer to the attached AntiSamy file and the Java code.
Antisamy.zip
@spassarop - Can you look into this? This was posted ~2 months ago.
It seems the parsers are trimming <test from the source input csc:<test. While debugging, I saw that AntiSamy does not even get the chance to decide what to do with <test, because it does not exist in the parsed representation. The result is only an HTML text node containing csc:.
I do not have a solution. The only alternatives that come to mind are a parser setting that prevents this, if one exists, or reconsidering whether this is the behavior the parser should have. I would like @rbri's opinion on this matter.
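In the meantime, one possible workaround is to pre-process the input before it reaches AntiSamy, so that a < which is never closed by a > arrives as the entity &lt; instead of as markup. The following is only a minimal sketch under that assumption: the class UnclosedBracketEscaper and the method escapeUnclosed are hypothetical names, not part of AntiSamy or neko, and the heuristic is deliberately naive (it does not understand quoted attribute values, comments, or CDATA).

```java
/**
 * Hypothetical pre-processing helper (NOT part of AntiSamy or neko).
 * Encodes every '<' that has no '>' before the next '<' (or before the
 * end of the input), so a parser sees it as text rather than a tag start.
 */
final class UnclosedBracketEscaper {

    private UnclosedBracketEscaper() {}

    static String escapeUnclosed(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '<') {
                out.append(c);
                i++;
                continue;
            }
            int close = input.indexOf('>', i + 1);
            int nextOpen = input.indexOf('<', i + 1);
            // "Closed" means a '>' appears before any further '<'.
            boolean closed = close != -1 && (nextOpen == -1 || close < nextOpen);
            if (closed) {
                // Copy the whole tag-like span unchanged; the sanitizer decides its fate.
                out.append(input, i, close + 1);
                i = close + 1;
            } else {
                // Dangling '<': encode it so the following text is preserved.
                out.append("&lt;");
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnclosed("csc:<test"));           // csc:&lt;test
        System.out.println(escapeUnclosed("a <b>bold</b> 1 < 2")); // a <b>bold</b> 1 &lt; 2
    }
}
```

The escaped string would then be passed to the scan call in place of the raw input. Anything that still looks like a complete tag is left untouched, so the policy continues to decide what happens to it.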
@spassarop sorry for being so late on this, there were a lot of other things to do... will have a look
Ok, did some tests and I think neko is correct here.
From neko's point of view, '<test' looks like a start tag and is therefore reported as such. You can place a breakpoint in
org.owasp.validator.html.scan.MagicSAXFilter.startElement(QName, XMLAttributes, Augmentations)
and this breakpoint is reached with 'test' as the QName. But test is an unknown tag, and therefore AntiSamy reports an error.
You can also check my test cases:
- https://github.com/HtmlUnit/htmlunit/commit/3dff1b85e663903c8d4353b1b290fb13ed6ab416
- https://github.com/HtmlUnit/htmlunit-neko/commit/72493a03ad5162c3cf4e725018dfe51641cca791
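To illustrate why '<test' reads as a start tag: in the HTML Standard's tokenizer, a < immediately followed by an ASCII letter opens a start tag, whereas a < followed by a space or a digit is emitted as plain text. The sketch below applies only that one rule. It is a simplified approximation for illustration, not neko's actual scanner code, and the names TagStartHeuristic and looksLikeStartTag are hypothetical.

```java
/**
 * Simplified illustration of the "tag open" rule from the HTML Standard:
 * '<' followed by an ASCII letter begins a start tag.
 * This is NOT neko's implementation, only an approximation of the idea.
 */
final class TagStartHeuristic {

    private TagStartHeuristic() {}

    /** Returns true if the '<' at index lt is directly followed by an ASCII letter. */
    static boolean looksLikeStartTag(String s, int lt) {
        if (lt < 0 || lt + 1 >= s.length() || s.charAt(lt) != '<') {
            return false;
        }
        char next = s.charAt(lt + 1);
        return (next >= 'a' && next <= 'z') || (next >= 'A' && next <= 'Z');
    }

    public static void main(String[] args) {
        String input = "csc:<test";
        int lt = input.indexOf('<');
        System.out.println(looksLikeStartTag(input, lt));  // true: "<t" opens a tag
        System.out.println(looksLikeStartTag("1 < 2", 2)); // false: '<' followed by a space
        System.out.println(looksLikeStartTag("<3", 0));    // false: '<' followed by a digit
    }
}
```

Under this rule the reporter's input csc:<test contains a tag start, which matches the startElement call described above.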
That's weird. I tested like this:
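// Note: `as` (an AntiSamy instance) and `policy` (a TestPolicy) are assumed to be set up elsewhere in the test.
String s = "csc:<test";
// Allow a <test> tag with no attributes so the policy itself cannot be what rejects it.
Tag tag = new Tag("test", Collections.<String, Attribute>emptyMap(), Policy.ACTION_VALIDATE);
TestPolicy revised = policy.addTagRule(tag);
// Scan the same input with both parser modes and compare the cleaned output.
String domValue = as.scan(s, revised, AntiSamy.DOM).getCleanHTML();
String saxValue = as.scan(s, revised, AntiSamy.SAX).getCleanHTML();
System.out.println(domValue);
System.out.println(saxValue);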
I even added test by hand to the default policy. With breakpoints set, execution never stops inside startElement. I never get to check the tag against the policy, because I only receive a text node with csc:. Might we have different configs? I used your test files as a reference, but the behavior is the same.