Antisamy stripping < and all the characters after it.

Open Sruthi0989 opened this issue 11 months ago • 5 comments

We have a requirement to use the < symbol in text that we pass to AntiSamy. However, the AntiSamy scan operation removes the text that follows the < symbol. I have noticed that the scan does not encode < as &lt;, as it does for other symbols, which might be the reason for the issue.

Is there a way to preserve the text after < while still removing the script tag that might cause the issue?

I have tried adding the content below to antisamy.xml, but I still see the text getting truncated after <: [image] Please refer to the attached AntiSamy file and the Java code: Antisamy.zip

Sruthi0989 avatar Dec 26 '24 05:12 Sruthi0989

@spassarop - Can you look into this? This was posted ~2 months ago.

davewichers avatar Feb 24 '25 15:02 davewichers

It seems the parsers are trimming <test from the source input csc:<test. While debugging, I confirmed that AntiSamy never gets the chance to decide what to do with <test, because it does not exist in the parsed representation. The result is only an HTML text node containing csc:.

I do not have a solution. The only options that come to mind are a parser setting that prevents this, or reconsidering whether this is the behavior the parser should have. I would like @rbri's opinion on this matter.

spassarop avatar Mar 09 '25 16:03 spassarop

@spassarop sorry for being so late on this, there were a lot of other things to do... will have a look

rbri avatar Apr 04 '25 14:04 rbri

OK, I did some tests and I think neko is correct here.

From neko's point of view, '<test' looks like a start tag and is therefore reported as such. You can place a breakpoint in

org.owasp.validator.html.scan.MagicSAXFilter.startElement(QName, XMLAttributes, Augmentations)

and this breakpoint is reached with 'test' as the QName. But test is an unknown tag, and therefore AntiSamy reports an error.

You can also check my test cases:

  • https://github.com/HtmlUnit/htmlunit/commit/3dff1b85e663903c8d4353b1b290fb13ed6ab416
  • https://github.com/HtmlUnit/htmlunit-neko/commit/72493a03ad5162c3cf4e725018dfe51641cca791
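
To illustrate why '<test' reads as a start tag: in the HTML tokenization rules, a < followed directly by an ASCII letter opens a start tag, while a < followed by a space or a digit is plain text. The sketch below is a simplified model of that one rule only. It is not neko's actual code, and the class and method names are made up for illustration.

```java
public class TagOpenCheck {

    /**
     * Simplified model: returns true when the character at index i is '<'
     * and the next character is an ASCII letter, which is the case where
     * an HTML tokenizer begins a start tag.
     */
    public static boolean looksLikeStartTag(String s, int i) {
        if (i < 0 || i + 1 >= s.length() || s.charAt(i) != '<') {
            return false;
        }
        char next = s.charAt(i + 1);
        return (next >= 'a' && next <= 'z') || (next >= 'A' && next <= 'Z');
    }

    public static void main(String[] args) {
        String input = "csc:<test";
        // '<' is at index 4 and is followed by 't', so it opens a tag.
        System.out.println(looksLikeStartTag(input, input.indexOf('<')));
        // '<' followed by a space is treated as text.
        System.out.println(looksLikeStartTag("a < b", 2));
    }
}
```

Under this model the input csc:<test contains an unterminated tag named test, which matches the QName seen at the breakpoint.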

rbri avatar Apr 04 '25 16:04 rbri

That's weird. I tested like this:

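// Assumes `policy` (a TestPolicy) and `as` (an AntiSamy instance) are already set up,
// as in the AntiSamy test classes.
String s = "csc:<test";
// Add a rule for a <test> tag with no attributes, so the policy knows the tag.
Tag tag = new Tag("test", Collections.<String, Attribute>emptyMap(), Policy.ACTION_VALIDATE);
TestPolicy revised = policy.addTagRule(tag);
// Scan the same input with both the DOM and the SAX scanner.
String domValue = as.scan(s, revised, AntiSamy.DOM).getCleanHTML();
String saxValue = as.scan(s, revised, AntiSamy.SAX).getCleanHTML();
System.out.println(domValue);
System.out.println(saxValue);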

I even added the test tag by hand to the default policy. Even with breakpoints set, execution never stops inside startElement. I cannot check the tag against the policy because I only receive a text node containing csc:. Could we have different configurations? I used your test files as a reference, but the behavior is the same.

spassarop avatar Apr 26 '25 18:04 spassarop
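
For the original question (keeping the text after a literal <), one possible caller-side approach is to escape every < that does not begin an expected tag before handing the string to AntiSamy. The sketch below is not part of AntiSamy and was not proposed in this thread. The class name, the helper, and the three-tag allow-list are hypothetical, and whether the escaped text then survives the scan should be verified against your own policy.

```java
import java.util.List;
import java.util.regex.Pattern;

public class LessThanEscaper {

    // Hypothetical allow-list; replace with the tags your policy accepts.
    private static final List<String> ALLOWED_TAGS = List.of("b", "i", "p");

    // Matches any '<' that does NOT start an opening or closing tag from
    // the allow-list (tag name followed by whitespace, '/', or '>').
    private static final Pattern STRAY_LT = Pattern.compile(
            "<(?!/?(?:" + String.join("|", ALLOWED_TAGS) + ")[\\s/>])",
            Pattern.CASE_INSENSITIVE);

    /** Replaces every stray '<' with the entity &lt; and leaves allow-listed tags alone. */
    public static String escapeStrayLessThan(String input) {
        return STRAY_LT.matcher(input).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeStrayLessThan("csc:<test"));
        System.out.println(escapeStrayLessThan("<b>bold</b> 1 < 2"));
    }
}
```

Note that this also escapes comments, unknown tags, and script tags, so they reach the scanner as text rather than markup. That is a deliberate trade-off of the sketch, and it may not suit every input.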