antisamy icon indicating copy to clipboard operation
antisamy copied to clipboard

AntiSamy converting single quotes to double quotes for font-family which is causing issue while rendering

Open Gouravmoy opened this issue 1 year ago • 4 comments

Hi,

I am using AntiSamy in my project. I am having an issue where AntiSamy is automatically converting single quotes to double quotes. This creates issues in case of rendering font family. Below is an example:

String dirtyContent = "<p>Test <span style=\"font-family: 'comic sans ms', sans-serif; color: #ba372a;\"><sup>~Nalla</sup></span></p>";
CleanResults cr = as.scan(dirtyContent, policy);
System.out.println("Dirty : " + dirtyContent);
System.out.println("Clean : " + cr.getCleanHTML());

Below is the response:

Dirty : <p>Test <span style="font-family: 'comic sans ms', sans-serif; color: #ba372a;"><sup>~Nalla</sup></span></p>
Clean : <p>Test <span style="font-family: &quot;comic sans ms&quot; , sans-serif;color: rgb(186,55,42);"><sup>~Nalla</sup></span></p>

The problem happens because of when I unescape this and send it to the UI, it becomes something like this <p>Test <span style="font-family: "comic sans ms" , sans-serif;color: rgb(186,55,42);"><sup>~Nalla</sup></span></p>

Now the UI is not able to read the font family correctly because of the double quotes. It terminates after <span style="font-family: ". This is causing issue.

I had asked the same question in SO some time ago - Link

The solution seems to be to either keep the font family in single quotes or keep the font family in double and whole style in single quotes. Reference

Please let me know if you need any more information from my side. Also thanks in advance for looking into the issue.

Gouravmoy avatar Jul 14 '22 08:07 Gouravmoy

@spassarop - Any ideas on whether we can fix this?

davewichers avatar Jul 15 '22 00:07 davewichers

Hi everyone!

I debugged to detect where those quotes are added. When transforming CSS "lexical units" to CSS properties in the final string buffer that is the actual output, this particular case of a name between double quotes is taken as a SAC_IDENT or string identifier. This automatically adds " to the identifier when it has spaces (like comic sans ms):

case LexicalUnit.SAC_IDENT:
	// just a string/identifier
	String stringValue = lu.getStringValue();
	if(stringValue.indexOf(" ") != -1)
		stringValue = "\""+stringValue+"\"";
	return stringValue;

A possible fix is to switch " to '. I ran all tests and only one fails because it expects " but would work anyway. I also tried changing to this: <span style='font-family: "comic sans ms", sans-serif; color: #ba372a;'> inverting the quotes, and the output is forced to have " on the outside and ' on the inside. So it works anyway.

Thinking about differences between being an inline style (as a HTML attribute) or being inside a whole style tag, I only could thought about having additional quotes surrounding the styles.

I could just adapt the only failing test case and apply the proposed fix. If being more complex, I could internally pass the info that the tag is inline or not. Then, if it is inline, surround with '. If it is not inline, surround with ". I don't think it's really necessary to do all that but if anyone can come up with a scenario where it is relevant, let me know here.

spassarop avatar Jul 18 '22 20:07 spassarop

@nahsra do you foresee any kind of conflict about this change?

spassarop avatar Jul 22 '22 00:07 spassarop

@spassarop any idea when this change is going to be published?

dplovelybuddy avatar Jul 27 '22 04:07 dplovelybuddy

@Gouravmoy @dplovelybuddy - This fix was just included in the v1.7.1 release that just went out.

davewichers avatar Sep 06 '22 23:09 davewichers

@davewichers Thanks for letting us know :)

dplovelybuddy avatar Sep 07 '22 03:09 dplovelybuddy