Add support in regular expressions for UTF-8 whitespace detection

Open zachmccormick opened this issue 1 year ago • 2 comments

We ran across a nasty bug at Braze where a customer was supplying the non-breaking space character (U+00A0) in a Liquid template they were providing to us, and it took a very long time to debug why the template was not parsing correctly. The current regular expressions do not count that character as whitespace: in Ruby, \s matches only ASCII whitespace, while [[:space:]] matches ASCII whitespace plus Unicode whitespace characters such as the non-breaking space.
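To make the difference concrete, here is a minimal sketch. The two patterns and the template string are hypothetical stand-ins written for illustration; they are not Liquid's actual regular expressions.

```ruby
# U+00A0 NO-BREAK SPACE; in UTF-8 it is the two bytes 0xC2 0xA0.
NBSP = "\u00A0"

# Hypothetical template fragment with a non-breaking space after "{{".
TEMPLATE = "{{#{NBSP}name }}"

# Illustrative patterns only (assumptions, not taken from Liquid's source).
ASCII_WS   = /\{\{\s*(\w+)\s*\}\}/
UNICODE_WS = /\{\{[[:space:]]*(\w+)[[:space:]]*\}\}/

# \s does not match U+00A0, so the ASCII-only pattern fails to match.
p TEMPLATE.match?(ASCII_WS)   # => false

# [[:space:]] matches U+00A0, so the variable name is captured.
p TEMPLATE[UNICODE_WS, 1]     # => "name"
```

The two templates look identical in most editors, which is what made the original bug so hard to spot.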

I replaced \s with [[:space:]] everywhere, but I added only a single test case, which fails against the existing code and passes with this change. Getting full coverage of every possibility seemed excessive, although I'm open to implementing more thorough tests if they're needed before merging.

Co-authored-by: Chris Watkins [email protected]

zachmccormick avatar Feb 15 '24 20:02 zachmccormick

I have signed the CLA!

zachmccormick avatar Feb 15 '24 20:02 zachmccormick

It may also be smart to replace \w with [[:word:]] so that non-ASCII word characters work properly as well. However, I would imagine those are easier to spot visually and are less likely to be used by accident.
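For reference, a small sketch of how \w and [[:word:]] differ in Ruby. The sample string is an arbitrary example, not something taken from Liquid or its tests.

```ruby
# "café" with a precomposed U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
WORD = "caf\u00E9"

# \w is ASCII-only in Ruby ([a-zA-Z0-9_]), so the match stops before the accent.
ASCII_PREFIX = WORD[/\w+/]

# [[:word:]] is Unicode-aware, so the whole word is matched.
UNICODE_WORD = WORD[/[[:word:]]+/]

p ASCII_PREFIX   # => "caf"
p UNICODE_WORD   # => "café"
```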

zachmccormick avatar Feb 15 '24 20:02 zachmccormick