Add support in regular expressions for UTF-8 whitespace detection
We ran across a nasty bug at Braze: a customer supplied a Liquid template containing UTF-8 non-breaking space characters (U+00A0), and it took a very long time to work out why it was not parsing correctly. The current regular expressions do not count those characters as whitespace: `\s` only matches ASCII whitespace, while `[[:space:]]` matches both ASCII and non-ASCII (Unicode) whitespace characters.
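A quick way to see the difference in plain Ruby, independent of Liquid. This is only an illustrative sketch; the constant and variable names are made up for the example.

```ruby
# U+00A0 (NO-BREAK SPACE), written as an escape so it is visible in the source.
NBSP = "\u00A0"

# \s only covers ASCII whitespace (space, \t, \r, \n, \f, \v) in Ruby.
backslash_s_matches_nbsp = NBSP.match?(/\s/)

# The POSIX bracket expression is Unicode-aware for UTF-8 strings.
posix_space_matches_nbsp = NBSP.match?(/[[:space:]]/)

# Both forms agree on an ordinary ASCII space.
both_match_ascii_space = " ".match?(/\s/) && " ".match?(/[[:space:]]/)

puts "\\s matches NBSP:           #{backslash_s_matches_nbsp}"   # false
puts "[[:space:]] matches NBSP:  #{posix_space_matches_nbsp}"   # true
puts "both match an ASCII space: #{both_match_ascii_space}"     # true
```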
I replaced `\s` with `[[:space:]]` everywhere, but I only added a single test case, which red-greens against the existing code. Getting full coverage of every possibility seemed excessive, although I'm open to implementing more thorough tests if that's needed before merging.
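To illustrate what a red-green check of this kind looks like, here is a minimal sketch. `OLD_TAG`, `NEW_TAG`, and `tag_name` are hypothetical, heavily simplified stand-ins written for this example; they are not Liquid's actual patterns or the test added in this PR.

```ruby
# Hypothetical, simplified tag patterns for illustration only.
OLD_TAG = /\A\{%\s*(\w+)\s*%\}\z/                    # before: ASCII-only whitespace
NEW_TAG = /\A\{%[[:space:]]*(\w+)[[:space:]]*%\}\z/  # after: Unicode-aware whitespace

# Returns the captured tag name, or nil when the source does not parse.
def tag_name(pattern, source)
  match = pattern.match(source)
  match && match[1]
end

ascii_template = "{% endif %}"
nbsp_template  = "{%\u00A0endif\u00A0%}" # non-breaking spaces instead of ASCII spaces

p tag_name(OLD_TAG, ascii_template) # "endif"
p tag_name(OLD_TAG, nbsp_template)  # nil ("red": the old pattern fails to parse)
p tag_name(NEW_TAG, ascii_template) # "endif"
p tag_name(NEW_TAG, nbsp_template)  # "endif" ("green": the new pattern parses it)
```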
Co-authored-by: Chris Watkins [email protected]
I have signed the CLA!
It may also be smart to replace `\w` with `[[:word:]]` so that non-ASCII word characters work properly as well. However, I would imagine those are easier to spot visually and are less likely to be used accidentally.
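For reference, the same ASCII-versus-Unicode split applies to word characters in plain Ruby. The snippet below is only a sketch with made-up variable names and does not touch Liquid itself.

```ruby
# "café" with the accented letter written as an escape (U+00E9).
word = "caf\u00E9"

# \w is limited to [a-zA-Z0-9_], so the match stops before the accented letter.
ascii_word_match = word[/\w+/]

# [[:word:]] covers Unicode letters, marks, numbers, and connector punctuation.
unicode_word_match = word[/[[:word:]]+/]

puts ascii_word_match    # caf
puts unicode_word_match  # café
```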