regexr
PCRE returns incorrect ranges when matching against strings with emojis
Steps to reproduce the problem
- Enter the following regex:
  /test/g
- Enter the following test string:
  test 🥶 test
Expected result
Both occurrences of the word test are highlighted, two matches in total
Actual result
Two matches are highlighted: test and <space>tes (the second match is shifted one character to the left)
The length of 🥶 in JavaScript is 2, not 1, because JavaScript strings are indexed in UTF-16 code units and 🥶 is stored as a surrogate pair.
On the other hand, 🥶 seems to be treated as a single character in preg_match and preg_match_all.
I can't think of an easy way to handle the difference.
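
One possible way to bridge the difference, sketched under the assumption that the PCRE side reports match positions counted in code points (which is consistent with the one-character shift seen above, but is not confirmed here): convert each offset to a UTF-16 code-unit offset before using it on the JavaScript string. The helper name `codePointToUtf16Offset` is hypothetical and not part of regexr.

```javascript
// Hypothetical helper: convert an offset counted in Unicode code points
// into an offset counted in UTF-16 code units (what JavaScript strings use).
function codePointToUtf16Offset(str, cpOffset) {
  let units = 0; // UTF-16 code units consumed so far
  let seen = 0;  // code points consumed so far
  for (const ch of str) {        // string iteration yields whole code points
    if (seen === cpOffset) break;
    units += ch.length;          // 1 for BMP characters, 2 for surrogate pairs
    seen++;
  }
  return units;
}

const text = "test 🥶 test";

// Assumed code-point range of the second match: [7, 11)
const start = codePointToUtf16Offset(text, 7);
const end = codePointToUtf16Offset(text, 11);

console.log(JSON.stringify(text.slice(7, 11)));       // unconverted: " tes"
console.log(JSON.stringify(text.slice(start, end)));  // converted: "test"
```

If the offsets turn out to be UTF-8 byte offsets instead (as PHP's PREG_OFFSET_CAPTURE reports them), the same idea applies, but the loop would need to count the encoded byte length of each code point rather than the code points themselves.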