PCRE returns incorrect ranges when matching against strings with emojis

Open • fwrs opened this issue 3 years ago • 1 comment

Steps to reproduce the problem

  1. Enter the following regex: /test/g
  2. Enter the following test string: test 🥶 test

Expected result

Each instance of the word test is highlighted, two matches in total.

Actual result

Two different strings are highlighted: test and <space>tes (the second match is shifted one character to the left).

fwrs, Feb 08 '22 14:02

The length of 🥶 in JavaScript is 2, not 1. On the other hand, 🥶 seems to be treated as a single character by preg_match and preg_match_all.
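A small sketch of the mismatch on the JavaScript side, using the test string from the report. It assumes, as the comment suggests, that the backend counts 🥶 as one character; the variable names are made up for illustration and are not taken from the RegExr source.

```javascript
// The test string from the report.
const text = "test 🥶 test";

// String.prototype.length counts UTF-16 code units.
// 🥶 (U+1F976) lies outside the BMP, so it is stored as a surrogate pair.
const emojiUnits = "🥶".length;           // 2

// Spreading a string iterates by code point instead.
const emojiCodePoints = [..."🥶"].length; // 1

// Where the second "test" starts, in UTF-16 code units.
const startInCodeUnits = text.indexOf("test", 1);                        // 8

// The same position counted in code points (one per character, emoji included).
const startInCodePoints = [...text.slice(0, startInCodeUnits)].length;   // 7

// Assumption: using the code-point offset directly as a JavaScript index
// reproduces the shifted highlight from the report.
const shifted = text.slice(startInCodePoints, startInCodePoints + 4);    // " tes"

console.log({ emojiUnits, emojiCodePoints, startInCodeUnits, startInCodePoints, shifted });
```

Each character outside the BMP that appears before a match moves the highlight one further position to the left, which is consistent with the single-character shift reported for one emoji.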

I can't think of an easy way to handle the difference.

Attacktive, May 04 '22 05:05
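One possible way to bridge the difference described in the comment above, offered only as a sketch: if the backend's offsets are counted in code points (an assumption taken from the comment, not verified against the RegExr server), they could be translated to UTF-16 indices before highlighting. `codePointOffsetToUtf16Index` is a hypothetical helper, not an existing RegExr or library function. If the offsets turn out to be UTF-8 byte offsets instead, a different conversion would be needed.

```javascript
// Hypothetical helper: convert an offset counted in code points into
// the equivalent UTF-16 code-unit index used by JavaScript strings.
function codePointOffsetToUtf16Index(str, codePointOffset) {
  let utf16Index = 0;
  let codePointsSeen = 0;
  // for...of iterates a string by code point, so surrogate pairs stay together.
  for (const ch of str) {
    if (codePointsSeen === codePointOffset) break;
    utf16Index += ch.length; // 1 for BMP characters, 2 for surrogate pairs
    codePointsSeen += 1;
  }
  return utf16Index;
}

// Usage with the string from the report: the second match spans
// code points 7 to 11 under the assumption above.
const subject = "test 🥶 test";
const start = codePointOffsetToUtf16Index(subject, 7);
const end = codePointOffsetToUtf16Index(subject, 11);
console.log(subject.slice(start, end)); // "test"
```

The helper walks the string once per call, so converting many matches in a long string this way is quadratic in the worst case; a single pass that builds an offset table would be the natural refinement.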