[PHP 8.4][Intl] Add `grapheme_str_split`

Open Ayesh opened this issue 1 year ago • 8 comments

Adds a polyfill for the grapheme_str_split function added in PHP 8.4.

Requires PHP 7.3 or later: the polyfill is built on the \X regex escape, which only works properly with PCRE2, and PHP only bundles PCRE2 from 7.3 onwards.

Further, there are some cases where the polyfill cannot correctly split complex characters (such as two consecutive country flag emojis). This is now fixed upstream in PCRE2Project/pcre2#410. However, that fix will likely only make it into PHP 8.4.
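For readers skimming the thread, here is a minimal sketch of the \X-based approach described above. It is not the code from this PR: the function name `sketch_grapheme_split` and its error handling are assumptions made for illustration, and it inherits whatever grapheme rules the installed PCRE version implements.

```php
<?php

/**
 * Hypothetical helper (not the polyfill's real implementation):
 * splits a UTF-8 string into grapheme clusters using the \X escape.
 *
 * @return string[]|false list of chunks, or false if the regex fails
 *                        (for example on invalid UTF-8 input)
 */
function sketch_grapheme_split(string $string, int $length = 1)
{
    if ($length < 1) {
        throw new \InvalidArgumentException('Length must be at least 1.');
    }

    // \X matches one extended grapheme cluster; the /u flag enables UTF-8 mode.
    if (false === preg_match_all('/\X/u', $string, $matches)) {
        return false;
    }

    if (1 === $length) {
        return $matches[0];
    }

    // Group clusters into chunks of $length and join each chunk back into a string.
    return array_map(
        function (array $chunk): string {
            return implode('', $chunk);
        },
        array_chunk($matches[0], $length)
    );
}
```

For example, `sketch_grapheme_split("e\u{301}a")` keeps the combining accent attached to its base letter instead of splitting it off as a separate element. How sequences such as consecutive flag emojis are split depends on the PCRE2 version, as noted above.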

References:

Ayesh avatar Jun 05 '24 15:06 Ayesh

(working on the Intl changes, I'll mark the PR ready then)

Ayesh avatar Jun 05 '24 15:06 Ayesh

Thank you. I think we should add this polyfill to the intl-grapheme polyfill as well.

derrabus avatar Jun 05 '24 20:06 derrabus

Thank you @derrabus - I added a polyfill and tests for grapheme_str_split to the Intl polyfill too.

Ayesh avatar Jun 08 '24 10:06 Ayesh

Friendly ping @Ayesh :)

nicolas-grekas avatar Sep 09 '24 07:09 nicolas-grekas

Thank you @nicolas-grekas - really helpful comments. I addressed them and force-pushed. The \X regex polyfill for PCRE1 is very cool and worked beautifully, 10,000 IQ regex 🤯 :)

Ayesh avatar Sep 09 '24 14:09 Ayesh

One last push, thank you for being patient with this 💜

Ayesh avatar Sep 09 '24 14:09 Ayesh

So we have a test failure on PHP 7.2 :) Maybe we should remove the corresponding test case? The fallback regexp doesn't account for ZWJ emojis, IIRC.

nicolas-grekas avatar Sep 09 '24 14:09 nicolas-grekas

Perfect, fixed. For now, we exclude the ZW joiner case on PHP 7.2, and we exclude the case that hits a known \X capture bug on PCRE2 < 10.44, regardless of the PHP version.
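A rough sketch of how such exclusions could be detected at runtime, assuming the thresholds mentioned in this thread (PHP 7.3 for PCRE2, PCRE2 10.44 for the \X fix). The function names are hypothetical and this is not the test code from the PR; `PCRE_VERSION`, `PHP_VERSION_ID` and `version_compare()` are standard PHP.

```php
<?php

// Hypothetical helper: PCRE_VERSION looks like "10.42 2022-12-11",
// so keep only the leading version number.
function sketch_pcre_version(): string
{
    return explode(' ', PCRE_VERSION)[0];
}

// Assumption from this thread: the \X capture bug is fixed from PCRE2 10.44.
function sketch_has_fixed_x_escape(): bool
{
    return version_compare(sketch_pcre_version(), '10.44', '>=');
}

// Assumption from this thread: PHP below 7.3 uses the PCRE1 fallback regexp,
// which does not handle ZWJ emoji sequences.
function sketch_uses_pcre1_fallback(): bool
{
    return PHP_VERSION_ID < 70300;
}
```

A test could then be skipped when `sketch_uses_pcre1_fallback()` is true (ZWJ case) or when `sketch_has_fixed_x_escape()` is false (affected \X case).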

Ayesh avatar Sep 09 '24 15:09 Ayesh

Thank you @Ayesh.

nicolas-grekas avatar Jun 24 '25 07:06 nicolas-grekas