Align with native Unicode property escapes

Open mathiasbynens opened this issue 7 years ago • 2 comments

Other than the set of supported properties, there are some key differences between XRegExp’s handling of \p{…} and the way they work in native JS.

  • Native \p{…} doesn’t implement loose matching; only strict, case-sensitive matches for canonical property names and values (or their aliases) are accepted
  • In native \p{…}, Blocks are not supported
  • Native \p{…} doesn’t support the In prefix or any other prefix (although since XRegExp only does this for Blocks, dropping Block support already resolves this)
  • Native \p{…} supports Script_Extensions, which is generally more useful than Script
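
To make these differences concrete, here is a small sketch of how native property escapes behave in an engine with ES2018 support. The helper name `compilesNatively` and the sample patterns are illustrative only and are not part of XRegExp.

```javascript
// Hypothetical helper: reports whether a pattern compiles as a native
// regex with the `u` flag (assumes an ES2018+ engine).
function compilesNatively(source) {
  try {
    new RegExp(source, 'u');
    return true;
  } catch (err) {
    return false;
  }
}

const results = {
  canonical: compilesNatively('\\p{Script=Greek}'),     // accepted: canonical name and value
  alias: compilesNatively('\\p{sc=Grek}'),              // accepted: documented aliases
  loose: compilesNatively('\\p{script=greek}'),         // rejected: no loose, case-insensitive matching
  block: compilesNatively('\\p{InBasicLatin}'),         // rejected: no Blocks, no In prefix
  scx: compilesNatively('\\p{Script_Extensions=Greek}') // accepted: Script_Extensions
};
console.log(results);

// U+30FC (KATAKANA-HIRAGANA PROLONGED SOUND MARK) has Script=Common,
// but its Script_Extensions include Hiragana and Katakana.
const mark = '\u30FC';
const matchesScript = /\p{Script=Hiragana}/u.test(mark);
const matchesScx = /\p{Script_Extensions=Hiragana}/u.test(mark);
console.log(matchesScript, matchesScx);
```

The last two checks show why Script_Extensions tends to be the more useful property: a character shared between scripts is assigned to Common under Script, so it matches only when Script_Extensions is used.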

Technically these are all breaking changes, but IMHO we should consider aligning with native property escapes.

mathiasbynens, Feb 21 '18 00:02

Supporting Script_Extensions is the biggest change here. It might make sense to merely ensure that use of Script_Extensions (which is easy to identify since ES2018 requires a prefix when using them) is passed through correctly and works when in an ES2018 environment that supports it natively. That would avoid the need for adding the large associated data that is significantly redundant with the existing Script data. But then, removing Blocks would "free up" space for this. I'm definitely open to adding Script_Extensions as a new addon.
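
A rough sketch of what the pass-through idea could look like, under stated assumptions: the names `SCX_TOKEN`, `usesScriptExtensions`, and `supportsNativeScx` are hypothetical and do not exist in XRegExp, and the token check is deliberately naive (it does not account for an escaped backslash preceding `p`, or for character-class context).

```javascript
// Hypothetical: matches \p{Script_Extensions=...}, \P{...}, and the scx alias.
// ES2018 requires the property-name prefix, so the token is easy to spot.
const SCX_TOKEN = /\\[pP]\{(?:Script_Extensions|scx)=[^}]+\}/;

// Hypothetical: does the pattern source use a Script_Extensions escape?
function usesScriptExtensions(source) {
  return SCX_TOKEN.test(source);
}

// Hypothetical: feature-detect native Script_Extensions support.
function supportsNativeScx() {
  try {
    new RegExp('\\p{scx=Latin}', 'u');
    return true;
  } catch (err) {
    return false;
  }
}

// Such tokens would be left untouched for the native engine only
// when both checks pass.
const source = '\\p{scx=Greek}+';
const canPassThrough = usesScriptExtensions(source) && supportsNativeScx();
console.log(canPassThrough);
```

This approach would avoid shipping Script_Extensions data at all, at the cost of the feature working only in environments with native support.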

The other changes (dropping support for loose name matching and Blocks) should be straightforward. Happy to adopt these changes and publish a new major version. Will go ahead with them if others don't get to them before me.

Aside: While we're considering breaking changes for the Unicode addons, perhaps we should move unicode-base.js into xregexp.js, to make it even easier to import the individual sets of data for scripts, general categories, and binary properties based on what you need.

slevithan, Feb 21 '18 01:02

Removed support for Unicode blocks in commit 4860122362c9822f35ab7f2deea7973a5815fcac.

slevithan, Jan 18 '21 05:01