Align with native Unicode property escapes
Other than the set of supported properties, there are some key differences between XRegExp's handling of `\p{…}` and the way it works in native JS.
- Native `\p{…}` doesn't implement loose matching; only strict, case-sensitive matches for canonical property names and values (or their aliases) are accepted
- In native `\p{…}`, Blocks are not supported
- Native `\p{…}` doesn't support the `In` prefix or any other prefix (although since XRegExp only does this for Blocks, dropping Block support already resolves this)
- Native `\p{…}` supports `Script_Extensions`, which is generally more useful than `Script`
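To make the differences above concrete, here is a minimal sketch that probes native behavior in an ES2018+ engine. `compilesNatively` is a hypothetical helper written for this illustration, not part of XRegExp; it only reports whether a pattern compiles with the `u` flag.

```javascript
// Hypothetical helper (not an XRegExp API): returns true if `source`
// compiles as a native regex with the `u` flag, false on SyntaxError.
function compilesNatively(source) {
  try {
    new RegExp(source, 'u');
    return true;
  } catch (err) {
    return false;
  }
}

const checks = {
  // Canonical names and their aliases are accepted.
  canonicalScript: compilesNatively('\\p{Script=Greek}'),
  aliasScript: compilesNatively('\\p{sc=Grek}'),
  // No loose matching: lowercase names and values are rejected.
  looseCase: compilesNatively('\\p{script=greek}'),
  // Blocks and the `In` prefix are not supported.
  blockProperty: compilesNatively('\\p{Block=Basic_Latin}'),
  inPrefix: compilesNatively('\\p{InBasicLatin}'),
  // Script values require an explicit property name.
  bareScriptValue: compilesNatively('\\p{Greek}'),
  // Script_Extensions is supported, with its `scx` alias.
  scriptExtensions: compilesNatively('\\p{Script_Extensions=Greek}'),
  scxAlias: compilesNatively('\\p{scx=Grek}'),
};

console.log(checks);
```

In an engine with native property escapes, the loose-case, Block, `In`-prefixed, and bare script value patterns are expected to fail to compile, while the others succeed.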
Technically these are all breaking changes, but IMHO we should consider aligning with native property escapes.
Supporting Script_Extensions is the biggest change here. It might make sense to merely ensure that use of Script_Extensions (which is easy to identify since ES2018 requires a prefix when using them) is passed through correctly and works when in an ES2018 environment that supports it natively. That would avoid the need for adding the large associated data that is significantly redundant with the existing Script data. But then, removing Blocks would "free up" space for this. I'm definitely open to adding Script_Extensions as a new addon.
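A rough sketch of the pass-through idea, assuming it would be enough to spot the required `Script_Extensions=` or `scx=` prefix and hand the pattern to the native engine. All names here (`SCX_TOKEN`, `hasScriptExtensionsToken`, `supportsNativeScx`, `compileWithScxPassThrough`) are hypothetical and do not reflect how XRegExp is actually structured; a real implementation would need to hook into XRegExp's own token handling.

```javascript
// Hypothetical sketch, not XRegExp's API.
// Matches \p{Script_Extensions=...}, \P{scx=...}, etc.
// Limitation: does not distinguish an escaped backslash (`\\p{...}`).
const SCX_TOKEN = /\\[pP]\{(?:Script_Extensions|scx)=[^}]+\}/;

function hasScriptExtensionsToken(pattern) {
  return SCX_TOKEN.test(pattern);
}

// Feature-detect native Script_Extensions support.
function supportsNativeScx() {
  try {
    new RegExp('\\p{scx=Latin}', 'u');
    return true;
  } catch (err) {
    return false;
  }
}

// Returns a native RegExp when the pattern uses Script_Extensions,
// or null so the caller can fall back to its usual handling.
function compileWithScxPassThrough(pattern, flags = '') {
  if (!hasScriptExtensionsToken(pattern)) {
    return null;
  }
  if (!supportsNativeScx()) {
    throw new SyntaxError(
      'Script_Extensions requires native Unicode property escapes'
    );
  }
  // Native property escapes only work with the `u` (or `v`) flag.
  const needsFlag = !flags.includes('u') && !flags.includes('v');
  return new RegExp(pattern, needsFlag ? flags + 'u' : flags);
}

const greekWord = compileWithScxPassThrough('^\\p{Script_Extensions=Greek}+$');
const notScx = compileWithScxPassThrough('\\p{Script=Greek}');
```

This keeps the `Script_Extensions` data out of the library entirely, at the cost of throwing in environments without native support.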
The other changes (dropping support for loose name matching and Blocks) should be straightforward. Happy to adopt these changes and publish a new major version. Will go ahead with them if others don't get to them before me.
Aside: While we're considering breaking changes for the Unicode addons, perhaps we should move unicode-base.js into xregexp.js, to make it even easier to import the individual sets of data for scripts, general categories, and binary properties based on what you need.
Removed support for Unicode blocks in commit 4860122362c9822f35ab7f2deea7973a5815fcac.