xregexp
Reduce cases where unnecessary (?:) separator is added to pattern
Update the XRegExp constructor (more specifically the runTokens function) to pass preceding tokens to token handler functions. Alternatively they could be made available on the token handler context (like the existing this.captureNames and this.hasNamedCapture).
In addition to this being a generally useful feature that could enable more powerful syntax addons, this would enable much better handling of where (?:) separators are inserted when stripping whitespace (with flag x) and comments out of a pattern. Currently the getContextualTokenSeparator function is used for this, but it's not very robust. E.g. it avoids adding (?:) if the preceding character is (, but it doesn't deal with (?<name> and other cases where a separator isn't needed.
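To illustrate why a separator is needed at all, here is a minimal sketch in plain JavaScript (it does not use XRegExp; the `replace` calls only stand in for what flag x does to whitespace). Stripping whitespace with nothing in its place can fuse two tokens into a different one:

```javascript
// Pattern source: backreference \1, whitespace, then a literal 0
const source = '(a)\\1 0';

// Naive stripping fuses `\1` and `0` into `\10`, which is no longer
// a backreference to group 1 followed by a literal 0
const naive = source.replace(/\s+/g, '');

// Replacing the whitespace with an empty group keeps the tokens apart
const separated = source.replace(/\s+/g, '(?:)');

console.log(naive);     // (a)\10
console.log(separated); // (a)\1(?:)0
console.log(new RegExp(separated).test('aa0')); // true
console.log(new RegExp(naive).test('aa0'));     // false
```

The separator is only required when the tokens on either side could combine like this, which is why inserting it unconditionally leaves needless `(?:)` in the output.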
It would make things even easier if preceding tokens were annotated with a type string (provided as a new property on the options argument of XRegExp.addToken). E.g. from XRegExp('(?<name>\n.)', 'x') you could get:
[
{type: 'named-capture-start', output: '('},
{type: 'x-ignored', output: ''},
{type: 'native-token', output: '.'},
{type: 'native-token', output: ')'}
]
Then the getContextualTokenSeparator function can easily check whether the preceding token is something that requires the (?:) separator be added.
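A rough sketch of what such a check might look like, assuming the token list shape shown above. The function name `getSeparatorForTokens`, the `group-start` type, and the set of types are hypothetical and not part of XRegExp. It only considers the preceding side; a real check would also need to look at what follows.

```javascript
// Hypothetical token types after which no `(?:)` separator is needed
const NO_SEPARATOR_AFTER = new Set([
    'named-capture-start',
    'group-start'
]);

// `precedingTokens` is the proposed list of {type, output} objects
function getSeparatorForTokens(precedingTokens) {
    // Skip tokens that produced no output (e.g. earlier ignored whitespace)
    let i = precedingTokens.length - 1;
    while (i >= 0 && precedingTokens[i].output === '') {
        i--;
    }
    // Nothing with output precedes, so there is nothing to separate from
    if (i < 0) {
        return '';
    }
    return NO_SEPARATOR_AFTER.has(precedingTokens[i].type) ? '' : '(?:)';
}

const afterNamedGroupStart = [
    {type: 'named-capture-start', output: '('}
];
const afterDot = afterNamedGroupStart.concat([
    {type: 'x-ignored', output: ''},
    {type: 'native-token', output: '.'}
]);

console.log(JSON.stringify(getSeparatorForTokens(afterNamedGroupStart))); // ""
console.log(JSON.stringify(getSeparatorForTokens(afterDot)));             // "(?:)"
```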
The above idea of making preceding tokens available to token handler functions would require special handling for the reparse option, since only the final reparsed version of a token should be added to the list of prior tokens.
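One possible way to handle reparse is sketched below with a toy tokenizer. This is not XRegExp's `runTokens`; the token table, the `{{dot}}` alias, and the `tokenize` function are invented for illustration. A reparse token's output is spliced back into the pattern and tokenized again, and only the result of that second pass is recorded.

```javascript
// Toy token definitions (hypothetical, not XRegExp's real token table)
const toyTokens = [
    // Reparse token: expands {{dot}} to source that must be tokenized again
    {type: 'dot-alias', regex: /\{\{dot\}\}/y, reparse: true, handler: () => '.'},
    // Ordinary token: strips a run of whitespace
    {type: 'x-ignored', regex: /\s+/y, reparse: false, handler: () => ''}
];

function tokenize(pattern) {
    const priorTokens = [];
    let pos = 0;
    while (pos < pattern.length) {
        // Find the first toy token that matches at the current position
        let matched = null;
        for (const token of toyTokens) {
            token.regex.lastIndex = pos;
            const match = token.regex.exec(pattern);
            if (match) {
                matched = {token, match};
                break;
            }
        }
        if (matched && matched.token.reparse) {
            // Splice the output into the pattern and retry at the same
            // position, recording nothing for the intermediate token
            const output = matched.token.handler(matched.match, priorTokens);
            pattern = pattern.slice(0, pos) + output +
                pattern.slice(pos + matched.match[0].length);
            continue;
        }
        if (matched) {
            const output = matched.token.handler(matched.match, priorTokens);
            priorTokens.push({type: matched.token.type, output});
            pos += matched.match[0].length;
        } else {
            // No toy token matched, so treat one character as a native token
            priorTokens.push({type: 'native-token', output: pattern[pos]});
            pos += 1;
        }
    }
    return priorTokens;
}

console.log(tokenize('a {{dot}}').map((t) => t.type).join(', '));
// native-token, x-ignored, native-token
```

Note that the `dot-alias` type never appears in the recorded list; only the reparsed `.` does, as a native token.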
After this change:
- Check whether the lines in `build.js` for handling `(?:)` are still needed.
- Consider removing the cleanup of double `(?:)(?:)`.
- Upgrade to support at least `(?<name>` as an additional case when `(?:)` doesn't need to be added.
See related discussion in https://github.com/slevithan/xregexp/pull/164#issuecomment-294345042.
New diffs 076f9501965d9ddc4f1cf7b7626c77993b396a01 and d78a26216691c975acf5424f371db9763f307c7a cleaned up more cases where (?:) isn't needed. Specifically, the following are now covered:
- Before a group (left of `(`).
- After a group (right of `)`).
- After the opening of a lookbehind (right of `(?<=` and `(?<!`).
Nice!
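For reference, the cases named in this thread can be expressed as a simple string-based check. This is a simplified sketch, not the code from the commits above; the function name `sketchSeparator` and its signature are assumptions, and it ignores other situations (such as quantifiers or the pattern boundaries) that a complete version would need to consider.

```javascript
// `input` is the pattern source; `index` and `length` locate the ignored
// whitespace or comment within it
function sketchSeparator(input, index, length) {
    const end = index + length;
    const before = input[index - 1];
    const after = input[end];
    // At most the last 4 characters before the ignored text
    const tail = input.slice(Math.max(0, index - 4), index);
    if (
        before === '(' ||          // right after the opening of a group
        after === '(' ||           // before a group
        before === ')' ||          // after a group
        /\(\?<[=!]$/.test(tail)    // right after `(?<=` or `(?<!`
    ) {
        return '';
    }
    return '(?:)';
}

console.log(JSON.stringify(sketchSeparator('(?<= a)', 4, 1))); // ""
console.log(JSON.stringify(sketchSeparator('a (b)', 1, 1)));   // ""
console.log(JSON.stringify(sketchSeparator('(a) b', 3, 1)));   // ""
console.log(JSON.stringify(sketchSeparator('\\1 0', 2, 1)));   // "(?:)"
```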