
Add syntax for subpatterns as subroutines

Open slevithan opened this issue 8 years ago • 1 comment

This pseudo-AST structure described in #179 could also be the foundation of a useful advanced feature from PCRE, Perl, etc.: The ability to reference the entire contents of a named or numbered group (including nested parens) from later in the pattern, enabling support for subpattern reuse via (?&name) and (?n).

This would only require generic syntax tokens for ) and for any ( that isn't part of a self-contained token like (?#...). An opening token would mark the tokens that follow it as its children until the matching ) arrives. The generated pattern contents of each named group could then be derived when needed.

Perhaps this would look like:

[
  {
    type: 'named-capture-start',
    name: 'name',
    output: '(',
    children: [
      {type: 'x-ignored', output: ''},
      {type: 'native-token', output: '.'},
    ],
  },
  {type: 'native-token', output: ')'},
]
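
As a rough illustration of how that structure might be produced, here is a minimal sketch that nests a flat token list and then derives a group's generated contents. The helper names nestTokens and outputOf are hypothetical, not part of XRegExp. It also assumes that every group opener has a type ending in '-start' and that a closer is any token whose output is exactly ')'.

```javascript
// Hypothetical helper (not an XRegExp API): nest a flat token list into
// the tree shape shown above.
// Assumptions: group openers have a type ending in '-start'; a closer is
// any token whose output is exactly ')'.
function nestTokens(flatTokens) {
  const root = [];
  const stack = [root]; // child lists currently being filled, innermost last
  for (const token of flatTokens) {
    const current = stack[stack.length - 1];
    if (token.type.endsWith('-start')) {
      const group = {...token, children: []};
      current.push(group);
      stack.push(group.children);
    } else if (token.output === ')') {
      if (stack.length === 1) {
        throw new SyntaxError('Unmatched )');
      }
      stack.pop();
      // The closer is stored beside its opener, not among the children
      stack[stack.length - 1].push(token);
    } else {
      current.push(token);
    }
  }
  if (stack.length > 1) {
    throw new SyntaxError('Unclosed group');
  }
  return root;
}

// Hypothetical helper: derive generated pattern source, recursing into children
function outputOf(tokens) {
  return tokens
    .map((t) => t.output + (t.children ? outputOf(t.children) : ''))
    .join('');
}
```

With this shape, outputOf(group.children) gives the contents of a group without its own parentheses, which is the part a later (?&name) or (?n) would need.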

Notes:

  • An error would need to be thrown if the group name referenced by (?&name) or group number with (?n) was not yet closed.
  • Make sure to handle things like (?<$1>.)(?<$2>(?&$1))(?&$2).
  • Some of the use cases are already handled by XRegExp.build and XRegExp.tag, but this would still be cleaner and/or more robust in some cases, and the foundation created for it would make potential future XRegExp syntax addons more powerful.
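
To make the first two notes concrete, here is a minimal sketch of how references might be expanded over a flat token list. Everything in it is an assumption for illustration: the function name expandSubroutines, the 'subroutine-call' token type with its ref field, and the convention that group openers have a type ending in '-start' (with capturing ones ending in 'capture-start'). A reference is expanded into a non-capturing copy of the group's contents, and capturing groups inside a reused copy are made non-capturing so that group numbering stays stable.

```javascript
// Hypothetical sketch (not an XRegExp API): expand (?&name) and (?n)
// references, represented here as {type: 'subroutine-call', ref} tokens.
function expandSubroutines(tokens) {
  const closed = new Map(); // name or number -> reusable contents of closed groups
  const open = [];          // frames for groups that are not yet closed
  let groupCount = 0;
  let result = '';

  // Append to the output and to the reusable contents of every open group
  const emit = (output, reusable = output) => {
    result += output;
    for (const frame of open) frame.contents += reusable;
  };

  for (const token of tokens) {
    if (token.type === 'subroutine-call') {
      if (!closed.has(token.ref)) {
        throw new SyntaxError(`Group ${token.ref} is undefined or not yet closed`);
      }
      emit(`(?:${closed.get(token.ref)})`);
    } else if (token.type.endsWith('-start')) {
      const capturing = token.type.endsWith('capture-start');
      // Emit before pushing so the opener is not part of its own contents
      emit(token.output, capturing ? '(?:' : token.output);
      open.push({
        name: token.name,
        number: capturing ? ++groupCount : null,
        contents: '',
      });
    } else if (token.output === ')') {
      const frame = open.pop();
      if (!frame) throw new SyntaxError('Unmatched )');
      if (frame.number !== null) {
        closed.set(frame.number, frame.contents);
        if (frame.name) closed.set(frame.name, frame.contents);
      }
      emit(token.output);
    } else {
      emit(token.output);
    }
  }
  return result;
}
```

Under these assumptions, the tokens for (?<$1>.)(?<$2>(?&$1))(?&$2) expand to (.)((?:.))(?:(?:.)), because the contents recorded for $2 already contain the expanded copy of $1. A reference placed inside the group it names finds that group still open and throws.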

slevithan, Apr 26 '17 19:04

This would also enable (?<DEFINE>(?<name1>...)(?<name2>...)) blocks that make subpattern reuse via (?&name) and (?n) more robust.

slevithan, Jun 03 '21 15:06