Support mode modifiers

Open Aloso opened this issue 3 years ago • 0 comments

Section about mode modifiers in popular regex engines

Here are the best supported mode modifiers:

  • (?i): case insensitive
  • (?m): multi-line mode (^ and $ match begin/end of line)
  • (?s): single-line mode (allow . to match \n)
  • (?x): free-spacing mode
  • (?xx): free-spacing mode even in character classes
  • (?U): make greedy quantifiers lazy and vice versa (a*? <--> a*)
  • (?n): make all capturing groups without a name non-capturing ((a) --> (?:a))
  • (?J): allow duplicate group names
  • (?d): opt out of Unicode line break support, so ., ^ and $ only treat ASCII \n as a line break
  • (?X): makes invalid escape sequences an error

Some of these aren't needed because rulex has sane defaults:

  • s isn't needed because you can use [cp] or Grapheme in rulex
  • x and xx aren't needed because rulex is already free-spacing
  • n isn't needed because groups are non-capturing by default
  • X isn't needed because rulex doesn't have escape sequences, and invalid identifiers are an error

End of section


The mode modifiers we want to support are i (ignore case), m (multiline), J (duplicate group names), d (ASCII-only \n line breaks) and U (lazy quantifiers by default). They will be available under the following names:

  • [ ] ignore_case
  • [ ] multiline
  • [ ] reuse_groups
  • [ ] ascii_line_breaks
  • [x] lazy
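
The mapping from these names to the modifier letters listed earlier can be sketched in Rust as follows. The `Mode` enum and its methods are hypothetical names chosen for illustration; they are not taken from rulex's source, and this issue does not say how modes are represented internally.

```rust
/// Hypothetical representation of the five proposed modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    IgnoreCase,
    Multiline,
    ReuseGroups,
    AsciiLineBreaks,
    Lazy,
}

impl Mode {
    /// Parses the mode name as it would appear after `enable` / `disable`.
    /// Returns `None` for names that are not in the proposed list.
    pub fn from_name(name: &str) -> Option<Mode> {
        match name {
            "ignore_case" => Some(Mode::IgnoreCase),
            "multiline" => Some(Mode::Multiline),
            "reuse_groups" => Some(Mode::ReuseGroups),
            "ascii_line_breaks" => Some(Mode::AsciiLineBreaks),
            "lazy" => Some(Mode::Lazy),
            _ => None,
        }
    }

    /// The inline modifier letter this mode corresponds to,
    /// following the list of regex mode modifiers in this issue.
    pub fn inline_flag(self) -> char {
        match self {
            Mode::IgnoreCase => 'i',
            Mode::Multiline => 'm',
            Mode::ReuseGroups => 'J',
            Mode::AsciiLineBreaks => 'd',
            Mode::Lazy => 'U',
        }
    }
}
```

Returning `Option` from `from_name` lets a parser report unknown mode names as errors instead of silently ignoring them.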

These modes can be enabled and disabled with the enable mode; / disable mode; syntax, where mode is one of the names above. For example:

enable lazy;
'hello ' (disable ignore_case; 'world')*
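
In regex flavors with inline modifiers (PCRE, for instance), a mode that is switched on or off inside a group could plausibly be emitted as a scoped modifier group such as `(?-i:world)`. The Rust sketch below shows only that string construction. `scoped_modifier` is a hypothetical helper, and this issue does not specify what rulex actually emits.

```rust
/// Wraps `body` in a scoped inline-modifier group.
///
/// `flag` is an inline modifier letter such as 'i'. When `enabled` is true
/// the result is `(?i:body)`, otherwise `(?-i:body)`.
/// Hypothetical helper for illustration; `body` must already be valid regex.
pub fn scoped_modifier(flag: char, enabled: bool, body: &str) -> String {
    let sign = if enabled { "" } else { "-" };
    format!("(?{}{}:{})", sign, flag, body)
}
```

With this helper, `scoped_modifier('i', false, "world")` yields `(?-i:world)`, which limits the change to the group and leaves the rest of the expression unaffected.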

Note that JavaScript doesn't allow mode modifiers within a regex, but the i and m modifiers are instead available as flags for the entire regex. So we can support enable ignore_case; 'hello world', which compiles to /hello world/i. This means that we need to support outputting flags.
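
A minimal Rust sketch of what outputting flags could look like for the JavaScript case. The function name `to_js_literal` and its boolean parameters are assumptions made for illustration, not an existing rulex API, and the sketch assumes the pattern body has already been compiled and escaped.

```rust
/// Renders a JavaScript regex literal with trailing flags.
///
/// `pattern` is the already-compiled regex body; it is inserted verbatim,
/// so any `/` inside it must have been escaped beforehand.
/// Hypothetical helper for illustration only.
pub fn to_js_literal(pattern: &str, ignore_case: bool, multiline: bool) -> String {
    let mut out = String::with_capacity(pattern.len() + 4);
    out.push('/');
    out.push_str(pattern);
    out.push('/');
    // Modes enabled for the whole expression become flags after the literal.
    if ignore_case {
        out.push('i');
    }
    if multiline {
        out.push('m');
    }
    out
}
```

Calling `to_js_literal("hello world", true, false)` produces `/hello world/i`, matching the example above. Because flags apply to the entire regex, this only covers modes enabled at the top level; a mode toggled inside a group would need a different strategy or an error for this target.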

Aloso, Feb 26 '22 16:02