pomsky
Support mode modifiers
Section about mode modifiers in popular regex engines
Here are the best supported mode modifiers:
- (?i): case insensitive
- (?m): multi-line mode (^ and $ match begin/end of line)
- (?s): single-line mode (allow . to match \n)
- (?x): free-spacing mode
- (?xx): free-spacing mode even in character classes
- (?U): make greedy quantifiers lazy and vice versa (a*? <--> a*)
- (?n): make all capturing groups without a name non-capturing ((a) --> (?:a))
- (?J): allow duplicate group names
- (?d): opt-out of Unicode line break support, so ., ^ and $ only treat ASCII \n as line break
- (?X): makes invalid escape sequences an error
Some of these aren't needed because rulex has sane defaults:
- s isn't needed because you can use [cp] or Grapheme in rulex
- x and xx aren't needed because rulex is already free-spacing
- n isn't needed because groups are non-capturing by default
- X isn't needed because rulex doesn't have escape sequences, and invalid identifiers are an error
End of section
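To make the effect of the most common modifiers concrete, here is a small sketch using Python's `re` module, which is only one of the engines that accepts this inline syntax. It is an illustration of the regex-level behavior, not of anything rulex emits; the variable names are made up for this example.

```python
import re

# (?i): case-insensitive matching
ci = re.fullmatch(r"(?i)hello", "HeLLo") is not None

# (?m): ^ and $ also match at the start and end of each line
multi = re.findall(r"(?m)^\w+$", "one\ntwo")
single = re.findall(r"^\w+$", "one\ntwo")

# (?s): . also matches \n
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None
dot_default = re.fullmatch(r"a.b", "a\nb") is not None

# (?x): free-spacing mode, whitespace and # comments are ignored
free = re.fullmatch(r"(?x) a  b  # trailing comment", "ab") is not None

print(ci, multi, single, dot_all, dot_default, free)
```

Python does not implement U, J, d, xx or X, so those are left out here.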
The mode modifiers we want to support are i (ignore case), m (multiline), J (duplicate group names), d (ASCII-only \n line breaks) and U (lazy quantifiers by default). They will be available under the following names:
- [ ] ignore_case
- [ ] multiline
- [ ] reuse_groups
- [ ] ascii_line_breaks
- [x] lazy
They can be enabled and disabled with the enable mode;/disable mode; syntax. For example:
enable lazy;
'hello ' (disable ignore_case; 'world')*
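For readers unfamiliar with scoped modifiers, the following sketch shows the regex-level behavior that disabling a mode inside a group corresponds to, again using Python's `re` module. It is not the output of rulex for the example above: here ignore_case is enabled for the whole pattern so that the scoped opt-out is visible, and because Python has no U modifier, laziness is shown with an explicit `*?`.

```python
import re

# Case-insensitive overall, but case-sensitive inside the (?-i:...) group.
scoped = re.compile(r"(?i)hello (?-i:world)")

matches_lower = scoped.fullmatch("HELLO world") is not None
matches_upper = scoped.fullmatch("HELLO WORLD") is not None

# Greedy vs. lazy repetition: a lazy quantifier matches as little as possible.
greedy = re.match(r"a*", "aaa").group()
lazy = re.match(r"a*?", "aaa").group()

print(matches_lower, matches_upper, repr(greedy), repr(lazy))
```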
Note that JavaScript doesn't allow mode modifiers within a regex, but the i and m modifiers are instead available as flags for the entire regex. So we can support enable ignore_case; 'hello world', which compiles to /hello world/i. This means that we need to support outputting flags.
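A minimal sketch of what outputting flags could look like, assuming modes enabled at the top level are collected separately from the compiled pattern. The `render` function, the `MODE_LETTERS` table and the flavor names are hypothetical and exist only for this illustration; they are not part of rulex.

```python
# Hypothetical mapping from the proposed mode names to modifier letters.
MODE_LETTERS = {"ignore_case": "i", "multiline": "m"}


def render(pattern: str, modes: list[str], flavor: str) -> str:
    """Attach top-level modes as JS flags or as an inline modifier group."""
    letters = "".join(sorted(MODE_LETTERS[m] for m in modes))
    if flavor == "js":
        # JavaScript: modes become flags after the closing slash.
        # (A real implementation would also escape "/" inside the pattern.)
        return f"/{pattern}/{letters}"
    # Other flavors: prefix the pattern with an inline modifier group.
    return f"(?{letters}){pattern}" if letters else pattern


js_output = render("hello world", ["ignore_case"], "js")
inline_output = render("hello world", ["ignore_case"], "pcre")
print(js_output)
print(inline_output)
```

This only covers modes that apply to the entire expression; a mode toggled inside a group has no flag equivalent in JavaScript and would need separate handling.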