Support extended POSIX regexes
Many devices only support ERE, so we need this.
@wy16W2pIilK1xgqN could you explain? What devices are they?
Quite a lot: routers and firewalls. For example, all MikroTik devices.
The problem is that ERE doesn't support non-capturing groups. Take this Pomsky expression:
("hello"? | "world"+) "!!"
which compiles to
(?:(?:hello)?|(?:world)+)!!
For ERE, this would have to compile to
((hello)?|(world)+)!!
But this is not equivalent, because it changes the capturing group indexes. So we either need an option to never emit non-capturing groups when compiling to ERE, or we need to make the above code illegal, requiring capturing groups like this:
:(:("hello")? | :("world")+) "!!"
Although the outer capturing group could be avoided by "inlining" the exclamation mark:
(:("hello")? | :("world")+) "!!"
(hello)?!!|(world)+!!
But that could lead to exponential size increase of the generated expression, so probably not a good idea.
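The index shift is easy to check. The sketch below uses Python's `re` module only as a stand-in for counting groups (it is not an ERE engine), and the sample strings are made up for illustration. It compares the three regexes quoted above.

```python
import re

# The three variants discussed above.
non_capturing = r"(?:(?:hello)?|(?:world)+)!!"   # current output
capturing = r"((hello)?|(world)+)!!"             # naive ERE translation
inlined = r"(hello)?!!|(world)+!!"               # "!!" distributed over the alternation

# Number of capturing groups in each variant.
group_counts = {
    "non_capturing": re.compile(non_capturing).groups,
    "capturing": re.compile(capturing).groups,
    "inlined": re.compile(inlined).groups,
}

# Illustrative inputs (not from the issue) to compare what each variant accepts.
samples = ["!!", "hello!!", "worldworld!!", "helloworld!!", "world"]


def accepted(pattern: str) -> list[str]:
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


print(group_counts)
print(accepted(non_capturing) == accepted(capturing) == accepted(inlined))
```

All three variants accept the same sample strings, but they expose different numbers of groups, so any user-written capturing group that follows would get a different index depending on the variant.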
The other problem is that ERE does not allow escaping characters within a character class, so characters need to be rearranged:
['^' 'a'-'z' '\' '-' ']']
will have to be compiled to
[]^a-z\-]
Rules:
- The literal `^` can't appear at the start
- The literal `]` can only appear at the start
- The literal `-` can only appear at the start or end
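These rules can be sketched as a small reordering function. This is not Pomsky's implementation: `ere_bracket` is a hypothetical helper, and it assumes the input has no duplicates and that range endpoints are never `^`, `]` or `-`.

```python
def ere_bracket(items):
    """Build a POSIX bracket expression from literal characters and ranges.

    `items` is a list of single-character strings or (low, high) tuples.
    Hypothetical helper; assumes no duplicates and that range endpoints
    are never '^', ']' or '-'.
    """
    prefix = "]" if "]" in items else ""   # ']' may only appear first
    suffix = "-" if "-" in items else ""   # '-' is placed last
    body = [
        item if isinstance(item, str) else f"{item[0]}-{item[1]}"
        for item in items
        if item not in ("]", "-")
    ]

    # '^' must not be the first thing after the opening bracket.
    if not prefix and body and body[0] == "^":
        if len(body) > 1:
            body = body[1:] + ["^"]        # move the caret behind the rest
        elif suffix:
            prefix, suffix = "-", ""       # only '^' and '-': emit [-^]
        else:
            return r"\^"                   # a lone caret needs no bracket

    return "[" + prefix + "".join(body) + suffix + "]"


print(ere_bracket(["^", ("a", "z"), "\\", "-", "]"]))
```

For the example above, the function keeps the original order where the rules allow it and only moves `]` to the front and `-` to the back.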
Another problem: Codepoint/C doesn't work (it compiles to [\s\S], which is not supported in ERE), so what are the alternatives?
- Allow the dot instead (matches anything except line breaks by default; line breaks are included in multiline mode)
- Compile `C` to `.`, but that would change the behavior of the pomsky expression depending on the flavor; not good
- Compile `C` to `(.|\s)`, but that can lead to catastrophic backtracking; also, `\s` is supported by GNU ERE but not POSIX ERE; not good
The dot is now supported as of Pomsky 0.8. Rewriting the code for compiling character classes is in progress, with the goal of eventually supporting ERE. The only open question right now is how to handle non-capturing groups. Any input for this would be appreciated!
Possibilities are:
- disallow non-capturing groups when targeting ERE, requiring users to write `:()` instead
- add an option to silently convert non-capturing groups to capturing groups when targeting ERE; this could be made configurable, e.g. with `-Xcapture=always`
Both have disadvantages: the first makes pomsky expressions less portable, while the second makes their behavior less predictable.
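To make the trade-off concrete, here is a rough sketch of both behaviors as a post-processing step on an already compiled regex. Everything in it is an assumption for illustration: the function name `lower_groups_for_ere`, its `capture` parameter (mirroring the proposed `-Xcapture=always`), and the choice to work on the regex string rather than on Pomsky's internal representation. It also assumes the input still uses backslash escapes inside character classes.

```python
def lower_groups_for_ere(pattern: str, capture: str = "error") -> str:
    """Handle non-capturing groups in `pattern` for an ERE target.

    Hypothetical sketch: capture="error" rejects them (possibility 1),
    capture="always" turns them into capturing groups (possibility 2).
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy escaped characters verbatim, so "\(?:" is left alone.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif pattern.startswith("(?:", i):
            if capture != "always":
                raise ValueError(
                    "non-capturing groups are not supported in ERE; "
                    "use a capturing group :() instead"
                )
            out.append("(")
            i += 3
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(lower_groups_for_ere("(?:(?:hello)?|(?:world)+)!!", capture="always"))
```

With `capture="always"` the output matches the same strings but every converted group shifts the indexes of later capturing groups, which is the predictability concern. With the default, the same input is rejected, which is the portability concern.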