Detect lookbehind with a non-constant length or unbounded repetition
Java, Python, and PCRE all restrict what is allowed to appear in a lookbehind assertion. These restrictions are similar, but with some differences.
-
In Python, a lookbehind must have a fixed length and may only contain repetitions like
{n}.\Xis forbidden in a lookbehind. -
In PCRE, a lookbehind may contain alternatives with different lengths, which can even be nested, as well as repetitions with an upper bound. As in Python,
\Xand unbounded repetitions are forbidden -
Java has the most complicated rules. It allows arbitrary repetition in a lookbehind; however, every repetition must have a fixed length, and may not include alternations or
?even when they have the same length.?/{0,1}is not treated as a repetition by Java.Java treats repeated groups (
(a)+) differently than other kinds of repetitions (a+,[a]+). Infinite repetitions of groups must satisfy the following constraints, where $n$ is the number of code points matched by the group:-
when $n$ is odd:
- it must be preceded by an odd number $r$ of infinite repetitions, or
- the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r < n$
-
when $n$ is even:
- it must be preceded by an odd number $r$ of infinite repetitions, and
- the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r \in [0; n)$
These rules might not be comprehensive, but I haven't found an exception so far.
-
Describe the bug
Pomsky allows anything in a lookbehind.
Expected behavior
A compatibility error should be produced.