pomsky icon indicating copy to clipboard operation
pomsky copied to clipboard

Detect lookbehind with a non-constant length or unbounded repetition

Open Aloso opened this issue 2 years ago • 0 comments

Java, Python, and PCRE all restrict what is allowed to appear in a lookbehind assertion. These restrictions are similar, but with some differences.

  • In Python, a lookbehind must have a fixed length and may only contain repetitions like {n}. \X is forbidden in a lookbehind.

  • In PCRE, a lookbehind may contain alternatives with different lengths, which can even be nested, as well as repetitions with an upper bound. As in Python, \X and unbounded repetitions are forbidden

  • Java has the most complicated rules. It allows arbitrary repetition in a lookbehind; however, every repetition must have a fixed length, and may not include alternations or ? even when they have the same length. ? / {0,1} is not treated as a repetition by Java.

    Java treats repeated groups ((a)+) differently than other kinds of repetitions (a+, [a]+). Infinite repetitions of groups must satisfy the following constraints, where $n$ is the number of code points matched by the group:

    • when $n$ is odd:

      • it must be preceded by an odd number $r$ of infinite repetitions, or
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r < n$
    • when $n$ is even:

      • it must be preceded by an odd number $r$ of infinite repetitions, and
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r \in [0; n)$

    These rules might not be comprehensive, but I haven't found an exception so far.

Describe the bug

Pomsky allows anything in a lookbehind.

Expected behavior

A compatibility error should be produced.

Aloso avatar Jan 13 '23 22:01 Aloso