T-Regx
Provide interface for iterative matching, to avoid catastrophic backtracking
Currently, users of PHP regexp have only two choices:
preg_match_all(), which performs all available matches right away
preg_match(), which performs only a single match
The problem with preg_match_all() is that sometimes users need only the first 2 or 3 matches, while the 4th match would cause catastrophic backtracking. Currently, the workaround is to call preg_match() repeatedly, using substr() or $offset to find each subsequent match, because preg_match_all() offers no way to stop early.
For replacing, this is already covered: preg_replace() and preg_replace_callback() take a $limit parameter, which controls precisely how many replacements are performed. Matching has no equivalent.
This may not be possible in general.
It is only possible if the following assumption is correct: the next match should be attempted at the offset equal to the previous match's offset plus the length of the whole match (group 0), ignoring the capturing groups.
Is this assumption always true?
function nextOffset(string $match, int $offset): int {
    return $offset + \strlen($match);
}
This still holds if the pattern includes \K resets, and look-arounds containing groups shouldn't change anything either. Is there anything else missing here?
PS: Anchoring with A also works fine.
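As a rough illustration of the offset rule above, here is a minimal sketch of iterative matching built on preg_match() with PREG_OFFSET_CAPTURE. The helper name firstMatches() and its $limit parameter are hypothetical, not part of T-Regx or PHP. One case the nextOffset() rule does not cover is an empty match: the next offset would equal the current one and the loop would never advance, so the sketch steps forward by one byte in that case. That is a simplification (it is not UTF-8 aware, and after an empty match the results may differ from what preg_match_all() returns).

```php
<?php

/**
 * Hypothetical helper: returns at most $limit matches of $pattern in
 * $subject, calling preg_match() once per match.
 *
 * @return array<int, array{0: string, 1: int}> list of [text, byte offset]
 */
function firstMatches(string $pattern, string $subject, int $limit): array
{
    $found = [];
    $offset = 0;
    $length = \strlen($subject);
    while (\count($found) < $limit && $offset <= $length) {
        $result = \preg_match($pattern, $subject, $match, \PREG_OFFSET_CAPTURE, $offset);
        if ($result !== 1) {
            break; // 0 means no further match, false means an error
        }
        [$text, $position] = $match[0];
        $found[] = [$text, $position];
        // Resume right after the whole match (group 0). An empty match
        // would restart at the same position forever, so move at least
        // one byte forward (assumption: byte-wise stepping is acceptable).
        $offset = $position + \max(1, \strlen($text));
    }
    return $found;
}
```

Because the whole subject is passed with an $offset, rather than a substr() copy, look-behinds can still see the text before the starting position, and the reported offsets refer to the original string.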
Perhaps the $limit of preg_replace() could be used?
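One way this idea might look, as a sketch only: preg_replace_callback() drives the iteration and its $limit argument caps the number of callback invocations, while the callback records each match. The helper name firstMatchesViaReplace() is hypothetical, and $limit is assumed to be at least 1. The sketch does not establish whether the engine attempts one more match after the limit is reached, which is exactly what would matter for the backtracking case.

```php
<?php

/**
 * Hypothetical helper: collects at most $limit matches by letting
 * preg_replace_callback() perform the iteration.
 *
 * @return string[] matched texts, in order
 */
function firstMatchesViaReplace(string $pattern, string $subject, int $limit): array
{
    $found = [];
    \preg_replace_callback(
        $pattern,
        function (array $match) use (&$found): string {
            $found[] = $match[0];
            return $match[0]; // leave the text unchanged; the result is discarded
        },
        $subject,
        $limit
    );
    return $found;
}
```

The replaced string is built and thrown away, so this costs an extra copy of the subject compared with a preg_match() loop.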