slre
Mismatch on (A|B) when A is a prefix of B
Hiho,
I'm not sure if this qualifies as a bug, a corner case, or a known limitation:
regex: (match|matchAll) input: match matchAll
When looping over the input, this matches twice and yields these strings: ["match", "match"]
If the regex order is swapped, so that A is not a prefix of B (matchAll|match), then the result is as expected: ["match", "matchAll"].
Maybe a note in the docs would suffice to cover this (maybe there is already one I overlooked).
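Until the docs say something about it, the reordering workaround described above can be automated when the alternatives are plain literals. The following is a minimal sketch in Python's `re` module, not SLRE itself; the helper name `longest_first_alternation` is hypothetical and only illustrates the idea of placing longer alternatives before their prefixes.

```python
import re

def longest_first_alternation(literals):
    """Build a capturing alternation from literal strings, longest first.

    Hypothetical helper: sorting by descending length ensures that no
    alternative is preceded by one of its own prefixes.
    """
    ordered = sorted(set(literals), key=lambda s: (-len(s), s))
    return "(" + "|".join(re.escape(s) for s in ordered) + ")"

pattern = longest_first_alternation(["match", "matchAll"])
matches = re.findall(pattern, "match matchAll")
print(pattern)  # (matchAll|match)
print(matches)  # ['match', 'matchAll']
```

This only applies to literal alternatives; for alternatives that are themselves patterns, length is not a reliable ordering criterion.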
From the PCRE specification (emphasis mine):
Vertical bar characters are used to separate alternative patterns. For example, the pattern
gilbert|sullivan
matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. If the alternatives are within a subpattern (defined below), "succeeds" means matching the rest of the main pattern as well as the alternative in the subpattern.
I checked regexes in Firefox, Chrome, CPython, and Ack, and they all work the same way. Grep did find the longest match, though. Overall, I think this is the expected behavior.
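For reference, the CPython check can be reproduced with a few lines. This sketch uses Python's standard `re` module, which tries alternatives left to right; it says nothing about SLRE's internals, only about the behavior being compared against.

```python
import re

text = "match matchAll"

# The prefix alternative comes first, so it wins at both positions.
prefix_first = re.findall(r"(match|matchAll)", text)

# The longer alternative comes first, so it gets a chance before its prefix.
longest_first = re.findall(r"(matchAll|match)", text)

print(prefix_first)   # ['match', 'match']
print(longest_first)  # ['match', 'matchAll']
```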
A little variation on this:
IMHO the pattern "^(y|yes|YES|Yes|true|TRUE|True)$" should match the string "yes", but it does not. If the pattern is rearranged to "^(yes|y|YES|Yes|true|TRUE|True)$", then it matches. I quickly checked with RegexBuddy and none of the supported regex flavours show this behaviour.
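To illustrate what the other flavours do with the anchored pattern, here is a small sketch using Python's `re` module (again, not SLRE). A backtracking engine first tries "y", sees that "$" cannot match at the following "e", and then goes back and tries the next alternative.

```python
import re

pattern = re.compile(r"^(y|yes|YES|Yes|true|TRUE|True)$")

# 'y' matches first, but '$' then fails at 'e', so the engine
# backtracks into the group and tries 'yes', which succeeds.
m = pattern.match("yes")
print(m.group(1))  # yes

# Input that fits no alternative as a whole is still rejected.
rejected = pattern.match("yesterday")
print(rejected)  # None
```

This is the "matching the rest of the main pattern as well" clause from the PCRE text quoted earlier: inside a subpattern, an alternative only counts as succeeding if everything after the group matches too.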
I am happy to accept a PR with the associated unit test :)
Not a big issue in my project, since I have control over the regexes. Just thought I'd let you know. If a project uses SLRE for user-supplied regexes, the result might be unexpected.