perl5
"Quantifier unexpected on zero-length expression in regex" when using quantifier as a switch
Description
One possible way to control regex constructs, e.g. to switch them ON or OFF, is to use the quantifier {n} with n = 0 or 1.
When controlling zero-length constructs, only n = 0 and n = 1 make sense; n = 0 should switch the particular construct OFF.
The warning "Quantifier unexpected on zero-length expression in regex" is arguably redundant in such cases, although it is probably useful in most other cases.
Behavior in a few rare cases was unexpected (at least up to perl-5.32).
Steps to Reproduce
Unnecessary "Quantifier unexpected..." warning. In this case the warning is not generated (5.38.2, 5.36.0, 5.32.0):
perl -wle '( $n1, $n2 ) = ( split "" ), "aba" =~ m/ab(*FAIL){$n1}|a(*FAIL){$n2}|ba/ and print "[$n1][$n2][$&]" for glob "{0,1}" x 2'
[0][0][ab]
[0][1][ab]
[1][0][a]
[1][1][ba]
perl-5.20.1 - same output, but with 4x4 similar warnings:
Quantifier unexpected on zero-length expression in regex; marked by <-- HERE in m/ab(*FAIL){0}|a(*FAIL){0}|ba <-- HERE / at -e line 1.
But somehow this warning does appear (5.38.2, 5.36.0, 5.32.0) when I leave only 1 branch out of 3:
perl -wle '$n1 = $_, "aba" =~ m/ab(*FAIL){$n1}/ xor print "[$n1][$&]" for 0, 1'
Useless use of logical xor in void context at -e line 1.
Quantifier unexpected on zero-length expression in regex m/ab(*FAIL){0}/ at -e line 1.
[0][ab]
Quantifier unexpected on zero-length expression in regex m/ab(*FAIL){1}/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]
I.e. there is no warning when I add a branch, e.g. |x:
perl -wle '$n1 = $_, "aba" =~ m/ab(*FAIL){$n1}|x/ xor print "[$n1][$&]" for 0, 1'
Useless use of logical xor in void context at -e line 1.
[0][ab]
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]
Same with other parenthesized zero-width assertions (5.38.2, 5.36.0, 5.32.0), e.g.:
perl -wle '$n1 = $_, "bb" =~ m/(?<=b){$n1}b+/, print "[$n1][$&]" for 0, 1'
Quantifier unexpected on zero-length expression in regex m/(?<=b){0}b+/ at -e line 1.
[0][bb]
Quantifier unexpected on zero-length expression in regex m/(?<=b){1}b+/ at -e line 1.
[1][b]
Non-parenthesized zero-width assertions with a quantifier: m/\A{1}b/ failed to match in perl-5.32.0, but it works fine in 5.36 and 5.38:
perlbrew exec perl -wle '$n1 = $_, "b" =~ m/\A{$n1}b/, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0
==========
Quantifier unexpected on zero-length expression in regex m/\A{0}b/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/\A{1}b/ at -e line 1.
[1][b]
perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/\A{0}b/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/\A{1}b/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]
...But ^ instead of \A worked fine.
Similarly, \z and \Z in "b" =~ m/b\z{1}/ didn't match in perl-5.32.0.
Then how should ${1} be read? Perl interprets it as the variable $1. Under /x, however, $ {1} is interpreted as $ followed by a quantifier, which works fine. E.g.:
perlbrew exec perl -wle '$n1 = $_, "ba" =~ m/b$ {$n1}/x, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0, perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/b$ {0}/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/b$ {1}/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]
An example with \K (behaves unexpectedly up to perl-5.32.0):
perlbrew exec perl -wle '$n1 = $_, "ba" =~ m/b\K{0,$n1}a/, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0
==========
Quantifier unexpected on zero-length expression in regex m/b\K{0,0}a/ at -e line 1.
[0][ba]
[1][a]
perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/b\K{0,0}a/ at -e line 1.
[0][ba]
[1][ba]
Expected behavior
The warning is generated, but in some cases it may be redundant.
This seems to report two issues in the same ticket. How the regex engine handles /$ {1}/x and how Perl generally handles it are subtly different. Perl is fairly forgiving of spaces between the sigil and the name; the regex engine is not, as $ has meaning beyond that of a variable sigil.
I would have to dig further into the other parts. I am not sure I agree with the premise that the way to enable/disable a construct is to use a {0} or {1} quantifier. I guess you could make an argument that when the quantifier is 0 or 1 we should not warn, but I would not consider it a bug if we simply said "do not put quantifiers on zero width assertions". Off the top of my head, IMO the correct way to do a conditional subexpression in a regex pattern is to use the (?(..)YES|NO) construct.