"Quantifier unexpected on zero-length expression in regex" when using quantifier as a switch

Open rsFalse opened this issue 1 year ago • 1 comments

Description

One possible way to control regex constructs, i.e. to switch them on or off, is to use the quantifier {n} with n = 0 or 1. For zero-length constructs only n = 0 and n = 1 make sense, and n = 0 should switch the construct off. In such cases the warning "Quantifier unexpected on zero-length expression in regex" is arguably redundant, although it is useful in most other cases. In a few rare cases the behavior itself was also unexpected (at least up to perl-5.32).

Steps to Reproduce

Unnecessary warning "Quantifier unexpected...". In this case the warning is not generated (5.38.2, 5.36.0, 5.32.0):

perl -wle '( $n1, $n2 ) = ( split "" ), "aba" =~ m/ab(*FAIL){$n1}|a(*FAIL){$n2}|ba/ and print "[$n1][$n2][$&]" for glob "{0,1}" x 2'
[0][0][ab]
[0][1][ab]
[1][0][a]
[1][1][ba]

perl-5.20.1 - same output, but with 4x4 similar warnings:

Quantifier unexpected on zero-length expression in regex; marked by <-- HERE in m/ab(*FAIL){0}|a(*FAIL){0}|ba <-- HERE / at -e line 1.

But somehow the warning does appear (5.38.2, 5.36.0, 5.32.0) when I keep only 1 branch out of 3:

perl -wle '$n1 = $_, "aba" =~ m/ab(*FAIL){$n1}/ xor print "[$n1][$&]" for 0, 1'
Useless use of logical xor in void context at -e line 1.
Quantifier unexpected on zero-length expression in regex m/ab(*FAIL){0}/ at -e line 1.
[0][ab]
Quantifier unexpected on zero-length expression in regex m/ab(*FAIL){1}/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]

That is, the warning disappears again when I add a branch, e.g. |x:

perl -wle '$n1 = $_, "aba" =~ m/ab(*FAIL){$n1}|x/ xor print "[$n1][$&]" for 0, 1'
Useless use of logical xor in void context at -e line 1.
[0][ab]
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]

The same happens with other parenthesized zero-width assertions (5.38.2, 5.36.0, 5.32.0), e.g.:

perl -wle '$n1 = $_, "bb" =~ m/(?<=b){$n1}b+/, print "[$n1][$&]" for 0, 1'
Quantifier unexpected on zero-length expression in regex m/(?<=b){0}b+/ at -e line 1.
[0][bb]
Quantifier unexpected on zero-length expression in regex m/(?<=b){1}b+/ at -e line 1.
[1][b]
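A minimal sketch of how the warning could be silenced locally while still using the quantifier as a switch. It relies on the warning belonging to the `regexp` warnings category; the helper name `switched_match` and the extra capture group are assumptions made for illustration, not part of the original report.

```perl
use strict;
use warnings;

# Hypothetical helper: $n is 0 (lookbehind switched off) or 1 (switched on).
sub switched_match {
    my ($n) = @_;
    # "Quantifier unexpected on zero-length expression" is in the 'regexp'
    # warnings category, so it can be disabled lexically around the match.
    no warnings 'regexp';
    return "bb" =~ /((?<=b){$n}b+)/ ? $1 : undef;
}

print "[$_][", switched_match($_), "]\n" for 0, 1;
```

The matched text should be the same as in the transcript above ([0][bb] and [1][b]), only without the warning lines.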

Non-parenthesized zero-width assertions with a quantifier. m/\A{1}b/ failed to match in perl-5.32.0, but it works fine in 5.36 and 5.38:

perlbrew exec perl -wle '$n1 = $_, "b" =~ m/\A{$n1}b/, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0
==========
Quantifier unexpected on zero-length expression in regex m/\A{0}b/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/\A{1}b/ at -e line 1.
[1][b]
perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/\A{0}b/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/\A{1}b/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]

...But ^ instead of \A worked fine.

Similarly \z and \Z in "b" =~ m/b\z{1}/ didn't match in perl-5.32.0.

Then how should ${1} be read? Perl interprets it as the variable $1. Under /x, however, $ {1} is interpreted as $ followed by its quantifier, which works fine. E.g.:

perlbrew exec perl -wle '$n1 = $_, "ba" =~ m/b$ {$n1}/x, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0, perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/b$ {0}/ at -e line 1.
[0][b]
Quantifier unexpected on zero-length expression in regex m/b$ {1}/ at -e line 1.
Use of uninitialized value $& in concatenation (.) or string at -e line 1.
[1][]
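The difference between the two readings can be made visible by stringifying compiled patterns. This is only an illustrative sketch: the variable names `$interpolated` and `$quantified` and the sample strings are assumptions, not taken from the report.

```perl
use strict;
use warnings;

# Without a space, ${1} is ordinary interpolation of the capture variable $1.
"xa" =~ /x(a)/ or die "setup match failed";
my $interpolated = qr/x${1}/;    # $1 is "a" here, so this compiles the pattern xa

# With a space and /x, "$ {1}" is not interpolated: it stays in the pattern
# as the end-of-string assertion followed by a {1} quantifier.
my $quantified = do {
    no warnings 'regexp';        # silence "Quantifier unexpected..."
    qr/b$ {1}/x;
};

print "interpolated: $interpolated\n";
print "quantified:   $quantified\n";
```

Printing the two qr// objects shows that the first pattern contains the interpolated text, while the second still contains the literal `$ {1}`.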

An example with \K (behaves unexpectedly up to perl-5.32.0), e.g.:

perlbrew exec perl -wle '$n1 = $_, "ba" =~ m/b\K{0,$n1}a/, print "[$n1][$&]" for 0, 1'
perl-5.38.2, perl-5.36.0
==========
Quantifier unexpected on zero-length expression in regex m/b\K{0,0}a/ at -e line 1.
[0][ba]
[1][a]
perl-5.32.0
==========
Quantifier unexpected on zero-length expression in regex m/b\K{0,0}a/ at -e line 1.
[0][ba]
[1][ba]

Expected behavior

Warning is generated, but in some cases it may be redundant.

rsFalse, Jan 10 '24 15:01

This seems to report two issues in the same ticket. How the regex engine handles /$ {1}/x and how Perl generally handles it are subtly different. Perl is fairly forgiving of spaces between the sigil and the name; the regex engine is not, because $ has a meaning beyond that of a variable sigil.

The other parts I would have to dig into further. I am not sure I agree with the premise that the way to enable or disable a construct is to use a {0} or {1} quantifier. I guess you could make an argument that when the quantifier is 0 or 1 we should not warn, but I would not consider it a bug if we simply said "do not put quantifiers on zero-width assertions". Off the top of my head, IMO the correct way to do a conditional subexpression in a regex pattern is to use the (?(..)YES|NO) construct.

demerphq, Jan 11 '24 14:01
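For illustration, here is one way the conditional construct mentioned in the comment might replace the {0}/{1} switch from the lookbehind example. It is a sketch, not the approach the maintainers prescribe: the helper name `match_conditional` is hypothetical, and using a `(?{ ... })` code block as the condition assumes perl 5.18 or later, where such blocks see lexical variables reliably.

```perl
use strict;
use warnings;

# Hypothetical helper: $on decides whether the lookbehind takes part in the match.
sub match_conditional {
    my ($on) = @_;
    # (?(condition)yes-pattern): the code block's return value is the condition.
    # With no "no-pattern", a false condition matches the empty string.
    return "bb" =~ /((?(?{ $on })(?<=b))b+)/ ? $1 : undef;
}

print "[$_][", match_conditional($_), "]\n" for 0, 1;
```

Because no quantifier is applied to the zero-width assertion, the "Quantifier unexpected" warning does not come into play here.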