Allow regexes to support \Q (and friends)

Open rkleemann opened this issue 4 years ago • 8 comments

I've run into a case where \Q support would help, but when I try to use it I get the following error:

$ ack '\s+\Q'"$MYVAR"'\E\s+'
ack: Invalid regex '\s+\QFoo!\E\s+'
Regex: \s+\QFoo!\E\s+
           ^---HERE Unrecognized escape \Q passed through in regex

In this situation, -Q will not work, as I need some regex around $MYVAR to get exactly what I'm looking for.

rkleemann avatar Oct 22 '20 17:10 rkleemann

As a workaround, I've used another call to Perl to do what I wanted:

$ ack '\s+'"$( perl -e 'print quotemeta shift' "$MYVAR" )"'\s+'

But that's definitely a hack.

rkleemann avatar Oct 22 '20 17:10 rkleemann
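For reference, the workaround above relies on Perl's built-in quotemeta, which escapes the same characters that \Q...\E would. Here is a minimal sketch of the same idea inside Perl itself. The helper name literal_between_spaces is hypothetical and is not part of ack.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ack): embed a literal string between
# \s+ ... \s+, the shape of the pattern from the original report.
sub literal_between_spaces {
    my ($literal) = @_;

    # quotemeta backslash-escapes every non-word character,
    # which is the effect \Q...\E has during interpolation.
    my $quoted = quotemeta $literal;

    return qr/\s+${quoted}\s+/;
}

my $re = literal_between_spaces('Foo!');
print "matched\n" if 'say Foo! now' =~ $re;
```

Because the quoting happens before the regex is compiled, the regex engine never sees \Q or \E at all.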

PCRE allows \Q and \E

https://www.pcre.org/original/doc/html/pcrepattern.html

petdance avatar Oct 22 '20 18:10 petdance

Implementation note: If we allow \Q and \E we will have to update is_lowercase as well.

petdance avatar Oct 22 '20 18:10 petdance

It appears that this might be easier said than done, as \Q (and friends) are processed at compile time, as mentioned in perlop: https://perldoc.perl.org/perlop#Gory-details-of-parsing-quoted-constructs

But the section "parsing regular expressions" says the following (emphasis mine):

Previous steps were performed during the compilation of Perl code, but this one happens at run time

rkleemann avatar Oct 22 '20 18:10 rkleemann

That compile-time vs. run-time explains everything. I wonder if doing the qr// in an eval block would be "compile-time" enough. (Not that we want to use eval here after we worked so hard to get rid of it after ack 2)

petdance avatar Oct 22 '20 19:10 petdance

qr// is compile time for RE things but not for qq() interpolation things, which belong to the Perl language and effectively act as a pre-processor for the qr// compile.

\Q\E kinda cross the streams because they protect RE-language chars at Perl-lang qq compile time.

Not supporting \Q\E found at qr// compile time is arguably a spec-level bug in Perl, as it fails to DWIM.

PCRE chooses to treat \Q\E as if part of RE-lang as a convenience to embedding languages: even if they have something like \Q\E, theirs won't be clued in to what PCRE needs quoted.

n1vux avatar Oct 22 '20 20:10 n1vux
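The layering described above can be seen with plain strings, no regex engine involved. This is a small sketch, with all variable names chosen for illustration: in an interpolating context Perl itself rewrites \Q...\E, while a string assembled at run time still carries the raw escape text that the regex compiler would later reject.

```perl
use strict;
use warnings;

my $var = 'a.b';

# Interpolating (qq-style) context: Perl applies \Q...\E itself,
# so the resulting string already has the dot escaped.
my $quoted_by_perl = "\Q$var\E";

# Inside a regex literal the same interpolation step runs first,
# so the regex compiler only ever sees the escaped text.
my $compiled = qr/\A\Q$var\E\z/;

# A string built at run time (as with a command-line argument)
# still contains the two-character sequences \Q and \E.
my $runtime_text = '\Q' . $var . '\E';

print "interpolated: $quoted_by_perl\n";
print "run-time:     $runtime_text\n";
print "literal dot only\n" if 'a.b' =~ $compiled && 'axb' !~ $compiled;
```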

OP Bob provides minimalist test case

$ perl -wE 'my $re = q!\Qfoo\E!; say "foo" =~ $re;'
Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/\Q <-- HERE foo\E/ at -e line 1.
Unrecognized escape \E passed through in regex; marked by <-- HERE in m/\Qfoo\E <-- HERE / at -e line 1.

-w, aha. ack forces that by escalating warnings to die and then catching the die. (Andy then very nicely reformats the result to be readable and useful.)

n1vux avatar Oct 22 '20 22:10 n1vux
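The "escalate warnings to die, then catch the die" approach mentioned above can be sketched as follows. This is only an illustration of the general technique, not ack's actual code; compile_pattern is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not ack's implementation): compile a pattern that
# arrives as a run-time string, treating any regex warning as fatal.
# Returns (compiled regex, undef) on success or (undef, message) on failure.
sub compile_pattern {
    my ($pattern) = @_;

    my $re = eval {
        # Turn warnings raised during regex compilation into exceptions.
        local $SIG{__WARN__} = sub { die @_ };
        qr/$pattern/;
    };
    my $error = $@;

    return defined $re ? ( $re, undef ) : ( undef, $error );
}

my ( $re, $error ) = compile_pattern('\QFoo!\E');
print defined $re ? "compiled\n" : "rejected: $error";
```

With warnings enabled, the run-time pattern containing \Q is rejected with the "Unrecognized escape" message instead of silently matching something unintended.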

That the RE-special-char-understanding \Q\E escapes are executed at the same Perl-lang compile time as \L\F\U (eval "" time, but not eval {} time), and not at RE-lang compile time (qr{}), is arguably a layering fudge in the Perl5 spec & implementation.

It would be a DWIM / Principle of Least Surprise feature for Ack to make at least \Q\E work as if we were doing eval "" (which we of course eschew for security reasons), and if we're going to do that, we might as well DWIM \L\F\U as well.

It would be ironic for Ack to do this to make Ack and Perl more PCRE compatible :-D but if it's the right thing, so be it.

n1vux avatar Oct 22 '20 22:10 n1vux
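One way the \Q\E part of that idea could look, without resorting to eval "", is a small pre-processing pass over the pattern string before it reaches qr//. The sketch below is an assumption about a possible approach, not a proposed patch: expand_quote_escapes is a hypothetical name, an unterminated \Q is assumed to run to the end of the pattern, and \L\F\U are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical pre-processor (not part of ack): replace each \Q...\E span
# in a pattern string with its quotemeta'd contents.
sub expand_quote_escapes {
    my ($pattern) = @_;

    my $out     = '';
    my $quoting = 0;
    pos($pattern) = 0;

    while ( pos($pattern) < length($pattern) ) {
        if ($quoting) {
            # Inside \Q: only \E is special; everything else is quoted.
            if    ( $pattern =~ /\G\\E/gc )   { $quoting = 0 }
            elsif ( $pattern =~ /\G(.)/gcs )  { $out .= quotemeta $1 }
        }
        else {
            if    ( $pattern =~ /\G\\Q/gc )        { $quoting = 1 }
            elsif ( $pattern =~ /\G\\E/gc )        { }              # stray \E is dropped
            elsif ( $pattern =~ /\G(\\.|.)/gcs )   { $out .= $1 }   # keep other escapes intact
        }
    }

    return $out;
}

print expand_quote_escapes('\s+\QFoo!\E\s+'), "\n";    # prints \s+Foo\!\s+
```

Consuming escape pairs as single tokens outside of quoting mode keeps an escaped backslash followed by Q (that is, \\Q) from being mistaken for the start of a quoted span.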