Glob wildcards around an all-numeric pattern cause a parse error
Repro is with Zed commit 8e0219e. This issue was found by a community user.
I'd just shown them how a bare search matched both of the records in this input data.
$ zq -version
Version: v0.29.0-476-g8e0219e5
$ cat year.ndjson
{"YEAR": 2002}
{"YEAR": "2002"}
$ zq -z '2002' year.ndjson
{YEAR:2002}
{YEAR:"2002"}
I then wanted to explain how matches could be used to search only fields named YEAR, but the parser rejected the all-numeric pattern.
$ zq -z 'YEAR matches *2002*' year.ndjson
zq: error parsing Zed at column 15:
YEAR matches *2002*
=== ^ ===
@mccanne took a look at this and explained:
Yeah, a bug… the glob parsing logic is complicated so we can disambiguate with expressions, e.g., * as multiplication. But I think we can take another crack at this since we simplified the grammar a little while back. In the meantime, this will work:
YEAR matches /^.*2002.*$/
That does match the record where YEAR is a string.
$ zq -z 'YEAR matches /^.*2002.*$/' year.ndjson
{YEAR:"2002"}
I've opened a separate issue, #2962, to discuss an operator that would also make this match the record where YEAR is a number.
I bumped into this problem again and then found that this issue was already open. Here's the example with current Zed commit 26dbda0 so you can see how I got here.
I was doing a close read of the Globs section of the Language Overview doc:
To convince myself that all those characters were truly legal, I ran this search, which succeeded:
$ zq -version
Version: v1.2.0-20-g26dbda03
$ echo '"foo_.:/%#123@~bar"' | zq -z '*_.:/%#123@~*' -
"foo_.:/%#123@~bar"
However, if I shorten the glob pattern to just the digits between the wildcards, it fails:
$ echo '"foo_.:/%#123@~bar"' | zq -z '*123*' -
zq: error parsing Zed at column 2:
*123*
^ ===
And as @mccanne pointed out above, I can still get around it by turning it into a regexp.
$ echo '"foo_.:/%#123@~bar"' | zq -z '/^.*123.*$/' -
"foo_.:/%#123@~bar"
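The workaround amounts to translating the glob by hand: each * becomes .* and the whole pattern is anchored with ^ and $. As a rough illustration, here's a small Python sketch of that translation. glob_to_regex is a hypothetical helper, not part of Zed, and it assumes * is the only special glob character, which may not reflect Zed's full glob semantics.

```python
import re

def glob_to_regex(glob: str) -> str:
    """Hypothetical helper: translate a glob where only '*' is special
    into an anchored regexp. Not Zed's implementation."""
    # Escape every literal run between wildcards so characters like '.'
    # are matched literally, then turn each '*' into '.*'.
    parts = [re.escape(part) for part in glob.split("*")]
    return "^" + ".*".join(parts) + "$"

pattern = glob_to_regex("*123*")
print(pattern)  # prints ^.*123.*$
print(bool(re.search(pattern, "foo_.:/%#123@~bar")))  # prints True
```

Running it on *2002* likewise yields the ^.*2002.*$ pattern @mccanne suggested above.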
Before looking for an existing bug, I also spent a fair amount of time scratching my head over this caveat from the docs:
Note that these rules do not allow for a leading digit.
...wondering if "leading digit" was meant to say that digits weren't allowed even after a glob wildcard *. Now that I'm reminded this is a legit bug, I know that's not the case. But considering I got tripped up despite having been in this spot before, it might be nice to fix this before more users bump into it. 😄
Verified in Zed commit 418c024.
The example above no longer produces a parse error.
$ zq -version
Version: v1.2.0-86-g418c024d
$ echo '"foo_.:/%#123@~bar"' | zq -z '*123*' -
"foo_.:/%#123@~bar"
Also, the example from the opening text would now be handled with the grep() function, which now works as well:
$ zq -z 'grep(*2002*, YEAR)' year.ndjson
{YEAR:"2002"}
Thanks @nwt!
