Glob wildcards around numeric target is a parse error

Open philrz opened this issue 4 years ago • 1 comments

Repro is with Zed commit 8e0219e. This issue was found by a community user.

I'd just shown them how a bare search matched both of the records in this input data.

$ zq -version
Version: v0.29.0-476-g8e0219e5

$ cat year.ndjson 
{"YEAR": 2002}
{"YEAR": "2002"}

$ zq -z '2002' year.ndjson 
{YEAR:2002}
{YEAR:"2002"}

I then wanted to explain how matches could be used to restrict the search to fields named YEAR, but the parser rejected the all-numeric target.

$ zq -z 'YEAR matches *2002*' year.ndjson 
zq: error parsing Zed at column 15:
YEAR matches *2002*
          === ^ ===

@mccanne took a look at this and explained:

Yeah, a bug… the glob parsing logic is complicated so we can disambiguate with expressions, e.g., * as multiplication.  But I think we can take another crack at this since we simplified the grammar a little while back. In the meantime, this will work: YEAR matches /^.*2002.*$/

That does match the record whose YEAR value is a string.

$ zq -z 'YEAR matches /^.*2002.*$/' year.ndjson 
{YEAR:"2002"}
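
For readers who want to see why the workaround is equivalent to the glob, here is a minimal Python sketch of the translation from a `*`-only glob to an anchored regexp. It is not Zed's parser: `glob_to_regexp` is a hypothetical helper, and the string-only check just mirrors the output shown above, where the regexp matched the string value but not the number.

```python
import re

def glob_to_regexp(glob: str) -> str:
    """Translate a glob that uses only the * wildcard into an anchored regexp.

    Hypothetical helper for illustration; not part of Zed.
    """
    # Escape each literal piece, then join the pieces with ".*".
    parts = [re.escape(piece) for piece in glob.split("*")]
    return "^" + ".*".join(parts) + "$"

pattern = glob_to_regexp("*2002*")
print(pattern)  # ^.*2002.*$

records = [{"YEAR": 2002}, {"YEAR": "2002"}]

# Assumption: only string values are tested against the regexp,
# mirroring the zq output above where just the string record matched.
matched = [
    r for r in records
    if isinstance(r["YEAR"], str) and re.search(pattern, r["YEAR"])
]
print(matched)  # [{'YEAR': '2002'}]
```

The numeric record is skipped here for the same reason it is missing from the zq output: the pattern is only applied to string values.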

I've opened separate issue #2962 to discuss an operator that would make this match against the other record also.

philrz avatar Aug 26 '21 18:08 philrz

I happened to bump into this problem again, then found this issue was already open. Here's the example with current Zed commit 26dbda0 so you can see my journey.

I was doing a close read of the Globs section of the Language Overview doc:


[Screenshot: the Globs section of the Language Overview doc]

To convince myself all those characters were truly legal, I did this successful search:

$ zq -version
Version: v1.2.0-20-g26dbda03

$ echo '"foo_.:/%#123@~bar"' | zq -z '*_.:/%#123@~*' -
"foo_.:/%#123@~bar"

However, if I shorten the glob pattern to just the digits between the wildcards, it fails.

$ echo '"foo_.:/%#123@~bar"' | zq -z '*123*' -
zq: error parsing Zed at column 2:
*123*
 ^ ===

And as @mccanne pointed out above, I can still get around it by turning it into a regexp.

$ echo '"foo_.:/%#123@~bar"' | zq -z '/^.*123.*$/' -
"foo_.:/%#123@~bar"

Before seeking out a bug, I also spent a fair amount of time scratching my head at that caveat from the docs:

Note that these rules do not allow for a leading digit.

...wondering if "leading digit" was meant to say that digits weren't allowed even after a glob wildcard *. Now that I'm reminded this is a legit bug, I know that's not the case. But considering I got tripped up despite having been in this spot before, it might be nice to fix this before more users bump into it. 😄

philrz avatar Jul 28 '22 15:07 philrz

Verified in Zed commit 418c024.

The example above no longer produces a parse error.

$ zq -version
Version: v1.2.0-86-g418c024d

$ echo '"foo_.:/%#123@~bar"' | zq -z '*123*' -
"foo_.:/%#123@~bar"

Also, the example in the opening text would now be written with the grep() function, which also works now.

$ zq -z 'grep(*2002*, YEAR)' year.ndjson 
{YEAR:"2002"}

Thanks @nwt!

philrz avatar Oct 13 '22 18:10 philrz