
Show filename when searching across multiple files with zq

Open: philrz opened this issue 6 months ago.

tl;dr

In a directory containing multiple files ending in .log, a user executes a search like:

zq -i line '"test"' *.log

The user would like a way to display, alongside each search result, the name of the file that result came from.

Details

Repro is with Zed commit c39086b.

This issue was originally surfaced in a community Slack thread. In the user's own words:

Is there a way when searching across a glob pattern for multiple files in a directory such as *.log to have the file the search result came from also listed? For example zq -i line '"test"' *.log

similar to grep -rnwio . -e "test" which would list the file and the containing string. I had a thought that using from might have gotten me there but not the right usage.

@mattnibs acknowledged to the user that we don't currently have a way of doing this and, for now, recommended using shell tools to bridge the gap. For example, using the zed-sample-data, this shows the baseline problem: the matching line is printed with no indication of which file it came from.

$ zq -version
Version: v1.17.0-11-gc39086ba

$ zq -i line '"thinkwithgoogle"' *.log.gz
"1521912845.237311\t144c918fa2aca4461d3535a237d311cb5102c1919096e0fa9b73ab95af4876fc\t3\t08434F2704007BF2\tCN=*.appspot.com,O=Google Inc,L=Mountain View,ST=California,C=US\tCN=Google Internet Authority G3,O=Google Trust Services,C=US\t1520451204.000000\t1527706320.000000\trsaEncryption\tsha256WithRSAEncryption\trsa\t2048\t65537\t-\t*.appspot.com,*.thinkwithgoogle.com,*.withgoogle.com,*.withyoutube.com,appspot.com,thinkwithgoogle.com,withgoogle.com,withyoutube.com\t-\t-\t-\tF\t-\tT\tF"

And here's the approach recommended by @mattnibs, working as intended:

$ find . -name "*.log.gz"  | xargs -I {} zq -i line '"thinkwithgoogle" | {file:"{}",value:this}' {}
{file:"./x509.log.gz",value:"1521912845.237311\t144c918fa2aca4461d3535a237d311cb5102c1919096e0fa9b73ab95af4876fc\t3\t08434F2704007BF2\tCN=*.appspot.com,O=Google Inc,L=Mountain View,ST=California,C=US\tCN=Google Internet Authority G3,O=Google Trust Services,C=US\t1520451204.000000\t1527706320.000000\trsaEncryption\tsha256WithRSAEncryption\trsa\t2048\t65537\t-\t*.appspot.com,*.thinkwithgoogle.com,*.withgoogle.com,*.withyoutube.com,appspot.com,thinkwithgoogle.com,withgoogle.com,withyoutube.com\t-\t-\t-\tF\t-\tT\tF"}

The user confirmed this solution should be workable for now.
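As a variation on that workaround, here's a minimal POSIX shell sketch that loops over a glob instead of using find/xargs, so file names appear exactly as the glob produced them (no ./ prefix, no recursion into subdirectories). The function name search_with_filename is hypothetical, and the sketch assumes neither the search term nor any file name contains double quotes or backslashes, since both are spliced directly into the Zed query string.

```shell
# Hypothetical helper: search_with_filename TERM FILE...
# Runs one zq per file and wraps each hit as {file:...,value:...},
# the same record shape as the find/xargs workaround.
# Assumption: TERM and file names contain no double quotes or
# backslashes, because they are pasted verbatim into Zed string literals.
search_with_filename() {
  term=$1
  shift
  for f in "$@"; do
    # Skip names that don't exist, e.g. an unmatched glob left as a literal.
    [ -e "$f" ] || continue
    zq -i line "\"$term\" | {file:\"$f\",value:this}" "$f"
  done
}
```

Usage would look like search_with_filename thinkwithgoogle *.log.gz, which should emit the same {file:...,value:...} records as the find/xargs version, still with one zq invocation per file.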

In terms of how we might address this more directly in the future, @mattnibs offered the following thoughts:

Maybe a solution would be to add globbing to the file source operator, much as we do for pool sources, and then maybe have some flag that lets you put the source name/details on each value produced from the source.

When talking about decorating each value in the source, maybe you have a -each flag that accepts a function where the first argument is the current value (this) and the second is an info record describing the source, and the result of the function would be the new value. So you could do something like:

func describe(value, info): (
 { value, info }
)
file -each=describe *.log

Discussing this reminded us all of another issue, https://github.com/brimdata/zui/issues/2931, where a user asked about doing something similar with from * and wanted to see the name of the pool each result came from.

philrz, Aug 05 '24 20:08