Finding multiple patterns

Open tjkirch opened this issue 7 years ago • 31 comments

I'd like to be able to search for multiple patterns, like with grep's -e argument. It seems (with fd 7.0.0) the only way is to use alternation in the regex pattern, but this can be less clear than multiple arguments, and is harder to build up programmatically.

tjkirch avatar Jul 31 '18 16:07 tjkirch

Thank you for the feedback!

I certainly see the need for this, but I'm not sure we should introduce a new command-line argument, given that there is a reasonable solution via fd '(pattern1|pattern2)'. On the other hand, the analogy to grep would be nice. Unfortunately, -e is already taken for --extension.
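
For callers that build the alternation programmatically, a minimal Python sketch could look like the following. join_or is a hypothetical helper, not part of fd; it wraps each pattern in a non-capturing group so that anchors or an inner | in one pattern do not leak into its neighbours.

```python
import re

def join_or(patterns):
    """Combine regex patterns into a single alternation.

    Each pattern is wrapped in a non-capturing group so that its own
    `|` or anchors stay local to it. `join_or` is a hypothetical helper.
    """
    return "|".join(f"(?:{p})" for p in patterns)

combined = join_or(["pattern1", "^pattern2$"])
print(combined)
```

The resulting string can then be passed to fd as its single pattern argument.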

Another option to achieve something like this could be the --path-before-pattern flag that we were discussing over in #312. This would allow us to use fd --path-before-pattern . pattern1 pattern2 ... (possibly with a shortcut for the flag).

sharkdp avatar Aug 03 '18 18:08 sharkdp

Actually, the --path-before-pattern doesn't feel very natural for me. I'd be okay with adding a new --regexp <PATTERN> option in analogy to grep/rg, if someone wants to work on this.

sharkdp avatar Oct 27 '18 14:10 sharkdp

I'm currently not planning to implement this. Going to close this for now, but happy to reconsider if there is significant interest in this.

sharkdp avatar Apr 02 '20 18:04 sharkdp

How about

fd Makefile --or GNuMakefile --or make

?

That reads naturally, and it would make it possible to add a git grep or find style boolean query language at some point.

aschmolck avatar Feb 24 '21 15:02 aschmolck

Let's reopen this for further discussion.

sharkdp avatar Mar 14 '21 19:03 sharkdp

Here is a concrete example I just did with find. I think it would be nice to be able to do the same thing with fd as well:

find . -type d -and \( -name node_modules -or -name build \) -exec rm -rf '{}' '+'

aschmolck avatar Mar 17 '21 12:03 aschmolck

I'm definitely against including a full-blown query language with --and/--or. fd was never designed to be this powerful. It's focused on easier use-cases.

Your use-case can be solved by running

fd -td '^(node_modules|build)$' -X rm -rf

or

fd -td node_modules -X rm -rf
fd -td build -X rm -rf

Both of which are shorter than the find equivalent (which is not the main issue here though).

sharkdp avatar May 16 '21 20:05 sharkdp

I'm definitely against including a full-blown query language with --and/--or. fd was never designed to be this powerful. It's focused on easier use-cases.

Your use-case can be solved by running

fd -td '^(node_modules|build)$' -X rm -rf

or

fd -td node_modules -X rm -rf
fd -td build -X rm -rf

Both of which are shorter than the find equivalent (which is not the main issue here though).

But maybe we need a non-regexp OR pattern. The following is an example that I guess is not so simple to do with fd.

find . \
                -name "* (????-??-??) \[??:??:??\].tar" -o \
                -name "* (????-??-??) \[??:??:??\].bak"

Could we support something like this:

fd -IH -g '* (????-??-??) [??:??:??].tar' -g '* (????-??-??) [??:??:??].bak'

zw963 avatar Jul 08 '21 08:07 zw963

OR is actually pretty easy to do with regexes.

your example could be done with:

fd -IH '.* \(....-..-..\) \[..:..:..\]\.(tar|bak)'

AND is more difficult.

tmccombs avatar Jul 09 '21 04:07 tmccombs

OR is actually pretty easy to do with regexes.

your example could be done with:

fd -IH '.* \(....-..-..\) \[..:..:..\]\.(tar|bak)'

AND is more difficult.

Yes, I did it like this:

fd -HI '.*\(\d{4}-\d{2}-\d{2}\) \[\d{2}:\d{2}:\d{2}\]\.(tar|bak)'

I think it is more obscure than the find -or solution anyway.

zw963 avatar Jul 09 '21 05:07 zw963

@sharkdp I have a use case where it would be very useful if fd supported multiple patterns combined with AND. As you write in your comment (https://github.com/sharkdp/fd/issues/315#issuecomment-841869872), you don't want to add a full query language here, which is understandable. Combining multiple regular expressions with OR is no problem. However, there is no way to search for multiple patterns combined with AND. This is partly because the Rust regular expression engine does not support lookahead patterns; otherwise one could write ^(?=.*first)(?=.*second) to search for file names with both first and second in the name. Would you accept a PR which adds support for searching multiple patterns combined with AND?
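
To make the intended AND semantics concrete, here is a small sketch using Python's re module, which does support lookahead. and_lookahead is a hypothetical helper, and the pattern it produces is only an illustration of the semantics; it relies on lookahead, which the Rust regex engine mentioned above does not provide.

```python
import re

def and_lookahead(patterns):
    """Build ^(?=.*p1)(?=.*p2)...: matches only if every pattern occurs.

    Each pattern is wrapped in (?:...) so an inner `|` stays local.
    """
    return "^" + "".join(f"(?=.*(?:{p}))" for p in patterns)

pattern = and_lookahead(["first", "second"])
for name in ["first-second.txt", "second_first.txt", "first-only.txt"]:
    # The order of the words in the name does not matter.
    print(name, bool(re.search(pattern, name)))
```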

minad avatar Aug 07 '21 12:08 minad

To be honest, I haven't really seen a reasonable use case for AND so far. Please let me know if there are any. Not a theoretical use case. A real world, practical use case.

sharkdp avatar Oct 08 '21 21:10 sharkdp

@sharkdp commented on Oct 9, 2021, 12:36 AM GMT+3:30:

To be honest, I haven't really seen a reasonable use case for AND so far. Please let me know if there are any. Not a theoretical use case. A real world, practical use case.

Searching for terms where one doesn't know their order. This happens frequently for me, e.g., games AND windows, as I sometimes have games/Windows, and sometimes Windows/games.

NightMachinery avatar Oct 08 '21 21:10 NightMachinery

@sharkdp I have an Emacs file finder frontend which can use find or fd as backend. This frontend supports a matching style we call "orderless" matching, where you enter multiple words/regexps separated by space. Each of the file paths should match all of these regexps. Currently one can achieve this by transforming the regular expressions into "word1.*word2|word2.*word1", which obviously does not scale well. Another alternative for AND filtering is to use pipes and run fd first and then grep for the remaining regexps (or, instead of grep, post-filter in the frontend), but then one loses the performance advantages of fd. The "orderless" style matching is quite popular in Emacs to quickly filter a set of candidates, since, as @NightMachinery mentioned, the huge advantage is that the user does not have to know the order of the words/regexps. Whether this is a reasonable use case depends on your judgement, of course. It seems to me that fd aims more at shell users. But I often get requests to support fd in the Emacs frontend from users who prefer fd over find for performance reasons.
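
The post-filtering alternative could look roughly like this on the frontend side. This is a sketch in Python, not the Emacs package's actual code: filter_all is a hypothetical helper, and the sample list stands in for lines read from the output of a single fd run.

```python
import re

def filter_all(paths, patterns):
    """Yield the paths that match every regex in `patterns` (logical AND)."""
    compiled = [re.compile(p) for p in patterns]
    for path in paths:
        if all(rx.search(path) for rx in compiled):
            yield path

# Sample paths standing in for the output of one `fd` invocation.
paths = [
    "games/Windows/doom.exe",
    "Windows/games/quake.exe",
    "games/linux/nethack",
]
print(list(filter_all(paths, ["games", "Windows"])))
```

The cost is the one described above: every candidate path has to leave the fd process before it can be rejected.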

minad avatar Oct 08 '21 21:10 minad

Ok, I'm inclined to accept a feature request to support --and <pattern>. Before we implement this, we need:

  • A short discussion about the command-line option name. What do other tools use?
  • A detailed analysis of whether this could clash with any of the other command-line options or features of fd. There are some immediate questions like: what does fd patternA --and patternB --type f mean? (We are not going to support the meaning patternA AND (patternB AND type==file).)

sharkdp avatar Nov 14 '21 16:11 sharkdp

In fact, I thought most of the discussion in this thread was about --or, that is, about making it easier to search for multiple patterns in one command line.

zw963 avatar Nov 15 '21 03:11 zw963

Note that there is also #650 and #714. Also, --or can usually be worked around easily.

sharkdp avatar Nov 15 '21 06:11 sharkdp

I propose we add --or for now, and then discuss the usage and necessity of --and.

zw963 avatar Nov 15 '21 13:11 zw963

--or isn't really necessary, because you can just use | in the pattern to combine multiple patterns. However, there isn't a good way to express --and with a single regex.

To be concrete, a hypothetical fd foo --or bar would be equivalent to fd 'foo|bar'. Whereas fd foo --and bar would need to be converted to something like fd 'foo.*bar|bar.*foo' which scales really poorly.
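
The scaling problem can be seen with a small Python sketch that emulates AND by enumerating every ordering of the patterns. and_by_permutation is a hypothetical helper; for n patterns it emits n! alternatives.

```python
from itertools import permutations

def and_by_permutation(patterns):
    """Emulate AND without lookahead: one alternative per ordering."""
    return "|".join(".*".join(order) for order in permutations(patterns))

print(and_by_permutation(["foo", "bar"]))

# Number of alternatives for 2 to 5 patterns.
for n in range(2, 6):
    terms = [f"p{i}" for i in range(n)]
    print(n, and_by_permutation(terms).count("|") + 1)
```

Note that this form also requires the matches not to overlap, so it is not exactly the same as checking each pattern independently.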

tmccombs avatar Nov 20 '21 07:11 tmccombs

To be concrete, a hypothetical fd foo --or bar would be equivalent to fd 'foo|bar'

Not equivalent, because --or could also be used with glob-based search.

zw963 avatar Nov 20 '21 14:11 zw963

It's equivalent in the sense that every glob can be converted to a regex
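
As an illustration of that conversion, Python's standard fnmatch.translate turns a glob into a regex, and the results can be OR-ed together. Its glob dialect is not identical to the one fd uses, so this is a sketch of the idea only; globs_to_regex is a hypothetical helper.

```python
import fnmatch
import re

def globs_to_regex(globs):
    """Translate each glob to a regex and OR the results together."""
    return "|".join(fnmatch.translate(g) for g in globs)

rx = re.compile(globs_to_regex(["*.tar", "*.bak"]))
for name in ["backup.tar", "notes.bak", "archive.zip"]:
    # fnmatch.translate anchors the end itself; match() anchors the start.
    print(name, bool(rx.match(name)))
```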

tavianator avatar Nov 20 '21 16:11 tavianator

It's equivalent in the sense that every glob can be converted to a regex

But in most simple cases, a glob-based search takes fewer keystrokes than a regexp.

zw963 avatar Nov 21 '21 15:11 zw963

If fd gets both --or and --and then it should also get --not and parens (users would certainly demand it). We would arrive at something similar to find in terms of complexity.

My understanding is that fd tries to be simpler than find (but at the same time as powerful as feasible). In that sense, I think it's not too much to ask the advanced user who needs --or to simply use regular expressions.

On the other hand, there is really no practical way to work around the lack of --and. Someone who wants to search the file system for three different tags in arbitrary order will have to run fd with a regular expression that combines the six possible permutations in one giant regex. (I wrote a wrapper script that allows me to run fd like this easily and I consider it extremely useful.)

In #889, I suggested that one could deprecate the specification of paths as arguments (as opposed to --search-path, which I suggest renaming to --root, with -r for brevity). This would eventually make it possible to specify multiple search patterns as args. Given that logical OR is already possible within a regex, it would make sense to apply logical AND when multiple patterns are given.

IMHO fd would thus gain a much nicer (cleaner and more powerful) UI.

grothesque avatar Nov 22 '21 09:11 grothesque

This might be off-topic since it's not strictly about patterns per se, but here's a real-world use case for --or that can't be done via a regex:

I have some complex Bash projects with several different types of files (executable scripts, helpers, test modules, etc.) and I want to lint them all at once with shellcheck. I can't use plain globs because some files have no extensions and shellcheck will error when passed folder names.

This is what I'd like to do:

fd -t x --or -e bash --or -e bats -0 | xargs -0 -- shellcheck

This can be done with find, but without the benefits of automatic VCS exclusions:

find \( -type f -and -executable -or -name '*.bash' -or -name '*.bats' \) -print0 | xargs -0 -- shellcheck

cheap-glitch avatar Jan 22 '22 10:01 cheap-glitch

fd -t x --or -e bash --or -e bats -0 | xargs -0 -- shellcheck

You can already do this. -e already combines in an OR sense. In addition, you can use fd's --exec/-x option instead of xargs. This will not just be shorter to write, but also faster, because it runs multiple shellcheck processes in parallel:

fd -tx -ebash -ebats -x shellcheck

sharkdp avatar Jan 22 '22 12:01 sharkdp

You can already do this. -e already combines in a or-sense

Yes, but -tx doesn't. To clarify, I want all the files that are executable OR end in .bash/.bats.

cheap-glitch avatar Jan 22 '22 13:01 cheap-glitch

I think that kind of functionality is out of scope for fd, it would basically involve making an expression language similar to what find has, and make fd significantly more complicated.

tmccombs avatar Jan 22 '22 18:01 tmccombs

I think that kind of functionality is out of scope for fd, it would basically involve making an expression language similar to what find has, and make fd significantly more complicated.

I totally understand not wanting to add that kind of complexity, but what about a simple global flag? (Sorry if this has already been proposed and rejected somewhere else).

It could be called e.g. --combine-with and take 3 possible values:

  • and to combine all filters with a logical AND
  • or to combine all filters with OR
  • auto to use the default "smart" combination logic (so the same as not passing the option at all)

This would probably be easier to implement, and while not as flexible as find's expressions, it would still enable more use cases.

cheap-glitch avatar Jan 22 '22 19:01 cheap-glitch

Thank you for your feedback, but I'm not a fan of the --combine-with idea. I'm not sure if that would really allow us to solve a lot of new real world use cases.

What would fd --combine-with=or -e txt -e md README do? Would it OR-combine ALL criteria? Including the pattern? So it would search for files with a txt extension, with a md extension OR for files matching README?


Another workaround for the OR use case is to simply use multiple fd commands:

(fd -t x -0; fd -e bash -e bats -0) | xargs -0 -- shellcheck

sharkdp avatar Jan 23 '22 12:01 sharkdp

@sharkdp,

Also, --or can usually be worked around easily

Is there any way to search for directories, or files that match specific pattern?

If we search for ALL the files and directories, then, yes, fd . --type d --type f ~/Documents can do it. But if we want to get a list of all the directories AND all the .txt files, then, as soon as we add --extension, like fd . --type d --type f --extension txt ~/Documents, fd, as expected, will limit the results to files only. Same happens if we add --full-path, like this: fd --type d --type f --full-path '.*txt$' ~/Documents.

Of course, combining two different searches into one stream is not a problem. But why spawn two instances? :)

097115 avatar Jun 02 '22 12:06 097115

I would like to reinforce the case for an AND operator as opposed to a full implementation of boolean logic (see my above comment):

I wrote a script (https://gitlab.kwant-project.org/-/snippets/903, consider it in the public domain) that uses fd as a backend to search for files/directories matching a combination of tags. The tags of each file/directory are obtained from the path by treating slashes and dashes as separators. For example, the file name “pers/2022/bike-repair.org” corresponds to the tags “pers”, “2022”, “bike”, “repair”, as well as “repair.org” (dots are optional tag separators).

Now searching for all events involving my friend “Bob” and the activity “climbing” is as quick as running ff bob climbing. (I like to define a short ff alias.) I also have a way to run this directly from within Emacs.

The purpose of this example is not to convince you to organize your home directory in a similar way (although I think that the scheme works very well), but to give one very concrete example of fd use where having a way to express an AND relation would be useful.

My script has a --debug option that, instead of running fd, will just print out the command. As one can imagine, the query length grows factorially with the number of tags to search for, since every ordering of the tags needs its own alternative. Already with three tags it is getting pretty long (and presumably less efficient):

% fdfind-tags --debug a b c
fdfind --full-path --prune --regex '[-/](a)[-/](.*[-/])?(b)[-/](.*[-/])?(c)([-/]|(\.[^/]*)?$)|[-/](a)[-/](.*[-/])?(c)[-/](.*[-/])?(b)([-/]|(\.[^/]*)?$)|[-/](b)[-/](.*[-/])?(a)[-/](.*[-/])?(c)([-/]|(\.[^/]*)?$)|[-/](b)[-/](.*[-/])?(c)[-/](.*[-/])?(a)([-/]|(\.[^/]*)?$)|[-/](c)[-/](.*[-/])?(a)[-/](.*[-/])?(b)([-/]|(\.[^/]*)?$)|[-/](c)[-/](.*[-/])?(b)[-/](.*[-/])?(a)([-/]|(\.[^/]*)?$)'
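
The structure of that output can be reproduced with a short Python sketch. This is a reconstruction from the debug output above, not the linked script itself; tags_regex and the constant names are hypothetical.

```python
import re
from itertools import permutations

SEP = r"[-/]"                # a tag starts right after a dash or slash
GAP = r"[-/](.*[-/])?"       # separator, optionally with other tags in between
END = r"([-/]|(\.[^/]*)?$)"  # a separator, or an optional extension at the end

def tags_regex(tags):
    """Build one alternative per ordering of the tags, joined with |."""
    return "|".join(
        SEP + GAP.join(f"({t})" for t in order) + END
        for order in permutations(tags)
    )

rx = re.compile(tags_regex(["bob", "climbing"]))
print(bool(rx.search("./pers/2022/climbing-bob.org")))

# Length of the generated pattern for 1 to 5 tags.
for n in range(1, 6):
    print(n, len(tags_regex([f"t{i}" for i in range(n)])))
```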

grothesque avatar Aug 22 '22 14:08 grothesque