Ignore binaries and honor .gitignore

Open JohnAZoidberg opened this issue 4 years ago • 5 comments

Is your feature request related to a problem? Please describe.

I'm running comby on a big project with lots of build files stored in the same tree as the git repository (ignored by .gitignore). We also have a lot of binary files actually committed to the repository. The build files are autogenerated sources as well as binary build products.

Describe the solution you'd like

It would be great to have comby ignore all files ignored by git and also binary files. One reason I use ripgrep is because it does exactly this, which makes it orders of magnitude faster than simple grep. See: https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#automatic-filtering

Additional context

Luckily there was a timeout message that made me aware it's searching binaries:

> ./comby  'failUnlessEqual(:[a],:[b])' 'assertEqual(:[a],:[b])'
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/comby!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/U540.fd!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/DXEFV.Fv!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/Ffs/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792FVMAIN_COMPACT/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792SEC1.guided.dummy!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/Ffs/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792FVMAIN_COMPACT/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792SEC1.1fv.sec!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/RISCV64/DxeCore.debug!

JohnAZoidberg avatar Feb 14 '21 09:02 JohnAZoidberg

Do I understand it correctly that it also means that a comby process dies and files that would have been subsequently processed by it are ignored? Because I see this behavior and the same kind of timeouts.

kkom avatar Apr 16 '21 15:04 kkom

> Do I understand it correctly that it also means that a comby process dies and files that would have been subsequently processed by it are ignored?

It shouldn't be that the process dies. Certainly if a file times out it is skipped.

I'll look into addressing this soon. In the meantime, here is an interim solution you might find useful:

You can use the -rg flag with comby, in which case ripgrep preselects the files to search based on some criteria. This means that ripgrep can, to some extent, already do the job of honoring the .gitignore file and skipping binary files. You must have ripgrep (i.e., rg) installed on your PATH. Here are some example invocations:

  • comby 'change(:[1])' 'me' -matcher .go -rg ""

    • Under the hood, rg is invoked to find files that will match the input pattern, honoring .gitignore and skipping binary files.
  • comby 'change(:[1])' 'me' -matcher .go -rg "-g '*.go'"

    • Under the hood, rg is invoked to match files that satisfy the -g '*.go' glob pattern. The inner single quotes are significant here. Be aware that adding -g in rg effectively turns off honoring the .gitignore.
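
For anyone scripting these invocations, here is a minimal Python sketch of how the two commands above could be assembled as argument lists. The helper name comby_rg_command is hypothetical and only the flags shown in this thread are used; the point it illustrates is that the rg options travel as a single argument to -rg, even when that argument is the empty string.

```python
import shlex


def comby_rg_command(match, rewrite, matcher, rg_options=""):
    """Build the argument list for a comby run that lets rg pick the files.

    Hypothetical helper: rg_options is passed to -rg as ONE argument, even
    when it is empty or contains spaces and quotes (e.g. "-g '*.go'").
    """
    return ["comby", match, rewrite, "-matcher", matcher, "-rg", rg_options]


# Let rg apply its default filtering (.gitignore, binary files).
default_filter = comby_rg_command("change(:[1])", "me", ".go")

# Restrict to *.go files; per the thread, -g stops rg honoring .gitignore.
go_only = comby_rg_command("change(:[1])", "me", ".go", "-g '*.go'")

for cmd in (default_filter, go_only):
    # shlex.join prints a shell-safe spelling of the same argument list.
    print(shlex.join(cmd))
```

If comby and rg are both installed, either list could be handed to subprocess.run unchanged; the sketch only prints the commands so that it runs anywhere.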

LIMITATIONS

  • You cannot invoke something like comby 'change(:[1])' 'me' custom.txt -matcher .go -rg "" and expect only custom.txt files to match. The rg invocation takes precedence and will match all candidate files. To match only custom.txt, it is best to specify a file match pattern in the rg options that expresses this.

Note that even rg will not honor .gitignore files when an explicit file pattern is provided. That is, honoring the .gitignore is turned off in the following cases:

  • rg -g '*.txt' pattern - this will search all *.txt files, even if they are listed in the .gitignore file
  • rg find-me * - this will search all files, including those listed in the .gitignore file.

In this sense, comby is the same, and will likely stay the same. The part that I will add is that the .gitignore is honored in the absence of an explicit file path pattern, and that binary files are ignored :-)
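
To make the two requested filters concrete, here is a rough Python sketch of what "honor .gitignore and skip binaries" can look like as a pre-filtering step. It is not comby's or ripgrep's implementation: the function names are hypothetical, the binary check is a simple NUL-byte heuristic (similar in spirit to what ripgrep's guide describes), and the file listing assumes git is installed and is delegated to git ls-files.

```python
import os
import subprocess
from pathlib import Path


def looks_binary(path, sniff_bytes=8192):
    """Heuristic (an assumption): a NUL byte in the first chunk means binary."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sniff_bytes)


def git_visible_files(repo_root):
    """List tracked files plus untracked files that .gitignore does not exclude."""
    out = subprocess.run(
        ["git", "-C", str(repo_root), "ls-files", "-z",
         "--cached", "--others", "--exclude-standard"],
        check=True,
        capture_output=True,
    ).stdout
    # -z separates paths with NUL bytes, so unusual file names stay intact.
    return [Path(repo_root) / os.fsdecode(p) for p in out.split(b"\0") if p]


def candidate_files(repo_root):
    """Files a rewrite tool would still need to look at after both filters."""
    return [
        p for p in git_visible_files(repo_root)
        if p.is_file() and not looks_binary(p)
    ]
```

Note that files committed to the repository are still listed even if they match .gitignore, so the binary check is what removes committed binaries such as the build products mentioned in the original report.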

rvantonder avatar Apr 16 '21 18:04 rvantonder

Thank you @rvantonder for the prompt reply: the -rg "" option works well. Things are fast and there are no timeouts.

But now that I have tested it, I see the same outcome: still just one arbitrary, non-deterministically chosen file gets replaced (instead of 8).

> It shouldn't be that the process dies. Certainly if a file times out it is skipped.

So it seems like what you said 👆 holds – the skipped files and warnings about timeouts are unrelated.

Since the missed files are not related to the original problem reported in this issue, let's follow up on it offline?

kkom avatar Apr 16 '21 20:04 kkom

For others who may come across this issue: my problem was totally unrelated to binary file timeouts, and was solved by providing the -matcher .txt option (as I wanted to do a simple literal replacement).

kkom avatar Apr 16 '21 20:04 kkom

https://github.com/comby-tools/comby/issues/237#issuecomment-821468963 This method works exceptionally well. Before this, any code search spat out ~100 lines of timeout errors and took several minutes to complete. Now my search time is down from several minutes to just a second. Finally I get blazingly fast comby, even on a three-million-LoC codebase! ⚡ ⚡ ⚡

From the bottom of my heart: it should be documented! ✍️

Ray-Eldath avatar Sep 04 '23 10:09 Ray-Eldath