comby
Ignore binaries and honor .gitignore
Is your feature request related to a problem? Please describe.
I'm running comby on a big project with lots of build files stored in the same tree as the git repository (ignored by .gitignore). Additionally, we have a lot of binary files actually committed to the repository. The build files are autogenerated sources as well as binary build products.
Describe the solution you'd like
It would be great to have comby ignore all files ignored by git and also binary files. One reason I use ripgrep is because it does exactly this, which makes it orders of magnitude faster than simple grep. See: https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#automatic-filtering
Additional context
Luckily there was a timeout message that made me aware it's searching binaries:
> ./comby 'failUnlessEqual(:[a],:[b])' 'assertEqual(:[a],:[b])'
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/comby!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/U540.fd!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/DXEFV.Fv!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/Ffs/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792FVMAIN_COMPACT/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792SEC1.guided.dummy!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/FV/Ffs/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792FVMAIN_COMPACT/9E21FD93-9C72-4c15-8C4B-E77F1DB2D792SEC1.1fv.sec!
Timeout for input: Path: /home/zoid/cloudhome/hpe/taiwan/edk2/Build/FreedomU540HiFiveUnleashed/DEBUG_GCC5/RISCV64/DxeCore.debug!
Do I understand it correctly that it also means that a comby process dies and files that would have been subsequently processed by it are ignored? Because I see this behavior and the same kind of timeouts.
Do I understand it correctly that it also means that a comby process dies and files that would have been subsequently processed by it are ignored?
It shouldn't be that the process dies. Certainly if a file times out it is skipped.
I'll look into addressing this soon. In the meantime, here is an interim solution you might find useful:
You can use the -rg flag with comby, where ripgrep preprocesses the files to search based on some criteria. This means that ripgrep can, to some extent, currently do the job of honoring the .gitignore file and skipping binary files. You must have ripgrep (i.e., rg) installed on your PATH. Here are some example invocations:
- comby 'change(:[1])' 'me' -matcher .go -rg ""
  - Underneath the hood, rg is invoked to find files that will match the input pattern, honoring .gitignore and binary ignores.
- comby 'change(:[1])' 'me' -matcher .go -rg "-g '*.go'"
  - Underneath the hood, rg is invoked to match files that honor the -g '*.go' glob pattern. The inner single quotes are significant here. Be aware that adding -g in rg effectively turns off honoring the .gitignore.
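For readers who want a feel for the preprocessing step described above, here is a rough Python sketch of how a list of candidate files could be obtained from rg. It is not comby's actual implementation: the helper names `rg_candidate_command` and `candidate_files` are hypothetical, and searching for a fixed literal taken from the template is an assumption made only for illustration. The rg flags used (`--files-with-matches`, `--fixed-strings`, `-g`) are regular ripgrep options.

```python
import shutil
import subprocess


def rg_candidate_command(literal, globs=()):
    """Build an rg command that lists files containing `literal` (hypothetical helper)."""
    cmd = ["rg", "--files-with-matches", "--fixed-strings"]
    for glob in globs:
        # Per the discussion above, an explicit -g glob takes precedence
        # over .gitignore rules.
        cmd += ["-g", glob]
    # "--" ends option parsing; "." makes rg walk the current directory
    # instead of possibly reading from stdin.
    cmd += ["--", literal, "."]
    return cmd


def candidate_files(literal, root=".", globs=()):
    """Run rg in `root`; return matching paths, or None if rg is not on PATH."""
    if shutil.which("rg") is None:
        return None
    result = subprocess.run(
        rg_candidate_command(literal, globs),
        cwd=root,
        capture_output=True,
        text=True,
    )
    # rg exits with status 1 when nothing matched; that is not an error here.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

For example, `candidate_files("failUnlessEqual(")` would list only the files that contain that literal, which is the kind of narrowing that keeps build products and binaries out of the rewrite step.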
LIMITATIONS
- You cannot invoke something like comby 'change(:[1])' 'me' custom.txt -matcher .go -rg "" and expect only custom.txt files to match; the rg invocation takes precedence and will match all candidate files. To match only custom.txt, it is best to specify a file match pattern that expresses this.
Note that even rg will not honor .gitignore files when an explicit file pattern is provided. That is, honoring the .gitignore is turned off in the following cases:
- rg -g '*.txt' pattern
  - this will search all *.txt files, even if they are listed in the .gitignore file.
- rg find-me *
  - this will search all files, including those listed in the .gitignore file.
In this sense, comby is the same, and will likely stay the same. The part that I will add is that the .gitignore is honored in the absence of an explicit file path pattern. And ignoring the binary files :-)
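To make the intended rule concrete, here is a small Python model of it: an explicit file pattern wins over ignore rules, ignore rules apply otherwise, and binary files are skipped. This is only a sketch under stated assumptions. The names `should_search` and `looks_binary` are hypothetical, the ignore patterns are simplified shell-style globs rather than real .gitignore syntax, and the NUL-byte check is a common heuristic for spotting binary content, not necessarily what comby does or will do.

```python
import fnmatch
import os


def looks_binary(path, sample_size=8192):
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def should_search(path, ignore_patterns, explicit_globs=()):
    """Toy model of the precedence described above (not real .gitignore semantics)."""
    if explicit_globs:
        # An explicit pattern takes precedence: ignore rules are not consulted.
        name = os.path.basename(path)
        return any(fnmatch.fnmatchcase(name, glob) for glob in explicit_globs)
    # No explicit pattern: skip anything matched by an ignore pattern.
    return not any(fnmatch.fnmatchcase(path, pat) for pat in ignore_patterns)
```

With ignore patterns `["Build/*"]`, a path such as `Build/DEBUG/U540.fd` is skipped unless an explicit glob like `*.fd` is given, in which case it is searched again. `looks_binary` can then be applied to whatever remains.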
Thank you @rvantonder for the prompt reply – the -rg "" option works well – things are fast and there are no timeouts.
But now as I tested it, I see the same outcome - still just one arbitrary & non-deterministic file gets replaced (instead of 8).
It shouldn't be that the process dies. Certainly if a file times out it is skipped.
So it seems like what you said 👆 holds – the skipped files and warnings about timeouts are unrelated.
Since the missed files are not related to the original problem reported in this issue, let's follow up on it offline?
For others who may come across this issue – my problem was totally unrelated to binary file timeouts, and was solved by providing the -matcher .txt option (as I wanted to do a simple literal replacement).
https://github.com/comby-tools/comby/issues/237#issuecomment-821468963 This method works exceptionally well. Before this, any code search spat out roughly 100 lines of timeout errors and took several minutes to complete. Now my search time is down from several minutes to just a second. Finally I get blazingly fast comby, even on a three-million-LoC codebase! ⚡ ⚡ ⚡
from the bottom of my heart: It should be documented! ✍️