fd
`--exclude` doesn't work with absolute paths
Describe the bug you encountered:
The examples for the --exclude or -E option imply that it should work with absolute paths (/mnt/external-drive is given as an example). However, it only seems to work with relative paths. For example, if I'm trying to exclude the directory /home/user/Library/:
fd -E Library pattern /home
works, as does
fd -E "*/Library/*" pattern /home
However,
fd -E /home/user/Library/ pattern /home
doesn't work (i.e. /home/user/Library/pattern.txt would still show up in search results). Adding other options, such as -p or -a, doesn't seem to affect this behavior.
The only way I've found to exclude absolute paths is to add them to ~/.config/fd/ignore, which is somewhat inconvenient.
What version of fd are you using?
fd 8.2.1
Which operating system / distribution are you on?
Linux 5.14.2-arch1-2 x86_64
LSB Version: 1.4
Distributor ID: Arch
Description: Arch Linux
Release: rolling
Codename: n/a
Hi! I would like to work on this. :)
I'm gonna try to fix it and open a PR.
The issue here is as follows: the exclude option works the same way .gitignore patterns work. This means that an absolute path is relative to the root of the git repo (which is the first search path in our case).
To fix this, we can check which exclude options are absolute and filter the results after the ignore crate finds them. What do you think about this approach? @sharkdp
Just ran into this problem. Thanks for the tip, @alessandroasm. In my case I made the excluded paths relative to the root folder, and it worked. Perhaps the man page could be updated to note that the flag follows the same rules as ignore entries.
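The workaround above (making the excluded path relative to the search root) can be sketched as a small helper. This is only an illustration, assuming, as described in this thread, that a pattern with a leading slash is anchored to the search root. The name to_root_anchored_exclude is made up for this example and is not part of fd.

```python
from pathlib import PurePosixPath


def to_root_anchored_exclude(search_root, absolute_path):
    """Rewrite an absolute path as a pattern anchored at the search root.

    Hypothetical helper, not part of fd. Returns None when the path is
    the search root itself or lies outside of it.
    """
    root = PurePosixPath(search_root)
    target = PurePosixPath(absolute_path)
    try:
        relative = target.relative_to(root)
    except ValueError:
        return None  # not below the search root, nothing to rewrite
    if not relative.parts:
        return None  # the path is the search root itself
    # Assumption from this thread: a leading slash anchors the
    # pattern to the search root.
    return "/" + relative.as_posix()


if __name__ == "__main__":
    pattern = to_root_anchored_exclude("/home", "/home/user/Library/")
    print(f"fd -E {pattern} pattern /home")
```

For the paths from the original report, the helper produces /user/Library, which would be passed to -E with /home as the search path.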
Especially bad in combination with --follow, as ~/Library/Containers (Mac) contains thousands of symlinks to directories like ~/Pictures or ~/Music, which themselves can each contain tens of thousands of files. This blows up search results a lot: more than 8x the time and more than 14x the result count for me:
~
❯ fd --exclude Containers --follow |wc -l
2657140
~ took 8s
❯ fd --follow |wc -l
38664223
~ took 1m10s
❯
I don't really want to add plain Containers into my global ignore, as it's a name that may be used outside ~/Library (for, well, containers for example), which should not be excluded.
My current approach, for everyone wanting something similar, is this rather granular global ignore, which allows me to find files living in these containers (sandboxed apps' documents) while not blowing up completely:
# Source:
~/Library/Containers
❯ fd --type symlink |cut -d '/' -f 4 |sort |uniq
# $XDG_CONFIG_HOME/fd/ignore
Library/Containers/*/Data/Desktop
Library/Containers/*/Data/Downloads
Library/Containers/*/Data/Library
Library/Containers/*/Data/Movies
Library/Containers/*/Data/Music
Library/Containers/*/Data/Pictures
# Result:
~
❯ fd --follow |wc -l
2702849
~ took 9s
❯
@alessandroasm any progress on this? If you've encountered any difficulty or cannot spare the time, I am willing and able to help.
Hello cyqsimon, sadly I'm way too busy at this time, so I could not make any progress on this. Feel free to work on it if you want :)
I would have liked this feature, too. If it's a performance or compatibility concern we could have an --exclude-abs option, that would then do a check if it's a file in the current search directory.
It's more of a "the library we use for this doesn't really support this". So we would have to find a way to work around that, or stop using that library. See https://github.com/BurntSushi/ripgrep/issues/2366
I finally have some time to come back to this issue.
From reading https://github.com/sharkdp/fd/blob/master/src/walk.rs, it seems like there is no good way to implement an "ignore by absolute path" mechanism within the confines of the ignore crate. And I think BurntSushi does make some good points in https://github.com/BurntSushi/ripgrep/issues/2366#issuecomment-1336399045 on why it's a "wont-fix", in particular the non-trivial performance impact such a feature will incur.
So considering the performance impact, would it make some sense to split "absolute ignore" into its own flag, and implement it independently of what's offered by ignore? Something like --exclude-absolute maybe (and the corresponding global config file ~/.config/fd/ignore-absolute)? And then in documentation we can inform the user very explicitly about the performance impact it entails.
As for the specific implementation, I imagine it won't be too difficult (if some performance penalty is acceptable). In fd::walk::spawn_senders, simply canonicalise the current path (which is where most of the penalty is going to come from), and then use globset to match. I'll make sure the canonicalisation doesn't happen if the user hasn't specified anything via --exclude-absolute, so that there's no performance regression if the user doesn't use this new functionality. Further optimisations are going to be much more difficult I think, but at least the option to use it is there.
I'll quickly put together a prototype to test. Any ideas/suggestions are welcomed!
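To make the proposal concrete, here is a rough sketch of the idea in Python rather than Rust. It is not the prototype mentioned above: os.path.realpath stands in for canonicalisation, fnmatch stands in for globset (their glob rules may differ), and make_absolute_exclude_filter is a hypothetical name chosen for this example.

```python
import fnmatch
import os


def make_absolute_exclude_filter(absolute_globs):
    """Build a predicate that returns True for paths that should be kept.

    Hypothetical sketch; fnmatch is used in place of globset here.
    """
    if not absolute_globs:
        # Nothing to exclude: never canonicalise, so users who do not
        # use the feature pay no extra cost.
        return lambda path: True

    def keep(path):
        # Resolving symlinks and ".." is the expensive step.
        canonical = os.path.realpath(path)
        return not any(
            fnmatch.fnmatchcase(canonical, glob) for glob in absolute_globs
        )

    return keep


if __name__ == "__main__":
    # Exclude a directory and everything below it (illustrative paths).
    keep = make_absolute_exclude_filter(["/fd-demo/skip", "/fd-demo/skip/*"])
    for candidate in ["/fd-demo/skip/a.txt", "/fd-demo/keep/a.txt"]:
        print(candidate, "kept" if keep(candidate) else "excluded")
```

The early return is the part that matters for the performance argument: when no absolute excludes are given, no path is ever canonicalised.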
Another problem related to this is that --exclude seems to use some kind of fuzzy matching.
Given a directory like this:
.
├── directory
│ ├── exclude-me
│ └── just-some-file
└── exclude-me
There's no way to exclude just the exclude-me that is in the root directory:
❯ fd --exclude exclude-me
directory/
directory/just-some-file
EDIT: Never mind, my use case does not require any additional features. Pre-pending the pattern with a slash anchors it to the root directory:
❯ fd --exclude /exclude-me
directory/
directory/exclude-me
directory/just-some-file
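The anchoring behaviour seen here can be modelled roughly as follows. This is a simplified sketch of gitignore-style anchoring as described in this thread, not the actual matching code of the ignore crate: it ignores `**`, negation, and the directory-only meaning of a trailing slash, and the names is_excluded, remaining, and TREE are made up for the example.

```python
import fnmatch


def is_excluded(relative_path, pattern):
    """Simplified gitignore-style match against a root-relative path."""
    parts = relative_path.split("/")
    pattern_parts = pattern.strip("/").split("/")
    # A slash at the start or in the middle anchors the pattern to the root.
    anchored = "/" in pattern.rstrip("/")
    if anchored:
        if len(parts) < len(pattern_parts):
            return False
        # Compare component by component, starting at the search root.
        return all(
            fnmatch.fnmatchcase(part, glob)
            for part, glob in zip(parts, pattern_parts)
        )
    # A slash-free pattern may match a path component at any depth.
    return any(fnmatch.fnmatchcase(part, pattern_parts[0]) for part in parts)


# The example tree from above, as paths relative to the search root.
TREE = ["directory", "directory/exclude-me", "directory/just-some-file", "exclude-me"]


def remaining(pattern):
    """Return the entries of TREE that survive the given exclude pattern."""
    return [path for path in TREE if not is_excluded(path, pattern)]


if __name__ == "__main__":
    print(remaining("exclude-me"))
    print(remaining("/exclude-me"))
```

Under this model, the pattern /home/user/Library/ from the original report is also treated as anchored at the search root /home, so it never matches user/Library/pattern.txt, which is consistent with the behaviour reported at the top of this issue.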