How to exclude certain directories?
Hello,
At work, the IS team creates a ~/.snapshot directory in each user's home. That .snapshot directory is a symlink to a remote disk and does not consume my disk space, but it's huge, as it holds hourly, daily, monthly, etc. backups that are created automatically.
It would be nice to have a --exclude flag like ncdu.
Thanks for this project.
Sorry for the late reply, my notification settings were incorrect.
I agree, this is a perfect use case for includes and/or excludes, maybe similar to rg -g ….
Right now, there is no such thing, not even to exclude hidden files and directories.
I really like that in dua, you don't have to wait for the full scan to complete, plus it has a nicer bar chart and percentage column. However, the lack of a way to exclude directories (or hidden files) is a huge inconvenience, though most of the time dua is fast enough to ease the pain.
@kidonng What's the time, give or take, that you seem to be waiting for the traversal to complete, and do you think that not traversing directories ignored by VCS would help lower the traversal times?
dua was born out of the need to see all the files in order to be able to delete waste, but maybe a case can be made for scenarios where easy-to-use directory exclusion makes a difference.
What's the time, give or take, that you seem to be waiting for the traversal to complete
I don't scan large folders very often so I can't give a precise estimation, but most of the time it takes less than 10 seconds.
When I posted that comment, I was comparing dua to other tools like ncdu or gdu. Their user experience is just not to my taste.
do you think that not traversing directories ignored by VCS would be what would help lowering the traversal times?
I think those are actually what most people care about, since you don't delete the actual project to free up space :)
dua was born out of the need to see all the files in order to be able to delete waste, but maybe a case can be made for scenarios where easy-to-use directory exclusion makes a difference.
For me there are mainly two cases:
- To ignore system files. Well, I admit that I use dua to scan /, but there are some duplications* which cause slow scans and wrong calculations. I would very much appreciate it if I could exclude them.
- To tell dua not to care about certain stuff instead of specifying all but those. For example, something like dua --exclude=node_modules is much more convenient than e.g. dua $(ls | grep -v node_modules).
* For example /System/Volumes/Data on macOS
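To make the second case concrete, here is a minimal Python sketch of excluding directories by name while traversing. It is not how dua works internally (dua is a separate tool and the --exclude flag above is only a proposal); the function name disk_usage and its exclude_names parameter are made up for illustration.

```python
import os
import tempfile


def disk_usage(root, exclude_names=()):
    """Sum file sizes under root, skipping directories whose name is excluded."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_names]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def demo_exclude():
    """Build a tiny tree and measure it with and without node_modules."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        os.makedirs(os.path.join(root, "node_modules", "pkg"))
        with open(os.path.join(root, "src", "main.js"), "wb") as f:
            f.write(b"x" * 10)
        with open(os.path.join(root, "node_modules", "pkg", "index.js"), "wb") as f:
            f.write(b"y" * 1000)
        full = disk_usage(root)
        pruned = disk_usage(root, exclude_names={"node_modules"})
        return full, pruned


if __name__ == "__main__":
    print(demo_exclude())
```

Because the excluded directory is pruned before descent, it costs no traversal time at all, which is the point of the request: the demo reports 1010 bytes in total and 10 bytes with node_modules pruned.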
Thanks a lot for the elaborate reply, it definitely helps to understand the kind of problem you are seeing.
Interestingly, I am on macOS as well and noticed the 'double-accounting' for system directories due to the way these are mounted, causing them to be counted twice. Apparently I just turned a blind eye to that.
Being able to exclude things like /System/Volumes/Data definitely makes sense and the same mechanism can be used with relative paths too, like excluding node_modules.
Hello @Byron, any follow-up on this? I personally observe issues when using dua with Microsoft OneDrive on macOS (current): the scanning becomes extremely slow. This is observed across scanning tools, but I can work around it with "ncdu / --exclude ~/Library/CloudStorage".
In this case, what might help is -x to prevent it from crossing filesystems. If the directory at hand isn't mounted as a separate filesystem, one can probably try the -i/--ignore-dirs flag. Please note that even though the docs say 'absolute directories to ignore', one has to specify the directory exactly how it comes up in the traversal, so dua -i target will not descend into target, but dua -i $PWD/target would.
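The difference between the two -i spellings, and what -x does, can be modelled with a short Python sketch. This is only a model under stated assumptions, not a reading of dua's source: it assumes an ignore entry is compared (after normalization) with the path as the walker produces it, and that "another filesystem" means a different device id. The names is_ignored, walk_sizes and demo_ignore are hypothetical.

```python
import os
import tempfile


def is_ignored(traversal_path, ignore_dirs):
    """True if the path, as the walker produced it, equals an ignore entry."""
    seen = os.path.normpath(traversal_path)
    return any(seen == os.path.normpath(entry) for entry in ignore_dirs)


def walk_sizes(root, ignore_dirs=(), stay_on_filesystem=False):
    """Sum file sizes under root, modelling the -i and -x behaviour described above."""
    root_dev = os.lstat(root).st_dev
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for d in dirnames:
            path = os.path.join(dirpath, d)
            if is_ignored(path, ignore_dirs):
                continue  # matched an ignore entry: do not descend
            try:
                if stay_on_filesystem and os.lstat(path).st_dev != root_dev:
                    continue  # different device id (assumed mount point): skip
            except OSError:
                continue  # directory vanished or is unreadable
            kept.append(d)
        dirnames[:] = kept  # prune in place so os.walk skips the rest
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total


def demo_ignore():
    """Scan '.' with a relative and an absolute ignore entry for ./target."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            os.makedirs("target")
            os.makedirs("src")
            with open(os.path.join("target", "big"), "wb") as f:
                f.write(b"x" * 1000)
            with open(os.path.join("src", "a"), "wb") as f:
                f.write(b"y" * 10)
            relative = walk_sizes(".", ignore_dirs=["target"])
            absolute = walk_sizes(".", ignore_dirs=[os.path.join(os.getcwd(), "target")])
            same_fs = walk_sizes(".", stay_on_filesystem=True)
            return relative, absolute, same_fs
        finally:
            os.chdir(old_cwd)  # leave the temp dir before it is removed


if __name__ == "__main__":
    print(demo_ignore())
```

In this model the walker sees ./target, which normalizes to target, so the relative entry matches and the directory is skipped, while the absolute entry never matches and the directory is still counted. That mirrors the dua -i target versus dua -i $PWD/target behaviour described above.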
I think for excluding hidden, or maybe even .gitignored directories, a new issue could be opened, as -i/--ignore-dirs covers this issue pretty well as it is effectively excluding certain directories.