dua-cli

How to exclude certain directories?

Open kaushalmodi opened this issue 6 years ago • 5 comments

Hello,

At work, the IS team creates a ~/.snapshot directory in each user's home. That .snapshot directory is a symlink to a remote disk and does not consume my disk space, but it's huge because it holds automatically created hourly, daily, monthly, etc. backups.

It would be nice to have a --exclude flag like ncdu.

Thanks for this project.

kaushalmodi avatar Oct 08 '19 12:10 kaushalmodi

Sorry for the late reply, my notification settings were incorrect.

I agree, this is a perfect use case for includes and/or excludes, maybe similar to rg -g …. Right now, there is no such thing, not even to exclude hidden files and directories.

Byron avatar Feb 22 '20 01:02 Byron

I really like that in dua, you don't have to wait for the full scan to complete, plus it has a nicer bar chart and percentage column. However, the lack of a way to exclude directories (or hidden files) is a huge inconvenience, though most of the time dua is fast enough to ease the pain.

kidonng avatar Jun 24 '21 07:06 kidonng

@kidonng What's the time, give or take, that you seem to be waiting for the traversal to complete and do you think that not traversing directories ignored by VCS would be what would help lowering the traversal times?

dua was born out of the need to see all the files in order to be able to delete waste, but maybe a case can be made for scenarios where easy-to-use directory exclusion makes a difference.

Byron avatar Jun 24 '21 12:06 Byron

What's the time, give or take, that you seem to be waiting for the traversal to complete

I don't scan large folders very often so I can't give a precise estimation, but most of the time it takes less than 10 seconds.

When I posted that comment, I was comparing dua to other tools like ncdu or gdu. Their user experience is just not to my taste.

do you think that not traversing directories ignored by VCS would be what would help lowering the traversal times?

I think those are actually what most people care about, since you don't delete the actual project to free up space :)

dua was born out of the need to see all the files in order to be able to delete waste, but maybe a case can be made for scenarios where easy-to-use directory exclusion makes a difference.

For me there are mainly two cases:

  • To ignore system files. I admit that I use dua to scan /, but there are some duplications* which cause slow scans and wrong totals. I would really appreciate being able to exclude them.
  • Tell dua not to care about certain stuff instead of specifying all but those. For example something like dua --exclude=node_modules is much more convenient than e.g. dua $(ls | grep -v node_modules).

* For example /System/Volumes/Data on macOS
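To make the second case concrete, here is a rough Python sketch of name-based exclusion during a size scan. It is not how dua works internally; `usage_excluding` and `exclude_names` are made-up names for illustration, and it only sums regular file sizes as reported by `lstat`.

```python
import os

def usage_excluding(root, exclude_names=("node_modules",)):
    """Sum file sizes under root, never descending into excluded directory names."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into those directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_names]
        for name in filenames:
            # lstat does not follow symlinks, so a link is counted by its own size.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

Because the pruning happens before descending, an excluded directory costs nothing to skip, which is the point of asking for an exclude flag rather than filtering the results afterwards.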

kidonng avatar Jun 24 '21 12:06 kidonng

Thanks a lot for the elaborate reply, it definitely helps to understand the kind of problem you are seeing.

Interestingly, I am on macOS as well and noticed the 'double-accounting' for system directories due to the way these are mounted, causing them to be counted twice. Apparently I just turned a blind eye to that.

Being able to exclude things like /System/Volumes/Data definitely makes sense and the same mechanism can be used with relative paths too, like excluding node_modules.

Byron avatar Jun 26 '21 01:06 Byron

Hello @Byron, any follow-up on this? I personally see issues when using dua with Microsoft OneDrive on macOS (current): the scanning becomes extremely slow. This happens across scanning tools, but with ncdu I can work around it with "ncdu / --exclude ~/Library/CloudStorage".

lbonvarl avatar Jun 05 '23 08:06 lbonvarl

In this case, what might help is -x to prevent it from crossing filesystems. If the directory at hand isn't mounted as a separate filesystem, one can probably try the -i/--ignore-dirs flag. Please note that even though the docs say 'absolute directories to ignore', the directory has to be given exactly as it comes up in the traversal, so dua -i target will not descend into target, but dua -i $PWD/target would still descend into it.
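The "exactly how it comes up in the traversal" caveat can be illustrated with a small Python sketch. This is only a model of the matching behaviour described above, not dua's actual code; `disk_usage` and `ignore_dirs` are hypothetical names, and the assumption is that ignored paths are compared as plain strings without any normalisation.

```python
import os
import stat

def disk_usage(paths, ignore_dirs=()):
    """Sum file sizes below the given paths, skipping paths listed in ignore_dirs.

    A path is skipped only if it is spelled exactly as the traversal spells it.
    """
    ignored = set(ignore_dirs)
    total = 0
    stack = list(paths)
    while stack:
        path = stack.pop()
        if path in ignored:
            continue  # string match on the traversal's own spelling of the path
        st = os.lstat(path)
        if stat.S_ISDIR(st.st_mode):
            stack.extend(os.path.join(path, name) for name in os.listdir(path))
        else:
            total += st.st_size
    return total
```

Under these assumptions, `disk_usage(os.listdir("."), ignore_dirs=["target"])` skips `target`, because the traversal produces the relative path `target`. Passing `ignore_dirs=[os.path.abspath("target")]` skips nothing, because the traversal never produces the absolute spelling.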

I think for excluding hidden, or maybe even .gitignored directories, a new issue could be opened, as -i/--ignore-dirs covers this issue pretty well as it is effectively excluding certain directories.

Byron avatar Jun 05 '23 08:06 Byron