Consider caching results for faster performance on subsequent runs!

Open debnath-d opened this issue 10 months ago • 1 comments

I've been using https://github.com/dundee/gdu for the last few years and recently started using dua, which I really love. However, one thing I noticed is that dua doesn't seem to cache anything to speed up subsequent runs. gdu seems to use some kind of caching that makes it a lot faster on subsequent runs.

Perhaps some form of hashing and hash tables could be used to determine which file-system trees have changed since the last run, so that only those trees are recalculated and cached results are reused for unchanged trees.

Feb 25 '25 20:02 debnath-d

It would probably be something along the lines of a .git/index: it would have to keep stat information for each directory, and it could skip checking the direct children of a directory if its mtime didn't change. This, of course, assumes that the filesystem correctly updates this value.
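To make the idea concrete, here is a minimal sketch of an mtime check per directory. It is not dua's code; `CachedDir`, `Cache`, and `list_children` are hypothetical names made up for illustration. It only reuses the cached directory listing, because on typical POSIX filesystems a directory's mtime changes when entries are added, removed, or renamed, not when an existing file grows or shrinks.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical record of what a previous run saw for one directory.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedDir {
    /// Directory mtime observed when the listing was taken.
    pub mtime: SystemTime,
    /// Direct children found at that time, sorted for stable comparison.
    pub children: Vec<PathBuf>,
}

/// Hypothetical in-memory cache keyed by directory path.
pub type Cache = HashMap<PathBuf, CachedDir>;

/// Lists the direct children of `dir`.
/// Returns the children and `true` if the cached listing was reused,
/// i.e. the directory's mtime matches the one stored in `cache`.
pub fn list_children(dir: &Path, cache: &mut Cache) -> io::Result<(Vec<PathBuf>, bool)> {
    // One stat call on the directory itself is always needed.
    let mtime = fs::metadata(dir)?.modified()?;

    if let Some(entry) = cache.get(dir) {
        if entry.mtime == mtime {
            // Assumption: an unchanged mtime means an unchanged set of entries.
            return Ok((entry.children.clone(), true));
        }
    }

    // Cache miss or stale entry: read the directory again and record it.
    let mut children = Vec::new();
    for entry in fs::read_dir(dir)? {
        children.push(entry?.path());
    }
    children.sort();
    cache.insert(
        dir.to_path_buf(),
        CachedDir { mtime, children: children.clone() },
    );
    Ok((children, false))
}
```

Calling `list_children` twice on an untouched directory should report a cache hit the second time. Changes made within the filesystem's timestamp granularity could be missed, which is the same kind of race that .git/index has to deal with.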

In terms of implementation, I'd think the user would have to call it with an argument pointing to a path where existing caches can be found and where the cache for the current run can be stored. Implementing this correctly isn't trivial, and all it can try to do is avoid stat calls based on the mtimes of directories. Maybe I am missing something, though.
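As a rough sketch of the load/store part, assuming the user passes a cache file path: the names `MtimeIndex`, `load_index`, and `store_index` and the tab-separated file format are all invented here, not an existing dua feature. It also assumes paths are valid UTF-8 without tabs or newlines, which a real implementation could not rely on.

```rust
use std::collections::HashMap;
use std::fs;
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical on-disk index: directory path -> mtime seen on the last run.
pub type MtimeIndex = HashMap<PathBuf, SystemTime>;

/// Loads the index from `cache_file`; a missing file yields an empty index.
/// Assumed line format: `<secs>\t<nanos>\t<path>`.
pub fn load_index(cache_file: &Path) -> io::Result<MtimeIndex> {
    let file = match fs::File::open(cache_file) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(MtimeIndex::new()),
        Err(e) => return Err(e),
    };
    let mut index = MtimeIndex::new();
    for line in BufReader::new(file).lines() {
        let line = line?;
        let mut parts = line.splitn(3, '\t');
        if let (Some(secs), Some(nanos), Some(path)) = (parts.next(), parts.next(), parts.next()) {
            // Malformed lines are skipped rather than treated as fatal.
            if let (Ok(secs), Ok(nanos)) = (secs.parse::<u64>(), nanos.parse::<u32>()) {
                index.insert(PathBuf::from(path), UNIX_EPOCH + Duration::new(secs, nanos));
            }
        }
    }
    Ok(index)
}

/// Writes the index for the current run to `cache_file`, replacing old content.
pub fn store_index(cache_file: &Path, index: &MtimeIndex) -> io::Result<()> {
    let mut out = io::BufWriter::new(fs::File::create(cache_file)?);
    for (path, mtime) in index {
        // Times before the Unix epoch are clamped to zero in this sketch.
        let since_epoch = mtime.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
        writeln!(
            out,
            "{}\t{}\t{}",
            since_epoch.as_secs(),
            since_epoch.subsec_nanos(),
            path.display()
        )?;
    }
    out.flush()
}
```

A run would call `load_index` at startup, compare each directory's current mtime against the stored one, and call `store_index` with the fresh values at the end.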

Feb 26 '25 07:02 Byron