parallel-disk-usage

HDD performance is poor

Open · Davester47 opened this issue 5 months ago · 1 comment

pdu is noticeably slower than single-threaded du on my HDD: about 78 seconds versus 47 seconds in the runs below, roughly 1.65x. I'm testing on an old home directory of mine on a mechanical hard drive, with about 712 gigabytes of data in around 150,000 files. The two programs report different sizes (765.0G versus 712G) because of hard links.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time pdu
...
765.0G ┌─┴.
pdu  0.69s user 2.93s system 4% cpu 1:18.21 total

Compared to du:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time du -sh .
712G	.
du -sh .  0.28s user 1.46s system 3% cpu 47.405 total

I'm not sure what causes this difference, but I believe it's the directory traversal order used by the two programs. du uses a depth-first search, whereas pdu seems to use a breadth-first search through rayon, although I can't tell for sure. Interestingly, pdu is comparable to du when manually limited to a single thread:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time RAYON_NUM_THREADS=1 pdu
...
765.0G ┌─┴.
RAYON_NUM_THREADS=1 pdu  0.46s user 1.87s system 5% cpu 46.078 total

Davester47 · Mar 14 '24 23:03
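
To make the traversal-order hypothesis in the issue description concrete, here is a minimal sketch of the two visit orders on a toy in-memory tree. It is not taken from du or pdu: the `Dir` type, `sample_tree`, and the directory names are hypothetical, and a real work-stealing scheduler such as rayon's does not necessarily produce a strict breadth-first order. The sketch only shows why the sequence of directories touched can differ, which may matter on a drive where seeks are expensive.

```rust
use std::collections::VecDeque;

/// A toy in-memory directory tree (hypothetical; not du's or pdu's data structure).
struct Dir {
    name: &'static str,
    children: Vec<Dir>,
}

/// Depth-first, pre-order: finish one subtree before moving to its sibling.
fn depth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(dir) = stack.pop() {
        order.push(dir.name);
        // Push children in reverse so the first child is popped (visited) first.
        for child in dir.children.iter().rev() {
            stack.push(child);
        }
    }
    order
}

/// Breadth-first: visit every directory at one depth before descending further.
fn breadth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(dir) = queue.pop_front() {
        order.push(dir.name);
        queue.extend(dir.children.iter());
    }
    order
}

/// Hypothetical sample layout used only for illustration.
fn sample_tree() -> Dir {
    Dir {
        name: "home",
        children: vec![
            Dir { name: "docs", children: vec![Dir { name: "docs/old", children: vec![] }] },
            Dir { name: "pics", children: vec![Dir { name: "pics/2019", children: vec![] }] },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    println!("depth-first:   {:?}", depth_first(&tree));
    println!("breadth-first: {:?}", breadth_first(&tree));
}
```

In the depth-first order, `docs/old` is visited right after `docs`, while the breadth-first order jumps to `pics` first and returns to `docs/old` later.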

pdu was never really designed to run on HDDs (I forgot to mention that in README.md). But if there's an easy way to detect an HDD and limit rayon to 1 thread, I'll be happy to accept a pull request. Unless multi-threaded du is still faster on HDD for some reason.

KSXGitHub · Mar 15 '24 06:03
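
One possible starting point for the HDD detection mentioned above, sketched under stated assumptions: on Linux, the kernel exposes a `queue/rotational` file under `/sys/block/<device>/` that contains `1` for rotational devices and `0` otherwise. The helper names (`parse_rotational`, `is_rotational`, `choose_thread_count`) and the device name `sda` are hypothetical and not part of pdu. Mapping a scanned path to its underlying block device is left out, and the flag lives on the whole device (for example `sda`), not on a partition such as `sda1`.

```rust
use std::fs;
use std::path::Path;

/// Interpret the contents of a Linux `queue/rotational` sysfs file.
/// "1" means the kernel reports a rotational device, "0" a non-rotational one.
fn parse_rotational(contents: &str) -> Option<bool> {
    match contents.trim() {
        "1" => Some(true),
        "0" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: read `/sys/block/<device>/queue/rotational`.
/// Returns `None` if the file is missing or unreadable (e.g. on non-Linux systems).
fn is_rotational(device: &str) -> Option<bool> {
    let path = Path::new("/sys/block").join(device).join("queue/rotational");
    let contents = fs::read_to_string(path).ok()?;
    parse_rotational(&contents)
}

/// Hypothetical policy: use one thread on a known HDD, otherwise keep the default.
fn choose_thread_count(rotational: Option<bool>, default_threads: usize) -> usize {
    match rotational {
        Some(true) => 1,
        _ => default_threads,
    }
}

fn main() {
    let default_threads = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // "sda" is a placeholder; a real tool would derive the device from the scanned path.
    let rotational = is_rotational("sda");
    let threads = choose_thread_count(rotational, default_threads);
    println!("rotational: {:?}, threads: {}", rotational, threads);
}
```

The resulting count could then be handed to rayon, for instance through `ThreadPoolBuilder::new().num_threads(threads).build_global()`, or applied from outside the program with the `RAYON_NUM_THREADS` variable used in the issue description. Whether one thread is actually the best choice on every HDD would still need measuring.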