dupeguru
Option to set number of I/O threads
Is your feature request related to a problem? Please describe.
Not really a problem per se; it's just that dupeguru is nowhere near as fast as it could be on my system. I have ample RAM, CPU, and I/O capacity that it leaves unused. As far as I can tell, the file checking is multithreaded, but I only see it spawn one I/O thread for reading (I did not dig very deep into this, so I could be wrong). My storage is a large bcachefs array: 8x HDD, with 4x SSD as read/write cache. The HDDs only see around 24% total I/O load (as reported by iotop) when dupeguru scans large directories. Something like hashdeep chews through the exact same folders significantly faster. Since I see virtually no load on any of my cores (3860X, so 48 threads available, but none seem loaded), the reported total disk I/O is low, and total system disk read throughput is far lower than I expect, I can only assume it's limited to a single I/O thread that is somehow bottlenecked. I also see heavy kernel time in "__pv_queued_spin_lock_slowpath" while dupeguru runs its scan; I am not sure whether this is related.
Describe the solution you'd like
An option to enable and set the number of I/O threads, to make better use of the available disk speed.
Describe alternatives you've considered
I have considered just running multiple copies of dupeguru and sectioning out the work between them. I am not entirely against that either, but ideally that wouldn't be necessary.
Additional context
DupeGuru: 4.1.1
Kernel: 5.11.2-arch1-1
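To illustrate the kind of option being requested, here is a minimal sketch of hashing a directory tree with a configurable number of reader threads. This is not dupeguru's code and does not reflect how its scanner is structured; `hash_tree`, `find_duplicates`, the `io_threads` parameter, the chunk size, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # read files 1 MiB at a time (arbitrary choice)


def hash_file(path):
    """Return (path, hex digest) for one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def iter_files(root):
    """Yield the path of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def hash_tree(root, io_threads=4):
    """Hash all files under root using io_threads concurrent readers.

    io_threads is the hypothetical user-facing setting: 1 gives
    sequential reads, higher values keep more reads in flight.
    """
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        return dict(pool.map(hash_file, iter_files(root)))


def find_duplicates(hashes):
    """Group paths that share a digest; return only groups of 2 or more."""
    groups = {}
    for path, digest in hashes.items():
        groups.setdefault(digest, []).append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(hash_tree("/some/dir", io_threads=8))` would keep up to eight reads in flight. Threads are a reasonable fit here because blocking file reads release Python's global interpreter lock, as does `hashlib` when hashing large buffers. Whether more threads actually help depends on the storage: a multi-disk array like the one described may benefit, while a single spinning disk can get slower from the extra seeking.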
I observed something similar, although I clearly don't have as much knowledge. I have an i9-12900KF (so 24 logical cores), and in the Details tab of Task Manager I see a dupeguru process for each core, but the thread column shows a count of only 1 for each of them. My resource usage shows it at around 25% to at most 40%. The same goes for the HDD and the RAM, which are not used as much as they probably could be. I also noticed that only one process is working during all the steps except when comparing the matches/chunks; it would be nice if it could use all of them the whole way through. Or rather, give us the option to make it use more of our PC's resources in general, with various settings.