Option to set number of I/O threads

Open lordkitsuna opened this issue 3 years ago • 1 comment

Is your feature request related to a problem? Please describe.
Not really a problem per se; it's just that dupeGuru is nowhere near as fast as it could be on my system. I have ample RAM, CPU, and I/O capacity that it leaves unused. As far as I can tell, the file checking is multithreaded, but I only see it spawn one I/O thread for reading (I did not dig very deep into this, so I could be wrong). I have a large bcachefs array for my storage: 8x HDD, plus 4x SSD as read/write cache. The HDDs only see around 24% total I/O load (as reported by iotop) when running dupeGuru on large directories. Something like hashdeep chews through the exact same folders significantly faster. Since I see virtually no load on any of my cores (3860X, so 48 threads available, but none seem loaded), reported total disk I/O is low, and total system disk read throughput is far lower than I expect, I can only assume it is limited to a single I/O thread that is somehow bottlenecked. I also see heavy kernel time in "__pv_queued_spin_lock_slowpath" while dupeGuru runs its scan; I am not sure whether this is related.

Describe the solution you'd like
An option to enable and set more I/O threads, to make better use of available disk speed.
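For illustration, here is a minimal sketch of what a configurable reader-thread count could look like. This is not dupeGuru's actual code: `hash_file`, `hash_files`, the `io_threads` setting, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def hash_file(path):
    """Return the SHA-256 hex digest of one file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, io_threads=4):
    """Hash many files with `io_threads` concurrent reader threads.

    `io_threads` stands in for the hypothetical user-facing option
    requested in this issue.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        # pool.map yields results in the same order as `paths`.
        digests = list(pool.map(hash_file, paths))
    return dict(zip(paths, digests))
```

Threads are a plausible fit here because CPython releases the GIL during file reads and while hashing large buffers, so several reads can be in flight at once. Whether a higher count actually helps on a given array would need to be measured.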

Describe alternatives you've considered
I have considered just running more copies of dupeGuru and trying to section out the work. I am not entirely against that either, but ideally I wouldn't have to do that.
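As a rough sketch of the "section out the work" workaround, the directories could be split into roughly equal shards by size, one shard per instance. The function name `split_into_shards` and the input format (a mapping from directory path to total bytes) are hypothetical; nothing here is part of dupeGuru.

```python
import heapq


def split_into_shards(sizes, shard_count):
    """Greedily assign directories to shards, largest first.

    sizes: dict mapping directory path -> total size in bytes.
    Returns a list of `shard_count` lists of paths.
    """
    shards = [[] for _ in range(shard_count)]
    # Min-heap of (bytes assigned so far, shard index).
    heap = [(0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    for path, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded shard
        shards[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return shards
```

One drawback of this approach: duplicates that sit in different shards would never be compared against each other, which is a large part of why a built-in option is preferable.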

Additional context
dupeGuru: 4.1.1
Kernel: 5.11.2-arch1-1

lordkitsuna, Mar 24 '21 20:03

I observed something similar, although I clearly don't have as much knowledge. I have an i9-12900KF (so 24 logical cores), and in the details section of the task manager I see a dupeGuru process for each core, but each one shows a thread count of only 1. The resource usage it reports is around 25%, at most 40%. The same goes for the HDD and the RAM, which probably aren't being used as much as they could be. I also noticed that only one process works on all the steps except when comparing the matches/chunks; it would be nice if all of them were used throughout. Or rather, give us options to make it use more of our PC's resources in general, with various settings.

Proutbedaine, Feb 26 '22 04:02