fdupes
Combine redundant code bits that only call stat()
Several functions call stat() and return only a single value from the resulting struct stat:
filesize()
getdevice()
getinode()
getmtime()
getctime()
There is also a stat() call at line 327 and a function getfilestats() which calls some of the stat() functions mentioned.
The overhead from the redundant function calls and stat() system calls is heavy: for the 19 files and dirs in testdir, running fdupes -nrq testdir/ results in a total of 163 redundant stat() calls according to an strace log. On a different file tree with 1056 files and dirs, the excess stat() count shoots up to 38559.
I propose combining all of these functions so that each file is stat()ed only one time, with the relevant struct stat items stored all at once.
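A minimal sketch of what the combined approach could look like. The names file_stats_t and load_file_stats() are hypothetical and are not taken from the fdupes source; the idea is only that one stat() call fills every field the individual getters currently fetch separately.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical container for the struct stat fields that filesize(),
 * getdevice(), getinode(), getmtime() and getctime() each look up.
 * Not an actual fdupes type. */
typedef struct {
    off_t  size;    /* st_size  */
    dev_t  device;  /* st_dev   */
    ino_t  inode;   /* st_ino   */
    time_t mtime;   /* st_mtime */
    time_t ctime;   /* st_ctime */
    int    valid;   /* 1 once the fields have been filled in */
} file_stats_t;

/* Call stat() exactly once for path and copy all relevant fields.
 * Returns 0 on success and -1 if stat() fails; on failure out->valid
 * stays 0 so callers can tell the cache is unusable. */
static int load_file_stats(const char *path, file_stats_t *out)
{
    struct stat s;

    out->valid = 0;
    if (stat(path, &s) != 0)
        return -1;

    out->size   = s.st_size;
    out->device = s.st_dev;
    out->inode  = s.st_ino;
    out->mtime  = s.st_mtime;
    out->ctime  = s.st_ctime;
    out->valid  = 1;
    return 0;
}
```

With something like this stored alongside each file entry, later comparisons would read the cached fields instead of calling stat() again. This assumes the file does not change during a run; any code path that depends on fresh timestamps would have to reload the entry.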
That is probably the reason why it's quite slow when used on files in network shares (tested, for example, against md5sum).