
Combine redundant code bits that only call stat()

Open · jbruchon opened this issue 8 years ago · 1 comment

Several functions call stat() only to return a single field of the resulting struct stat:

filesize()
getdevice()
getinode()
getmtime()
getctime()

There is also a stat() call at line 327, and a function, getfilestats(), that calls several of the stat() wrappers listed above.

The overhead from these redundant function calls and stat() system calls is heavy. For the 19 files and directories in testdir, running fdupes -nrq testdir/ results in 163 redundant stat() calls according to an strace log. On a different file tree with 1056 files and directories, the number of excess stat() calls rises to 38559.

I propose combining all of these functions so that each file is stat()ed only one time, with the relevant struct stat items stored all at once.
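A minimal sketch of the proposal in C, assuming a per-file record that caches the stat() result the first time any field is needed. The names file_stats, get_file_stats(), file_entry, file_entry_stats() and the stat_calls counter are hypothetical illustrations, not existing fdupes code; they only mirror the fields returned by filesize(), getdevice(), getinode(), getmtime() and getctime().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical: counts real stat() calls so the "once per file" property can be checked. */
unsigned long stat_calls = 0;

/* Hypothetical container for every struct stat field the old one-value wrappers returned. */
struct file_stats {
  off_t size;    /* was filesize() */
  dev_t device;  /* was getdevice() */
  ino_t inode;   /* was getinode() */
  time_t mtime;  /* was getmtime() */
  time_t ctime;  /* was getctime() */
};

/* Calls stat() exactly once and copies all needed fields.
 * Returns 0 on success, -1 if stat() fails (errno is left as stat() set it). */
int get_file_stats(const char *path, struct file_stats *out)
{
  struct stat s;

  stat_calls++;
  if (stat(path, &s) != 0)
    return -1;
  out->size = s.st_size;
  out->device = s.st_dev;
  out->inode = s.st_ino;
  out->mtime = s.st_mtime;
  out->ctime = s.st_ctime;
  return 0;
}

/* Hypothetical per-file record with a lazily filled stat cache. */
struct file_entry {
  const char *path;
  int have_stats;           /* 0 until get_file_stats() has succeeded */
  struct file_stats stats;
};

/* Returns cached stats, calling stat() only on the first successful lookup.
 * Returns NULL if the file cannot be stat()ed. */
const struct file_stats *file_entry_stats(struct file_entry *f)
{
  if (!f->have_stats) {
    if (get_file_stats(f->path, &f->stats) != 0)
      return NULL;
    f->have_stats = 1;
  }
  return &f->stats;
}
```

Because the cache is filled lazily, a file whose metadata is never needed costs no system call, and every later size, device, inode or timestamp lookup is a pointer return rather than another stat(). The stat_calls counter is there only to make that property easy to verify, for example against an strace log.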

jbruchon, Nov 14 '16 03:11

That is probably why it is quite slow on files on network shares (tested, for example, against md5sum).

golimarrrr, Dec 18 '17 15:12