
Deleting millions of files is very slow

Open ejezel opened this issue 4 years ago • 1 comment

I wanted to remove all duplicates from two nearly identical folders containing millions of files. Running the removal script created by rmlint takes far longer than finding the duplicates did. I believe the issue is that a separate rm process is started for each file. Is there some kind of workaround for this?

ejezel avatar Aug 17 '21 15:08 ejezel

Duplicate directories are probably faster to delete than individual files; you could try rmlint -T dd,df and then run the script. Or you can just let it run overnight...
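
For reference, the full sequence might look like the sketch below. The -T dd,df flags restrict rmlint's lint types to duplicate directories and duplicate files, and rmlint.sh is the script rmlint writes by default; the paths are placeholders, and the script's -n dry-run flag should be confirmed against your version with ./rmlint.sh -h:

```sh
# Prefer whole-directory matches so the generated script can remove
# directories rather than millions of individual files.
rmlint -T dd,df /path/to/folder_a /path/to/folder_b

# Preview what the generated script would do before deleting anything
# (-n is the dry-run option in recent versions of rmlint.sh).
./rmlint.sh -n

# Run it for real once the dry run looks sane.
./rmlint.sh
```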

Not sure why rm is so slow, but others have reported this issue, e.g. https://unix.stackexchange.com/questions/37329/efficiently-delete-large-directory-containing-thousands-of-files. There is some discussion of the underlying issues at https://serverfault.com/questions/183821/rm-on-a-directory-with-millions-of-files/328305#328305
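
If the bottleneck really is one rm process per file, another workaround is to batch the deletions yourself. Below is a minimal sketch, assuming rmlint's json output formatter plus the external jq tool and GNU xargs; the type, is_original, and path field names follow rmlint's documented JSON schema, but verify them (and the selected paths) against your version before deleting anything:

```sh
# Emit the duplicate report as JSON instead of a shell script.
rmlint -o json:dupes.json /path/to/folder_a /path/to/folder_b

# Select duplicate files (never the originals), print one path per
# line, and delete them in large batches so only a handful of rm
# processes are spawned in total. Note: newline-delimited input
# assumes no filenames contain newlines.
jq -r '.[] | select(.type == "duplicate_file" and (.is_original | not)) | .path' dupes.json \
  | xargs -d '\n' rm --
```

For wiping out an entire duplicate directory tree, the rsync trick discussed in the second link above (rsync -a --delete from an empty directory into the target) is also reported to be faster than rm -rf.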

SeeSpotRun avatar Aug 18 '21 22:08 SeeSpotRun