rmlint
Deleting millions of files is very slow
I wanted to remove all duplicates from two nearly identical folders containing millions of files. Running the removal script created by rmlint takes far more time than finding the duplicates did. I believe the issue is that a separate rm process is started for each file. Is there some kind of workaround for this?
Duplicate directories are probably faster to delete than individual files, so you can try running rmlint -T dd,df and then running the script. Or you can just let it run overnight...
Not sure why rm is so slow, but others have reported this issue, e.g. https://unix.stackexchange.com/questions/37329/efficiently-delete-large-directory-containing-thousands-of-files. There is some discussion of the underlying issues at https://serverfault.com/questions/183821/rm-on-a-directory-with-millions-of-files/328305#328305.
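If the per-file rm process really is the bottleneck, one possible workaround is to do all the unlinking from a single process. Below is a minimal Python sketch of that idea, not anything rmlint ships. It assumes you already have the duplicate paths in a NUL-separated list file; the function name delete_paths and the list file itself are hypothetical, and producing that list from rmlint's output is not covered here. It also skips any per-file safety checks the generated script may perform, so treat it as a starting point only.

```python
import os
import sys


def delete_paths(list_file):
    """Unlink every NUL-separated path in list_file from one process.

    list_file is a hypothetical input (not an rmlint output format).
    Returns a (deleted, failed) tuple of counts.
    """
    with open(list_file, "rb") as fh:
        # Paths are kept as bytes so unusual file names survive intact.
        paths = [p for p in fh.read().split(b"\0") if p]

    deleted = failed = 0
    for path in paths:
        try:
            os.unlink(path)  # one syscall per file, no new rm process
            deleted += 1
        except OSError as err:
            failed += 1
            print(f"skipped {os.fsdecode(path)}: {err}", file=sys.stderr)
    return deleted, failed
```

Calling delete_paths("dupes.lst") (file name made up) would return the number of files removed and the number that could not be removed. This only avoids process start-up overhead; if the filesystem itself is slow at unlinking, as the linked discussions suggest can happen, it may not help much.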