Detect duplicate subtrees

Open Harvie opened this issue 8 years ago • 3 comments

I know this will probably be hard to implement, but it would be great if fdupes could detect duplicate trees/directories. Currently, if you have two completely identical directories containing completely identical files, fdupes never tells you so, and you have to delete the duplicates file by file. You are also left with an empty directory afterwards.

But what if we computed a directory hash from the hashes of all files in the directory (ignoring filenames)? We could then apply this recursively to subdirectories to detect whole duplicate trees. We might then need to remove the individual files from the dupes output so that it gets shorter (which is the point of this).
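To make the idea concrete, here is a rough Python sketch of such a recursive directory hash. It is not fdupes code and not a proposal for the actual implementation; the function names (`file_digest`, `tree_digest`, `duplicate_trees`) are made up for illustration, and the choices of SHA-256, skipping symlinks, and sorting the child digests so that listing order does not matter are all assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content only; the filename is deliberately ignored."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root, index):
    """Return a digest for the directory `root` and record it in `index`.

    The digest is built from the digests of all children (files and
    subdirectories), sorted so that the result does not depend on the
    order in which the filesystem lists the entries.
    """
    child_digests = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue  # assumption: symlinks are skipped entirely
            if entry.is_dir():
                child_digests.append("d:" + tree_digest(entry.path, index))
            elif entry.is_file():
                child_digests.append("f:" + file_digest(entry.path))
    child_digests.sort()
    digest = hashlib.sha256("\n".join(child_digests).encode()).hexdigest()
    index[digest].append(root)
    return digest


def duplicate_trees(root):
    """Return groups of directories below `root` that have identical content."""
    index = defaultdict(list)
    tree_digest(root, index)
    return [sorted(paths) for paths in index.values() if len(paths) > 1]
```

Note that this sketch reports nested duplicates as well (if a/ and b/ match, a/sub and b/sub are listed too) and treats all empty directories as equal, so a real implementation would still need to collapse the output to the topmost matching directories.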

I am trying to handle a MASSIVE amount of duplicates (the fdupes output file is 18 MB). I guess this could be greatly reduced if we managed to find duplicate subtrees, because a lot of it is the traditional "backups of backups of backups of backups". If I could handle whole subtrees as one item rather than file by file, it would greatly reduce the effort needed to dedup such storage.

Harvie avatar Feb 09 '17 02:02 Harvie

Surely the hashes of the files have to be sorted. And I don't want to delete a whole directory if it contains one different file, because I don't want to lose that file. In that case you will have to work it out file by file.

using file finding and text processing tools and temp files

WOW! With such an approach fdupes would never have existed, because it can be completely replaced with "file finding and text processing tools and temp files". But it's just easier to have a state-of-the-art tool that doesn't require any ad hoc programming to get stuff done.

Harvie avatar Feb 09 '17 03:02 Harvie

Such subtree detection would be a tremendous time-saver and also a security feature for efficiently handling backups of backups(-of-b....), a frequent use case.

hellyberry avatar Apr 29 '17 11:04 hellyberry

Would this be something similar to rmlint -T dd, as mentioned here?

pabloab avatar Sep 10 '17 19:09 pabloab