
Is rdfind safe against sha1 (or other) checksum collisions?

Open • slavanap opened this issue 2 years ago • 4 comments

I've attempted to read the code and haven't found a part that reads and compares two full files. So duplicates seem to be determined only from checksums (similar to ZFS dedup=on), not from file contents (ZFS dedup=verify). Is that correct?
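To make the distinction in the question concrete, here is a minimal Python sketch of checksum-only duplicate detection, i.e. the dedup=on style of behaviour. It is an illustration under assumptions, not rdfind's actual C++ code: the names `file_digest` and `checksum_only_duplicates` are hypothetical, and grouping by size before hashing is just one common way to avoid hashing files that cannot possibly match. Files are hashed with SHA-256 through Python's `hashlib`, and at no point are the bytes of two files compared directly.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_only_duplicates(paths, algo: str = "sha256"):
    """Group files that are *assumed* identical because size and digest match.

    Hypothetical helper for illustration. No byte-for-byte comparison is
    done, so two different files whose digests collide would be reported
    as duplicates (compare ZFS dedup=on).
    """
    by_size = defaultdict(list)
    for p in map(Path, paths):
        by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for p in same_size:
            by_digest[file_digest(p, algo)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

Whatever the digest algorithm, a collision in this scheme silently turns two different files into "duplicates"; a stronger hash only makes that less likely.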

slavanap • Nov 23 '22 17:11

That is my interpretation as well. Admittedly, SHA-256 is fairly collision resistant.

fire-eggs • Nov 24 '22 02:11

Then I'd suggest pointing this out somewhere in the documentation (as the ZFS documentation does).

slavanap • Nov 24 '22 02:11

May I suggest BLAKE3 as a (very fast and collision-resistant) alternative to SHA256?

Anyway, SHA1 is known to be broken, and real-world collision examples exist, for example two different PDFs that hash to the same value (see https://shattered.io/).

There's also https://github.com/corkami/collisions to consider, so I'd say MD5 is definitely out.

GerHobbelt • May 22 '23 12:05

BTW: I have collected some of these collision examples (PDF files) as part of a large, slowly growing test corpus for another application, and I'd love it if rdfind with its default settings were safe to run over such a set.

:thinking: Ultimately, "safe" would then mean we'd have to settle for an additional final verification round in which file contents are compared byte for byte, since you can never be absolutely sure with a (cryptographically secure) hash alone. Ah well, performance be damned.
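A verification round of the kind described above could look roughly like the following minimal Python sketch. The helper names `files_identical` and `verify_group` are hypothetical and this is not rdfind code; it assumes it is handed a group of candidate files whose digests already matched, and it splits that group into subgroups that are identical byte for byte, so a colliding file ends up on its own instead of being flagged as a duplicate.

```python
from pathlib import Path


def files_identical(a: Path, b: Path, chunk_size: int = 1 << 20) -> bool:
    """Compare two regular files byte for byte, streaming in chunks."""
    if a.stat().st_size != b.stat().st_size:
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both files exhausted at the same point
                return True


def verify_group(candidates):
    """Split a group of same-digest files into truly identical subgroups.

    Hypothetical helper for illustration. Each file is compared against
    the first member (the representative) of every subgroup found so far;
    a file that only *collides* on its digest therefore starts a subgroup
    of its own. Only subgroups with at least two members are returned.
    """
    verified = []
    for p in map(Path, candidates):
        for group in verified:
            if files_identical(group[0], p):
                group.append(p)
                break
        else:
            verified.append([p])
    return [g for g in verified if len(g) > 1]
```

The extra cost only applies to files whose digests already matched: each non-representative file is read once more, and the representative once per comparison. For the pairwise check alone, Python's standard library also offers `filecmp.cmp(a, b, shallow=False)`.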

Surely SHA256 has been tested and investigated quite thoroughly (BLAKE3 to a lesser extent), so the chance of that happening with SHA256 is astronomically small. But then my paranoid brain thinks of Sir Terry Pratchett (✝RIP): a million-to-one chance succeeds nine times out of ten, so better safe than sorry, and for the paranoid that means file content comparison. :smile:
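To put "astronomically small" into numbers, one can use the usual birthday-bound approximation: with n files and a b-bit digest, P(at least one collision) ≈ 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes digests behave like independent, uniformly random values, so it models accidental collisions only; it says nothing about deliberately crafted collisions such as the SHAttered PDFs, which is exactly how SHA1 and MD5 are broken. The function name `collision_probability` and the one-billion-file example are illustrative choices, not anything from rdfind.

```python
import math


def collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate of P(at least one accidental digest collision).

    Assumes ideal, uniformly distributed digests; crafted collisions
    (e.g. SHAttered against SHA-1) are outside this model.
    """
    pairs = n_files * (n_files - 1) / 2
    # p = 1 - exp(-pairs / 2**bits); expm1 keeps precision when p is tiny
    return -math.expm1(-pairs / 2.0 ** hash_bits)


for bits, name in [(128, "MD5"), (160, "SHA-1"), (256, "SHA-256")]:
    print(f"{name:8} 1e9 files: {collision_probability(10**9, bits):.1e}")
```

For one billion files this gives figures on the order of 1e-21 for a 128-bit digest, 1e-31 for 160 bits and 1e-60 for 256 bits, whereas a 32-bit digest already reaches about a 50% chance at roughly 77,000 files. The catch is that a corpus containing known collision files is not a random sample, which is why the byte-for-byte check still matters for SHA1 and MD5.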

GerHobbelt • May 22 '23 12:05