find-duplicates
Test hashes of parts of large files before hashing entire file
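For large files (say, over 1 MB in size), we can hash a few parts of the file to quickly detect non-duplicates without having to read the entire file. A possible set of parts to hash would be the first 4 KB, the 4 KB-aligned 4 KB block at the middle of the file, and the last 4 KB-aligned 4 KB block. It might be sufficient to hash only the middle block. Matching partial hashes do not prove that two files are identical, so the entire file still has to be hashed to confirm a duplicate; differing partial hashes rule it out immediately.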
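The idea could be sketched roughly as follows. This is only an illustration, not the project's implementation: the names `sample_offsets` and `partial_hash`, the choice of SHA-256, and the assumption that candidate files have already been grouped by equal size are all hypothetical.

```python
import hashlib
import os

BLOCK = 4096  # 4 KB block size, as proposed above


def sample_offsets(size, block=BLOCK):
    """Return the block-aligned offsets of the first, middle, and last blocks.

    Hypothetical helper: offsets are deduplicated, so a file smaller than
    three blocks yields fewer than three offsets.
    """
    if size <= 0:
        return []
    middle = ((size // 2) // block) * block   # aligned block containing the midpoint
    last = ((size - 1) // block) * block      # aligned block containing the last byte
    return sorted({0, middle, last})


def partial_hash(path, block=BLOCK):
    """Hash only the sampled blocks of a file (SHA-256 is an assumed choice)."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for offset in sample_offsets(size, block):
            f.seek(offset)
            h.update(f.read(block))
    return h.hexdigest()
```

Two same-sized files whose `partial_hash` values differ cannot be duplicates and can be skipped. If the values match, the files are only candidates, and a full-file hash is still needed.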