fastdupes
Differing files with common prefix detected as duplicates
Tested on Ubuntu 16.04.
#!/bin/sh
git clone https://github.com/ssokolow/fastdupes
cd fastdupes
mkdir files
seq 100000 > files/file1; echo "1" >> files/file1
seq 100000 > files/file2; echo "2" >> files/file2
cmp files/file1 files/file2
python fastdupes.py files
Cloning into 'fastdupes'...
remote: Counting objects: 279, done.
remote: Total 279 (delta 0), reused 0 (delta 0), pack-reused 279
Receiving objects: 100% (279/279), 93.39 KiB | 0 bytes/s, done.
Resolving deltas: 100% (116/116), done.
Checking connectivity... done.
files/file1 files/file2 differ: byte 588896, line 100001
Found 2 files to be compared for duplication.
Found 1 sets of files with identical sizes. (2 files examined)
Found 1 sets of files with identical header hashes. (2 files examined)
Found 1 sets of files with identical hashes. (2 files examined)
/tmp/fastdupes/files/file2
/tmp/fastdupes/files/file1
Ugh. I hate these kinds of bugs, the ones that indicate I somehow failed to provide the kind of safety guarantee I thought I had.
The last few days have been busy, but I'll try to track this down as soon as possible.
I'm currently fighting off a summer cold, so it'll be a little while before I get to this. Sorry for the delay.
I understand. Anyway, I looked into it: the normal (full-file) hashing is behaving like the header hashing, so only the start of each file is being compared. See the pull request.
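To illustrate the difference between the two passes, here is a minimal sketch. It is not the actual fastdupes code: the names `header_hash`, `full_hash` and `CHUNK_SIZE`, the 64 KiB read size, and the choice of SHA-1 are all assumptions made for the example. The point is only that a full hash has to keep reading until EOF, whereas a header hash stops after the first chunk.

```python
import hashlib

# Hypothetical read size for this sketch; not taken from fastdupes.
CHUNK_SIZE = 65536


def header_hash(path, limit=CHUNK_SIZE):
    """Hash only the first `limit` bytes (a cheap pre-filter)."""
    with open(path, "rb") as fobj:
        return hashlib.sha1(fobj.read(limit)).hexdigest()


def full_hash(path, chunk_size=CHUNK_SIZE):
    """Hash the whole file, reading chunk by chunk until EOF."""
    digest = hashlib.sha1()
    with open(path, "rb") as fobj:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the two test files from the report, which only differ at byte 588896, `header_hash` would be expected to match for both while `full_hash` should differ. If the full pass stops after one read, it degenerates into the header pass and files like these get reported as duplicates.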
Thanks.
I woke up today with no more traditional symptoms, but no mental capacity either, so I'll review it once that clears up.
OK, I'm back on my feet, but still catching up on things that slipped. Hopefully, I'll have this fixed within the next few days.
OK, I'm back. Sorry for the silence.
Please continue discussion under PR #32.