LANraragi
Create a script-type plugin to detect duplicate archives using thumbnails
This one could be pretty fun, I think.
The script should go through the entire archive list and return a list of potential duplicate pairs at the end.
I see two potential ways to detect dupes:
- Compare existing thumbnail hashes computed by LRR:
The hashes already exist in the database since they're used for reverse image searches. This would be the easiest and fastest way to go. Here's some example code I got from who knows where:
# Hamming distance between two 64-bit image hashes.
# Takes two hash records and returns the number of differing bits.
# A distance below 5 usually means the two images are the same.
sub hammingdistance {
    # Not named $a/$b: Perl reserves those globals for sort().
    my ($first, $second) = @_;
    my $distance = 0;
    for my $i (0 .. 63) {
        if ($first->{'hash'}->[$i] != $second->{'hash'}->[$i]) {
            $distance++;
        }
    }
    return $distance;
}
- Re-extract thumbnails and compare them in detail using a package like https://github.com/runarbu/PerlImageHash.
This would be super expensive computationally speaking, but if the first way doesn't yield decent results I don't see any other solution.
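For the first approach, the "go through the entire archive list and return pairs" part could look roughly like the sketch below. It assumes each hash is already available as an array of 0/1 bits, which is the shape the example function above expects; how LRR actually stores its thumbnail hashes isn't covered here. The archive IDs, the `find_duplicate_pairs` name and the 8-bit sample hashes are made up for illustration.

```perl
use strict;
use warnings;

# Hamming distance between two equal-length arrays of bits (0/1).
sub hamming_distance {
    my ($first, $second) = @_;
    my $distance = 0;
    for my $i (0 .. $#{$first}) {
        $distance++ if $first->[$i] != $second->[$i];
    }
    return $distance;
}

# Compare every archive against every later one and collect close pairs.
# $hashes maps a (hypothetical) archive ID to its array of hash bits.
sub find_duplicate_pairs {
    my ($hashes, $threshold) = @_;
    my @ids = sort keys %{$hashes};
    my @pairs;
    for my $i (0 .. $#ids - 1) {
        for my $j ($i + 1 .. $#ids) {
            my $distance = hamming_distance($hashes->{ $ids[$i] }, $hashes->{ $ids[$j] });
            push @pairs, [ $ids[$i], $ids[$j], $distance ] if $distance < $threshold;
        }
    }
    return \@pairs;
}

# Made-up 8-bit hashes for illustration; real ones would be 64 bits.
my %hashes = (
    archive_a => [ 1, 0, 1, 1, 0, 0, 1, 0 ],
    archive_b => [ 1, 0, 1, 1, 0, 1, 1, 0 ],
    archive_c => [ 0, 1, 0, 0, 1, 1, 0, 1 ],
);

my $pairs = find_duplicate_pairs(\%hashes, 5);

# Prints: archive_a <-> archive_b (distance 1)
print "$_->[0] <-> $_->[1] (distance $_->[2])\n" for @{$pairs};
```

Comparing every pair is quadratic in the number of archives, but each comparison is only a handful of integer checks, so it should stay far cheaper than re-extracting thumbnails.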