Create a script-type plugin to detect duplicate archives using thumbnails

Open Difegue opened this issue 3 years ago • 0 comments

This one could be pretty fun, I think.

The script should go through the entire archive list and return a list of potential duplicate pairs at the end.
I see two potential ways to detect dupes:

  • Compare existing thumbnail hashes computed by LRR:
    The hashes already exist in the database since they're used for reverse image searches. This would be the easiest and fastest way to go. Here's some example code I got from who knows where:
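    # Hamming distance. Takes two hashes in (each assumed to hold 64 bits under the 'hash' key).
    # Returns the number of differing bits. A distance below 5 usually means the images match.
    sub hammingdistance {
            # Avoid naming these $a and $b: those are Perl's special sort variables.
            my ($first, $second) = @_;

            my $distance = 0;
            for (my $i = 0; $i < 64; $i++) {
                    if ($first->{'hash'}->[$i] != $second->{'hash'}->[$i]) {
                            $distance++;
                    }
            }

            return $distance;
    }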
  • Re-extract thumbnails and compare them in detail using a package like https://github.com/runarbu/PerlImageHash.
    This would be super expensive computationally speaking, but if the first way doesn't yield decent results I don't see any other solution.
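
To make the first approach a bit more concrete, here is a rough sketch of the "walk the whole list, return pairs" part. It is only a sketch under assumptions: the input format (archive ID mapped to an array ref of 64 bits), the names `hamming_distance` and `find_duplicate_pairs`, and the threshold of 5 are all hypothetical, and nothing here touches the actual LRR database or plugin API.

```perl
use strict;
use warnings;

# Count the positions where two bit arrays differ.
# Assumption: both array refs have the same length (64 in the example above).
sub hamming_distance {
    my ( $first, $second ) = @_;
    my $distance = 0;
    for my $i ( 0 .. $#{$first} ) {
        $distance++ if $first->[$i] != $second->[$i];
    }
    return $distance;
}

# Compare every archive against every other one exactly once.
# $hashes:    hash ref, hypothetical shape { archive_id => [ 64 bits ] }
# $threshold: pairs with a distance strictly below this are reported
# Returns an array ref of [ id_one, id_two, distance ] triples.
sub find_duplicate_pairs {
    my ( $hashes, $threshold ) = @_;
    my @ids = sort keys %$hashes;
    my @pairs;
    for my $i ( 0 .. $#ids - 1 ) {
        for my $j ( $i + 1 .. $#ids ) {
            my $distance = hamming_distance( $hashes->{ $ids[$i] }, $hashes->{ $ids[$j] } );
            push @pairs, [ $ids[$i], $ids[$j], $distance ] if $distance < $threshold;
        }
    }
    return \@pairs;
}

# Made-up data: archive_b differs from archive_a in two bits only.
my %hashes = (
    archive_a => [ (0) x 64 ],
    archive_b => [ (0) x 62, 1, 1 ],
    archive_c => [ (1) x 64 ],
);

my $pairs = find_duplicate_pairs( \%hashes, 5 );
print "$_->[0] <-> $_->[1] (distance $_->[2])\n" for @$pairs;
```

This compares every pair, so the cost grows quadratically with the number of archives. That is probably fine for a one-off script run, but it is worth keeping in mind for very large libraries.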

Difegue, Sep 24 '20 16:09