imagehash icon indicating copy to clipboard operation
imagehash copied to clipboard

Comparing with all existing hashes in DB

Open rw4lll opened this issue 6 years ago • 3 comments

Hi! It is works fine if I need to compare exactly two images. But what if I have 3 millions hashes in DB and need to compare my current hash with each other in database (find all similar from images that already exists). Is there any solution except direct comparing? May be it's possible to use something similar to distance function right in database?

rw4lll avatar Jun 10 '19 07:06 rw4lll

You can use query like this:

select * from images where BIT_COUNT(X'3c3e0e1a3a1e1e1e' ^ images.hash) < 5

Alex-Gramm avatar Jun 12 '19 08:06 Alex-Gramm

Thanks!

rw4lll avatar Jun 14 '19 06:06 rw4lll

We are using postgres for our project, so it doesn't have bit_count method. I did custom function for that:

CREATE OR REPLACE FUNCTION hash_distance(hash1 varchar, hash2 varchar) RETURNS integer AS $$
        DECLARE
			bit_hash1 bit varying := hash1;
			bit_hash2 bit varying := hash2;
			length_hash1 int:= length(hash1);
			length_hash2 int:= length(hash2);
			max_length int:= GREATEST(length_hash1, length_hash2);
		BEGIN
				IF length_hash1 < max_length THEN
    				bit_hash1:=LPAD(bit_hash1::text, max_length, '0')::bit varying;
				END IF;
				
				IF length_hash2 < max_length THEN
    				bit_hash2:=LPAD(bit_hash2::text, max_length, '0')::bit varying;
				END IF;
				
				RETURN length(replace((bit_hash1 # bit_hash2)::text, '0', ''));
        END;
$$ LANGUAGE plpgsql;

select hash_distance('100000', '11010011'); It's based on logic of difference hash. Not sure how fast is it, but may be it helps for somebody in future.

rw4lll avatar Jun 15 '19 14:06 rw4lll