Comparing with all existing hashes in DB
Hi! It is works fine if I need to compare exactly two images. But what if I have 3 millions hashes in DB and need to compare my current hash with each other in database (find all similar from images that already exists). Is there any solution except direct comparing? May be it's possible to use something similar to distance function right in database?
You can use query like this:
select * from images where BIT_COUNT(X'3c3e0e1a3a1e1e1e' ^ images.hash) < 5
Thanks!
We are using postgres for our project, so it doesn't have bit_count method. I did custom function for that:
CREATE OR REPLACE FUNCTION hash_distance(hash1 varchar, hash2 varchar) RETURNS integer AS $$
DECLARE
bit_hash1 bit varying := hash1;
bit_hash2 bit varying := hash2;
length_hash1 int:= length(hash1);
length_hash2 int:= length(hash2);
max_length int:= GREATEST(length_hash1, length_hash2);
BEGIN
IF length_hash1 < max_length THEN
bit_hash1:=LPAD(bit_hash1::text, max_length, '0')::bit varying;
END IF;
IF length_hash2 < max_length THEN
bit_hash2:=LPAD(bit_hash2::text, max_length, '0')::bit varying;
END IF;
RETURN length(replace((bit_hash1 # bit_hash2)::text, '0', ''));
END;
$$ LANGUAGE plpgsql;
select hash_distance('100000', '11010011'); It's based on logic of difference hash. Not sure how fast is it, but may be it helps for somebody in future.