hash_size doesn't match hash length when using DoubleGradient

Open hugopeixoto opened this issue 4 years ago • 1 comments

Hi there,

I am trying out the DoubleGradient hash algorithm. I expected the hash_size() passed to HasherConfig to be respected (assuming the width and height are multiples of 2), but the resulting hashes have fewer bits than that. Here's a snippet of code and the resulting output:

use img_hash::{HashAlg, HasherConfig};

let image = image::open("grayscale.png").unwrap();

// Print the hash length produced by Gradient and DoubleGradient for each requested size.
for (w, h) in [(8, 8), (16, 16), (8, 16), (16, 8)] {
    let hasher = HasherConfig::new().hash_size(w, h).hash_alg(HashAlg::Gradient).to_hasher();
    println!("Gradient({}, {}): {:?} bits", w, h, 8 * hasher.hash_image(&image).as_bytes().len());

    let hasher = HasherConfig::new().hash_size(w, h).hash_alg(HashAlg::DoubleGradient).to_hasher();
    println!("DoubleGradient({}, {}): {:?} bits", w, h, 8 * hasher.hash_image(&image).as_bytes().len());
}

I also added a println inside hash_image to print bytes.len(), resize_width, and resize_height (the HashVals lines in the output):

HashVals: 72 (9x8?)
Gradient(8, 8): 64 bits
HashVals: 25 (5x5?)
DoubleGradient(8, 8): 40 bits
HashVals: 272 (17x16?)
Gradient(16, 16): 256 bits
HashVals: 81 (9x9?)
DoubleGradient(16, 16): 144 bits
HashVals: 144 (9x16?)
Gradient(8, 16): 128 bits
HashVals: 45 (5x9?)
DoubleGradient(8, 16): 80 bits
HashVals: 136 (17x8?)
Gradient(16, 8): 128 bits
HashVals: 45 (9x5?)
DoubleGradient(16, 8): 80 bits

Both 8 and 16 are multiples of two, so I expected DoubleGradient to produce hashes of the same length as Gradient. I think this is a bug, but I haven't been able to pinpoint the problem yet.

I tried both with img_hash 3.2.0 and with the latest commit on the main branch, which seems to be the same.

hugopeixoto avatar Jul 06 '21 19:07 hugopeixoto

Initially I didn't understand how DoubleGradient works. Now I see that it is a concatenation of the Horizontal Gradient and the Vertical Gradient hashes, with both calculated at a smaller size so that the two of them don't exceed the original dimensions.

When given a hash dimension of 8x8 (for example), img_hash resizes the image to 5x5 (8/2+1, 8/2+1). It then applies both gradients to the same resized image, each producing a 5x4 (20-bit) hash. These are then concatenated and a 40-bit hash is returned.
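The arithmetic above can be sketched as a few small functions. This is only a model of the behaviour described in this thread, not code from img_hash: the function names are made up for illustration, and the resize rule (half of each dimension plus one) is an assumption inferred from the 5x5 and 9x9 resizes printed earlier.

```rust
/// Assumed DoubleGradient resize rule: half of each requested dimension plus one.
/// (Inferred from the observed HashVals output, not taken from the img_hash source.)
fn double_gradient_resize(width: u32, height: u32) -> (u32, u32) {
    (width / 2 + 1, height / 2 + 1)
}

/// Bits produced when both gradients run over the same resized image:
/// the horizontal gradient compares adjacent pixels within each row,
/// the vertical gradient compares adjacent pixels within each column.
fn double_gradient_bits(width: u32, height: u32) -> u32 {
    let (rw, rh) = double_gradient_resize(width, height);
    let horizontal = (rw - 1) * rh;
    let vertical = rw * (rh - 1);
    horizontal + vertical
}

/// Bits as measured by `8 * as_bytes().len()`, i.e. rounded up to whole bytes.
fn reported_bits(width: u32, height: u32) -> u32 {
    (double_gradient_bits(width, height) + 7) / 8 * 8
}
```

Under these assumptions, a requested size of 8x16 yields 76 bits, which rounds up to the 80 bits reported for DoubleGradient(8, 16).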

With this way of constructing a double gradient, it's probably impossible to match every requested hash size exactly. Maybe the documentation could be updated to say that, in this case, the specified dimensions are an upper bound? Or the hash could be zero-padded at the end?
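To illustrate the zero-padding idea, here is a minimal sketch done on the caller's side. `pad_hash_bytes` is a hypothetical helper, not part of img_hash; it assumes the target length is the requested width times height, rounded up to whole bytes.

```rust
/// Hypothetical helper (not part of img_hash): pad hash bytes with trailing
/// zeros up to the byte length implied by the requested width x height.
fn pad_hash_bytes(hash: &[u8], width: u32, height: u32) -> Vec<u8> {
    // Target length in bytes, rounding the bit count up to a whole byte.
    let target_len = ((width * height) as usize + 7) / 8;
    let mut padded = hash.to_vec();
    // Only ever grow the hash; a hash that is already long enough is left unchanged.
    if padded.len() < target_len {
        padded.resize(target_len, 0);
    }
    padded
}
```

Since every hash produced with the same configuration would receive the same trailing zero bytes, the padding would not change the Hamming distance between two such hashes.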

hugopeixoto avatar Jul 28 '21 12:07 hugopeixoto