
Hash size doesn't match hash_size parameter for Daubechies wavelet hashing

Open · jonemo opened this issue 2 years ago · 4 comments

I was surprised to find that the size of the computed hash does not always equal the hash_size parameter that all hashing methods accept. Specifically, imagehash.whash(img, hash_size=16, mode="db4") yields a 22 x 22 hash.

While the readme does not make any explicit promises about the hash size, the parameter naming makes this outcome quite unexpected. Of course, my being surprised is not an issue in itself, and unless this is a bug, it would be unreasonable to break backward compatibility by changing the API or behavior. However, maybe it's worth clarifying in the documentation/readme that hash_size does not always match the size of the resulting hash?

The readme currently covers hash_size in this paragraph:

Each algorithm can also have its hash size adjusted (or in the case of colorhash, its binbits). Increasing the hash size allows an algorithm to store more detail in its hash, increasing its sensitivity to changes in detail.

Sample code:

    from PIL import Image
    import imagehash

    path = "example.jpg"  # placeholder: any test image
    img = Image.open(path)
    # h.hash is a 2D boolean array; print its dimensions for each method
    h = imagehash.average_hash(img, hash_size=16)
    print(f"average_hash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.dhash(img, hash_size=16)
    print(f"dhash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.phash(img, hash_size=16)
    print(f"phash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.whash(img, hash_size=16, mode="haar")
    print(f"whash haar: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.whash(img, hash_size=16, mode="db4")
    print(f"whash db4: {len(h.hash)} x {len(h.hash[0])}")

Output:

    average_hash: 16 x 16
    dhash: 16 x 16
    phash: 16 x 16
    whash haar: 16 x 16
    whash db4: 22 x 22

Example image: tl-20210924-185242
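For average_hash, the hash_size x hash_size shape follows directly from the construction: the image is reduced to a hash_size x hash_size grid and each cell contributes one bit, depending on whether it is brighter than the grid's mean. The plain-Python sketch below illustrates that idea. The function name is hypothetical, and downsampling by averaging equal blocks is an assumption made to keep it dependency-free; imagehash's average_hash resizes the image with PIL instead.

```python
def average_hash_sketch(pixels, hash_size=8):
    """Minimal average hash over a 2D list of grayscale values.

    Assumption: both image dimensions are multiples of hash_size, so the
    image can be reduced by averaging equal-sized blocks.
    """
    height, width = len(pixels), len(pixels[0])
    assert height % hash_size == 0 and width % hash_size == 0
    bh, bw = height // hash_size, width // hash_size

    # Reduce the image to a hash_size x hash_size grid of block means.
    grid = [
        [
            sum(pixels[r * bh + i][c * bw + j] for i in range(bh) for j in range(bw))
            / (bh * bw)
            for c in range(hash_size)
        ]
        for r in range(hash_size)
    ]

    # One bit per cell: is the cell brighter than the mean of the grid?
    mean = sum(sum(row) for row in grid) / (hash_size * hash_size)
    return [[cell > mean for cell in row] for row in grid]


# Hypothetical 64 x 64 diagonal-gradient image.
image = [[x + y for x in range(64)] for y in range(64)]
for size in (8, 16):
    bits = average_hash_sketch(image, hash_size=size)
    print(f"hash_size={size}: {len(bits)} x {len(bits[0])} = {size * size} bits")
```

Doubling hash_size quadruples the number of bits, which is the "more detail" the readme refers to; nothing in this construction can produce a grid of any other size.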

jonemo · Sep 27 '21 21:09

Huh. Do you know why db4 does that?

JohannesBuchner · Sep 04 '22 19:09

Sorry, I am the wrong person to ask. I used imagehash precisely because I have no clue about any of these algorithms. (And that was a year ago; now I know even less.)

jonemo · Sep 07 '22 03:09

The two wavelet shapes: http://wavelets.pybytes.com/wavelet/db4/ http://wavelets.pybytes.com/wavelet/haar/ (nothing obvious there)

The whash function calls pywt.wavedec2 (https://pywavelets.readthedocs.io/en/latest/ref/2d-dwt-and-idwt.html#d-multilevel-decomposition-using-wavedec2), passing its own mode argument as the wavelet and leaving pywt's signal-extension mode at the default ('symmetric').

Maybe have a look at the input and output of this call: https://github.com/JohannesBuchner/imagehash/blob/master/imagehash/__init__.py#L385
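Looking at the sizes going into and out of that call, the 22 can be reproduced with arithmetic alone. In PyWavelets, one DWT level in the default 'symmetric' extension mode returns floor((n + filter_len - 1) / 2) coefficients per axis, so a longer filter keeps extra boundary coefficients at every level. The sketch below assumes whash resizes the image to a power-of-two image_scale (the largest not exceeding the shorter side, but at least hash_size) and then runs log2(image_scale) - log2(hash_size) levels; that is a reading of the linked code and may differ between versions. The helper names and the 600 px image side are hypothetical.

```python
def dwt_band_len(n, filter_len):
    # Per-axis output length of one DWT level in PyWavelets for
    # non-periodization modes such as the default 'symmetric'.
    return (n + filter_len - 1) // 2


def ll_band_sizes(min_image_side, hash_size, filter_len):
    """Side length of the LL band after each level (assumed whash logic).

    hash_size must be a power of two (whash asserts this), so bit_length()
    differences equal differences of log2 values.
    """
    # Largest power of two <= the shorter image side, but at least hash_size.
    image_scale = max(2 ** (min_image_side.bit_length() - 1), hash_size)
    # Number of levels: log2(image_scale) - log2(hash_size).
    dwt_level = image_scale.bit_length() - hash_size.bit_length()
    sizes = [image_scale]
    for _ in range(dwt_level):
        sizes.append(dwt_band_len(sizes[-1], filter_len))
    return sizes


# Decomposition filter lengths: haar has 2 taps, db4 has 8.
for wavelet, filter_len in [("haar", 2), ("db4", 8)]:
    print(wavelet, ll_band_sizes(600, 16, filter_len))
```

Under these assumptions haar halves the side exactly at every level (512 down to 16), while db4 gives 512, 259, 133, 70, 38, 22. The surplus over the plain power of two grows 3, 5, 6 and then stays at 6, so with db4 the hash side is hash_size + 6 whenever at least three levels are applied, e.g. 22 x 22 for hash_size=16 on any image whose shorter side is at least 128 px.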

JohannesBuchner · Sep 07 '22 07:09

In any case, given how differently the various methods work: no, hash_size does not necessarily have a consistent meaning across all methods.

JohannesBuchner · Sep 07 '22 07:09