pHash correctness

Open leofidus opened this issue 2 years ago • 1 comment

I'm trying to use pHash across different programming languages. For that purpose I consider the hashes produced by the Python libraries https://github.com/thorn-oss/perception and https://github.com/JohannesBuchner/imagehash to be canonically correct.

The documentation of this project suggests that HasherConfig::new().hash_alg(HashAlg::Mean).preproc_dct().to_hasher() would produce a compatible hash, but in practice it does not. After extensive experimentation, I have identified three changes that produce nearly the same results (a Hamming distance of ~4 on a 1024-bit hash once all three are applied):

[x] HashAlg::Median, lifted straight from the old img_hash_median crate
[x] Different bit order: reorder each byte in the resulting ImageHash so that the bit order 76543210 becomes 01234567.
[ ] Different conversion to grayscale: Pillow and Image use different conversion factors to go from RGB to grayscale. This has by far the lowest impact of the three (and from a quick search it seems the Python versions also differ from the original C version of pHash here).
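
To make the bit-order and grayscale items concrete, here is a minimal sketch in plain Rust (no crates). The function names are hypothetical and not part of img_hash. The BT.601 weights are the ones Pillow documents for convert("L"). The Rec. 709 weights are only an assumption about what the Rust side uses, and the integer truncation is a simplification that may not match either library's exact rounding.

```rust
/// Reverse the bit order inside every byte of a hash (76543210 -> 01234567).
/// The order of the bytes themselves is left unchanged.
fn reverse_bits_per_byte(hash: &[u8]) -> Vec<u8> {
    hash.iter().map(|b| b.reverse_bits()).collect()
}

/// ITU-R 601-2 luma, the transform Pillow documents for convert("L"):
/// L = R * 299/1000 + G * 587/1000 + B * 114/1000.
/// Truncating integer division is a simplification used for this sketch.
fn luma_bt601(r: u8, g: u8, b: u8) -> u8 {
    ((299 * r as u32 + 587 * g as u32 + 114 * b as u32) / 1000) as u8
}

/// Rec. 709 luma weights (0.2126, 0.7152, 0.0722).
/// ASSUMPTION: shown only to illustrate how a second set of factors changes
/// the gray value; check the actual factors used by the library you compare.
fn luma_bt709(r: u8, g: u8, b: u8) -> u8 {
    ((2126 * r as u32 + 7152 * g as u32 + 722 * b as u32) / 10000) as u8
}
```

With these definitions a pure red pixel maps to 76 under BT.601 and 54 under Rec. 709, so the two pipelines feed different gray values into the DCT. That fits the observation that the grayscale step matters, though less than the other two changes.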

I'll try to make the necessary PRs to make each of these options possible without changing the existing defaults.

Nov 12 '23 22:11 leofidus

Using median will also improve matching performance, because mean is susceptible to outliers. I have found that pHash by mean produces worse results (by my subjective evaluation) than aHash or dHash, which contradicts the findings from perception's benchmark, where pHash is the winner. I was about to investigate the problem when I saw your post here, and it saved me a lot of time 👍🏼.
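
The outlier effect can be illustrated with a small sketch. This is not the img_hash implementation; the helper names are hypothetical, and the input stands in for a block of DCT coefficients where one term (for example a large DC component) dominates the rest.

```rust
/// Arithmetic mean. Assumes `values` is non-empty.
fn mean(values: &[f32]) -> f32 {
    values.iter().sum::<f32>() / values.len() as f32
}

/// Median. Assumes `values` is non-empty; for an even count it averages
/// the two middle elements.
fn median(values: &[f32]) -> f32 {
    assert!(!values.is_empty());
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// One hash bit per coefficient: true where the value exceeds the threshold.
fn threshold_bits(values: &[f32], threshold: f32) -> Vec<bool> {
    values.iter().map(|&v| v > threshold).collect()
}
```

For the coefficients [900.0, 3.0, -2.0, 1.0, -1.0, 2.0, -3.0, 0.5], the mean is pulled up to 112.5625 and only one bit ends up set, while the median (0.75) splits the values evenly into four set and four unset bits. A hash in which almost every bit is the same carries little information, so unrelated images end up close in Hamming distance.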

I also ran a benchmark on a subset of "thorn-perceptual-benchmark-v0". Setting threshold = 3, the result is:

Median: precision = 100.0%, recall = 16.8%
Mean: precision = 7.1%, recall = 35.3%

Hashing by Mean seems to produce significantly more false positives (visually dissimilar images reported as similar).
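
For anyone who wants to reproduce this kind of comparison, a rough sketch of the bookkeeping follows. It is not the benchmark code used above. The function names are hypothetical, and treating a pair as a match when its distance is less than or equal to the threshold is an assumption (the comparison could equally be strict).

```rust
/// Hamming distance between two hashes of equal length, in bits.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "hashes must have the same length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Precision and recall for labeled pairs of (distance, truly_similar).
/// ASSUMPTION: a pair is predicted similar when distance <= threshold.
/// A zero denominator yields NaN rather than an error.
fn precision_recall(pairs: &[(u32, bool)], threshold: u32) -> (f64, f64) {
    let (mut tp, mut fp, mut fn_) = (0u32, 0u32, 0u32);
    for &(distance, truly_similar) in pairs {
        let predicted_similar = distance <= threshold;
        match (predicted_similar, truly_similar) {
            (true, true) => tp += 1,   // correctly reported as similar
            (true, false) => fp += 1,  // false positive
            (false, true) => fn_ += 1, // missed match
            (false, false) => {}       // correctly rejected
        }
    }
    let precision = tp as f64 / (tp + fp) as f64;
    let recall = tp as f64 / (tp + fn_) as f64;
    (precision, recall)
}
```

Under this definition, a low precision at a small threshold means many dissimilar pairs already sit within a few bits of each other, which is the false-positive behavior described above.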

Feb 16 '24 08:02 gyk