pHash correctness
I'm trying to use pHash across different programming languages. For that purpose I consider the hashes produced by the Python libraries https://github.com/thorn-oss/perception and https://github.com/JohannesBuchner/imagehash to be canonically correct.
The documentation of this project suggests that HasherConfig::new().hash_alg(HashAlg::Mean).preproc_dct().to_hasher() would produce a compatible hash, but in practice this is not the case. After extensive experimentation, I've identified three changes that together produce nearly the same results (a Hamming distance of ~4 on a 1024-bit hash once all three are applied):
- [x] HashAlg::Median, lifted straight from the old img_hash_median crate
- [x] Different bit order: reorder each byte in the resulting ImageHash so that the bit order 76543210 becomes 01234567.
- [ ] Different conversion to grayscale: Pillow and Image use different conversion factors to go from RGB to grayscale. This has by far the lowest impact of the three (and from a quick search it seems the Python versions also differ from the original C version of pHash here)
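
For anyone who wants to try the second and third items outside the crate, here is a minimal sketch using only the standard library. The function names are hypothetical and not part of img_hash. The luma weights are my assumption about each library (BT.601 as documented for Pillow's convert("L"), BT.709 for the image crate), and the real implementations may round differently:

```rust
/// Reverse the bit order inside every byte of a hash,
/// so that bit order 76543210 becomes 01234567.
fn reverse_bits_per_byte(hash: &[u8]) -> Vec<u8> {
    hash.iter().map(|b| b.reverse_bits()).collect()
}

/// Grayscale using ITU-R BT.601 weights (assumed here to match Pillow).
/// Integer arithmetic with truncation; Pillow's exact rounding may differ.
fn luma_bt601(r: u8, g: u8, b: u8) -> u8 {
    ((299 * r as u32 + 587 * g as u32 + 114 * b as u32) / 1000) as u8
}

/// Grayscale using BT.709 weights (assumed here to match the image crate).
fn luma_bt709(r: u8, g: u8, b: u8) -> u8 {
    ((2126 * r as u32 + 7152 * g as u32 + 722 * b as u32) / 10000) as u8
}
```

With these weights a pure red pixel (255, 0, 0) maps to 76 under BT.601 and 54 under BT.709, while neutral grays map to the same value under both, which fits the observation that the grayscale step has the smallest impact of the three.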
I'll try to make the necessary PRs to make each of these options possible without changing the existing defaults.
Using median will also improve matching quality, because the mean is susceptible to outliers. I have found that pHash by mean produces worse results (by my subjective evaluation) than aHash or dHash, which contradicts the findings from perception's benchmark, where pHash is the winner. I was about to investigate the problem when I saw your post here, and it saved me a lot of time👍🏼.
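
To illustrate the outlier effect, here is a small sketch with hypothetical helper names (not the crate's API). It thresholds a list of coefficients once by mean and once by median. The strict `>` comparison is an assumption, and both helpers expect a non-empty slice:

```rust
/// One bit per value: true when the value is strictly above the threshold.
fn threshold_bits(values: &[f64], threshold: f64) -> Vec<bool> {
    values.iter().map(|v| *v > threshold).collect()
}

/// Arithmetic mean of a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Median of a non-empty slice (average of the two middle values
/// when the length is even).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}
```

For made-up coefficients such as `[900.0, 3.0, -2.0, 1.0, -4.0, 2.0, -1.0, -3.0]`, the single large value pulls the mean up to 112, so only one bit is set. The median is 0, which splits the values evenly and sets four bits. A hash with almost all bits equal carries little information, and that would be consistent with mean-based hashes colliding more often.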
I also ran a benchmark on a subset of "thorn-perceptual-benchmark-v0". With threshold = 3, the results are:
- Median: precision = 100.0%, recall = 16.8%
- Mean: precision = 7.1%, recall = 35.3%
Hashing by Mean seems to produce significantly more false positives (visually dissimilar images reported as similar).
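
For reference, this is roughly how such numbers can be computed. It is a sketch with hypothetical function names, assuming the threshold is a maximum Hamming distance (inclusive) and that each image pair carries a ground-truth label. It is not the code used for the benchmark above:

```rust
/// Number of differing bits between two equal-length hashes.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "hashes must have equal length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Precision and recall for labeled pairs of (distance, truly_similar).
/// A pair is predicted "similar" when distance <= threshold.
/// Either value is NaN if its denominator is zero.
fn precision_recall(pairs: &[(u32, bool)], threshold: u32) -> (f64, f64) {
    let (mut tp, mut fp, mut fneg) = (0u32, 0u32, 0u32);
    for &(distance, truly_similar) in pairs {
        let predicted_similar = distance <= threshold;
        match (predicted_similar, truly_similar) {
            (true, true) => tp += 1,    // correctly matched
            (true, false) => fp += 1,   // false positive
            (false, true) => fneg += 1, // missed match
            (false, false) => {}        // correctly rejected
        }
    }
    let precision = tp as f64 / (tp + fp) as f64;
    let recall = tp as f64 / (tp + fneg) as f64;
    (precision, recall)
}
```

Low precision at a small threshold means that many dissimilar pairs land within a few bits of each other, which is the false-positive pattern described above.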