imagehash icon indicating copy to clipboard operation
imagehash copied to clipboard

Add rudimentary ICC color profile support to colorhash

Open david-russo opened this issue 4 years ago • 14 comments

Color profiles can significantly alter the appearance of an image's underlying color values on modern display equipment. This change modifies the colorhash algorithm to take that into account by normalizing (and converting) any embedded color profiles into the standard sRGB color space prior to hashing. This allows colorhash to detect color differences between images which may only differ by the inclusion, absence, or change of a color profile.

It also provides an option to ignore embedded color profiles and instead only analyze an image's raw color data (i.e. the default behaviour before this change), allowing images to be compared with or without their profiles.

david-russo avatar Jun 09 '20 15:06 david-russo

This seems like a good idea, but it is currently failing in the CI (see the logs there).

Do these color profiles also alter the luminescence? If yes, they could be relevant for the other hashes as well.

JohannesBuchner avatar Jun 09 '20 15:06 JohannesBuchner

Actually, if I read correctly, these ICC profiles are meant to adjust to the particular viewing device. Wouldn't it be more objective to ignore them? Probably I am misunderstanding this.

JohannesBuchner avatar Jun 09 '20 15:06 JohannesBuchner

They can have an effect on luminosity, yes, since they can essentially map any internal colour value to any other colour. Which means you could use a color profile to completely invert an image's colors, if you so desired.

There are a few kinds of profile, not just display profiles. Display profiles do describe what colors a display device is capable of displaying, but usually aren't embedded in files because they're specific to a given piece of hardware.

There are also input profiles, however, which are usually the ones you'd find embedded (such as by a scanner), and which describe precisely how the color values stored in an image file map to a standard device-independent color space, such as L*a*b*. This is the profile we care about, because it tells us how to correctly interpret the stored values.

When coupled with a display profile, such as the one that comes with your OS or monitor, both profiles are used to accurately map colour from the input profile to the display profile using a device-independent color space as a connection space: Input --> L*a*b* --> Display

It's even more complicated than all that gibberish above, but all we need to care about is that it works, and it's built-in to most image rendering software, so if we compare two images that are identical except one has an accurate input profile embedded, and the other doesn't, they're going to look different perceptually though they share the same underlying RGB values.

I hope that made some sense.

david-russo avatar Jun 09 '20 16:06 david-russo

The CI error appears to be due to a dependency missing from Pillow's conda package. Pillow needs to be built with LittleCMS2 to support its ImageCms module. I developed using pip, and PyPI's Pillow package seems to come with everything included. Comparing the two packages (conda, pip), I can see it missing from conda.

I'm not familiar with conda, but one of the possible solutions seems somewhat involved: https://stackoverflow.com/questions/52741538/anaconda3-python-3-6-pillow-cannot-import-imagecms

david-russo avatar Jun 09 '20 17:06 david-russo

The relevant conda package issue is here.

david-russo avatar Jun 11 '20 15:06 david-russo

Two suggestions:

  1. change the code so that when ImageCms is not loadable, it does not use this functionality, either with hasattr or a try-except(ImportError) construct.
  2. try to modify the travis install to install pillow with imagecms, perhaps with pip instead of conda. (adding conda uninstall pillow and pip install pillow).

JohannesBuchner avatar Jun 11 '20 18:06 JohannesBuchner

Coverage Status

Coverage decreased (-7.7%) to 86.047% when pulling 036d9fcc7f89b0f0b67e7bb84310a7c02807e6e3 on bl-dpt:colorhash-icc-support into 691f2108cafd729f59f75dff2b409534c3a7d9d9 on JohannesBuchner:master.

coveralls avatar Jun 12 '20 11:06 coveralls

I went with not allowing ImageCms to be optional due to the impact color profiles can have on an image's appearance, and I believe you were right earlier in thinking that they could be relevant to the other hash functions, so the code should probably be refactored to share this feature.

david-russo avatar Jun 12 '20 11:06 david-russo

Yeah, I agree. Any necessary profile conversion should happen on the Image object before it being passed to any of the hash functions. As for where that's done:

ICC profiles aren't particularly simple to handle, and there are a number of ways to screw up the conversion between profiles if one is not familiar with the concepts or specifics, such as the pros and cons between different rendering intents and color spaces.

For instance, I chose to normalize on the sRGB color space here because it's the only one supported by PIL well enough to allow conversion into HSV, as the current colorhash algorithm requires. It's also usually the default color space in which software renders images without profiles, so it should be generally analogous to what most users would see on their screens... but ideally both images with and without color profiles would first be converted to one of the larger, human vision-based color spaces, such as LAB, before hashing, to better account for colors that lie outside the sRGB space.

Since such decisions require a fair bit of knowledge, and are, currently at least, fairly tightly coupled to the hashing implementations, I think we should keep profile conversion internal to imagehash.

More generally, I think we should abstract away the need for users to import PIL at all, allowing them to pass in file-type objects or file paths instead. The use of PIL internally seems more like an implementation detail to me, which we needn't bother the user with.

david-russo avatar Jun 12 '20 14:06 david-russo

For this PR, do I also need to update data_files in setup.py with the new test images?

david-russo avatar Jun 12 '20 14:06 david-russo

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand. If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

JohannesBuchner avatar Jun 12 '20 14:06 JohannesBuchner

For this PR, do I also need to update data_files in setup.py with the new test images?

If you use them only for testing, no, if you want them to be installed in the site-package folder, yes. I think no.

JohannesBuchner avatar Jun 12 '20 14:06 JohannesBuchner

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand.

There's little harm in supporting all three, I suppose :)

If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

Since the current colorhash algorithm converts images into both "L" and "HSV", Pillow seems to support conversion from RGB to either of them just fine, so we should be able to use a single function for the current implementation. It would only get more complicated if we wanted to go from LAB to certain other modes.

david-russo avatar Jun 12 '20 14:06 david-russo

I would be happy to include this as a single preprocessing helper function. A section in the README should explain why and how to use it. This will make it easy for users to plug it in.

Then we can also go with dependencies being optional, as it is not a core functionality.

JohannesBuchner avatar Sep 02 '22 14:09 JohannesBuchner