pytest-mpl default hash savefig format

This PR addresses issue #152, by ensuring that a default format='png' is injected to the savefig kwargs when generating a hash.

Closes #152

May 21 '22 22:05 bjlittle

Okay, wowzers... can anyone explain the test failure behaviour of py310 with mpl35 across all platforms ? :confused:

May 21 '22 22:05 bjlittle

Very strange. It seems that passing format='png' to savefig changes the hashes. I wonder if a default has changed in 3.5?

May 21 '22 22:05 ConorMacBride

The hashes are different on main also (testing py310-test-mpl35 on macOS locally) so not related to this PR. Matplotlib 3.5.2 was released 19 days ago so my guess is that something has changed there.

May 21 '22 22:05 ConorMacBride

Yes, the tests pass on MPL 3.5.1. Can you pin this to 3.5.1 rather than 3.5.*? https://github.com/matplotlib/pytest-mpl/blob/e3876180847c2d04aeda1982372074998a7f8716/tox.ini#L29 Hopefully this is something that the perceptual hashes can solve, as we really should be testing against the latest bug fixes from matplotlib!

May 21 '22 23:05 ConorMacBride

Awesome detective work 🕵️

May 21 '22 23:05 bjlittle

@ConorMacBride If the sha hashes are sensitive to some change from mpl>3.5.1, what does that mean for default hash testing? Switch to phash?

What are the implications to the community if we bank this new default format behaviour in this PR? Isn't this a breaking change in behaviour?

May 21 '22 23:05 bjlittle

@ConorMacBride If the sha hashes are sensitive to some change from mpl>3.5.1, what does that mean for default hash testing? Switch to phash?

sha hashes are definitely too sensitive. It would be great if such a constrained testing environment wasn't necessary as that would make the package much easier to use. I would need to experiment more with phash to get an understanding about the sensitivity. Although it would be up to the maintainers (astrofrog, Cadair and dopplershift) to decide.

I presume the default hash size, high frequency factor and hamming tolerance can be tuned to a value that minimises, what an average user would consider, a test passing incorrectly? (I would expect most of the phash examples here should result in a failed hash comparison.)

I suspect phash may not be appropriate for tests that need to be particularly colour sensitive? All images are converted to 8-bit grayscale so, for example, this image would be similar to an image where all the values have been multiplied by -1 (based on what the bwr colormap looks like in grayscale).

What are the implications to the community if we bank this new default format behaviour in this PR? Isn't this a breaking change in behaviour?

From reading the savefig documentation, the only way I can see this being a breaking change is if someone has a different value set in savefig.format rcparams or they are using a different backend (by passing a backend kwarg to the pytest-mpl decorator, which is an undocumented feature).

https://github.com/matplotlib/pytest-mpl/blob/e3876180847c2d04aeda1982372074998a7f8716/pytest_mpl/plugin.py#L601

I've added some code below. In the default case, both hashes are the same so no breaking change. However, if a custom backend has been specified, passing format="png" seems to force Agg which would be a breaking change. However, svg changes hashes every time (due to random ids referencing objects), and ps has similar behaviour. So it might be unlikely that someone is changing the backend while using hash comparison.

So, if the user asks for a different backend, setting format="png" would have the effect of ignoring that when doing hash comparison. For phash, the image data needs to be in a format that pillow/PIL can read although I think it should work for more than just png?

To not interfere with the requested backend, I think we should only set format="png" if a backend kwarg isn't set. Or maybe we should not be setting format="png" as that's the default unless the user has requested something different?

Click for code

import io
import hashlib
import matplotlib
import matplotlib.pyplot as plt


def _hash_file(in_stream):
    """
    Hashes an already opened file.
    """
    in_stream.seek(0)
    buf = in_stream.read()
    hasher = hashlib.sha256()
    hasher.update(buf)
    return hasher.hexdigest()


def hash(**kwargs):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot([1, 3, 2, 4])
    imgdata = io.BytesIO()
    fig.savefig(imgdata, **kwargs)
    plt.close()
    return _hash_file(imgdata)


# Default (pytest-mpl sets agg)
matplotlib.use("agg")
print(hash())
# b7d7754ed51a78949e119e3900b72b4123fcc3674fd6a5750b9a93f7091293ab
print(hash(format="png"))
# b7d7754ed51a78949e119e3900b72b4123fcc3674fd6a5750b9a93f7091293ab

# User passes backend="svg" to pytest-mpl decorator
metadata = {"Creator": None, "Date": None, "Format": None, "Type": None}
matplotlib.use("svg")
print(hash(metadata=metadata))
# 81910e47aa39455d96c41893f73793c65b5405af14af7919dbe8a7172e004fcf
print(hash(metadata=metadata))
# 0bf13ec69c253c6a8d3b6b004aa4855e343e5f40a79106e87fee72c208f7de89
print(hash(format="png"))
# b7d7754ed51a78949e119e3900b72b4123fcc3674fd6a5750b9a93f7091293ab

May 22 '22 21:05 ConorMacBride

sha hashes are definitely too sensitive. It would be great if such a constrained testing environment wasn't necessary as that would make the package much easier to use. I would need to experiment more with phash to get an understanding about the sensitivity. Although it would be up to the maintainers (astrofrog, Cadair and dopplershift) to decide.

Agreed.

May 23 '22 09:05 bjlittle

I presume the default hash size, high frequency factor and hamming tolerance can be tuned to a value that minimises, what an average user would consider, a test passing incorrectly? (I would expect most of the phash examples here should result in a failed hash comparison.)

That's pretty much my experience so far. We've opted for a hash_size=16 (which gives a 256-bit hash) highfreq_factor=4 (imagehash.phash default) and a hamming distance tolerance of <=2 for a pass with phash, and that suits our needs.

May 23 '22 09:05 bjlittle

I suspect phash may not be appropriate for tests that need to be particularly colour sensitive? All images are converted to 8-bit grayscale so, for example, this image would be similar to an image where all the values have been multiplied by -1 (based on what the bwr colormap looks like in grayscale).

I've re-ran and updated a notebook that I wrote to initially investigate and play with imagehash, which is available here.

To create a conda environment to run it, simply conda create -n imagehash -c conda-forge jupyter iris imagehash, and download the notebook gist.

Interestingly, changing the colormap on a plot is detected by phash. But yeah, play around and get a feel for what it's offering :+1:

May 23 '22 09:05 bjlittle

Overall, I wait for you guys to come to a consensus and advise on what to do next here with this PR.

May 23 '22 09:05 bjlittle

I think matplotlib puts metadata into the png including the version iirc. I think we can turn that off.

See the metadata keyword argument here: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html and more info on png here: https://matplotlib.org/stable/api/backend_agg_api.html#matplotlib.backends.backend_agg.FigureCanvasAgg.print_png

May 23 '22 10:05 Cadair

Given how effective removing the metadata version number was in #163 I think we should definitely remove it by default and make it a breaking change (for v1.0.0 😉).

May 27 '22 21:05 ConorMacBride

pytest-mpl pytest-mpl copied to clipboard

default hash savefig format

pytest-mpl
pytest-mpl copied to clipboard