
[Bug]: Hist2d is less sensitive to narrow bins in new version, resulting in different plots

Open mmajewsk opened this issue 2 years ago • 13 comments

Bug summary

I was redoing an old analysis whose 2D histogram previously showed narrow bands:

(matplotlib 3.3.4) MicrosoftTeams-image (2)

When I ran it again with exactly the same code and the same data, the bands were gone: (matplotlib 3.5.1) MicrosoftTeams-image (1)

Code for reproduction

axe.hist2d(px, py,range=[[0,2047],[0,30]], bins=[2048,30],cmin=1)

Actual outcome

(matplotlib 3.5.1) MicrosoftTeams-image (1)

Expected outcome

(matplotlib 3.3.4) MicrosoftTeams-image (2)

Additional information

No response

Operating system

No response

Matplotlib Version

3.5.1

Matplotlib Backend

No response

Python version

No response

Jupyter version

No response

Installation

No response

mmajewsk avatar Jul 28 '22 17:07 mmajewsk

Please make a reproducible minimal example. We can't test this as-is.

jklymak avatar Jul 28 '22 17:07 jklymak

If you zoom in, do the bands come back? Can you also check that np.histogram2d behaves the same with both numpy versions?

My knee-jerk diagnosis is that either something changed in how we do anti-aliasing on pcolormesh, or we have shifted something in the rendering by a few pixels so that it hits the AA code differently. If I am reading that right, there are something like 2-3x as many bins in the x-direction as there are pixels available.
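A minimal sketch of the np.histogram2d check suggested above, which looks at the binned counts with no rendering involved. The synthetic data and the fixed seed are assumptions for illustration; with real data, load the same arrays in both environments and compare the printed summary.

```python
import numpy as np

# Hypothetical stand-in data: 2048 x positions, 100 samples each.
rng = np.random.default_rng(0)  # fixed seed so both environments use the same inputs
size, cases = 2048, 100
px = np.repeat(np.arange(size), cases)
py = rng.normal(10 * px / size, 3.0)

# Same binning as the hist2d call in the report.
counts, xedges, yedges = np.histogram2d(
    px, py, bins=[2048, 30], range=[[0, 2047], [0, 30]]
)

# Run this in each environment and compare the numbers.
print(np.__version__, counts.shape, int(counts.sum()), int(counts.max()))
```

If the summaries match, the difference between the two plots comes from rendering rather than from the binning.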

tacaswell avatar Jul 28 '22 18:07 tacaswell

@jklymak

minimal code sample:

import matplotlib.pyplot as plt
import numpy as np
size = 2048

px = []
py = []
std=3

cases = 100
for i in range(size):
    for j in range(cases):
        px.append(i)
        m = 10*i/size
        if i%32 == 0:
            py.append(np.random.normal(1.5*m,std*2))
        else:
            py.append(np.random.normal(m,std))

fig, axe = plt.subplots(1,1,figsize=(15,7))
axe.hist2d(px, py,range=[[0,2047],[0,30]], bins=[2048,30],cmin=1)

matplotlib 3.3.4

bugmplold

matplotlib 3.5.1

bugmplnew

mmajewsk avatar Jul 28 '22 18:07 mmajewsk

@tacaswell

> If you zoom in, do the bands come back? Can you also check that np.histogram2d behaves the same with both numpy versions?

No, they do not, at least not with PNG output or in the notebook. Checking np.histogram2d would be hard for me to do right now, but I think it can be excluded because:

> My knee-jerk diagnosis is that either something changed in how we do anti-aliasing on pcolormesh, or we have shifted something in the rendering by a few pixels so that it hits the AA code differently. If I am reading that right, there are something like 2-3x as many bins in the x-direction as there are pixels available.

It seems it must be the rendering, since when written to PDF both plots are roughly the same.

mmajewsk avatar Jul 28 '22 19:07 mmajewsk

I think we just need a FAQ entry (https://matplotlib.org/stable/users/faq/index.html) for these sorts of issues. Basically, if you have more data cells than display pixels, we (or any rendering engine) have to decide which ones to show. We don't anti-alias pcolormesh cells like we do imshow.
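As a rough illustration of the imshow route mentioned here, the counts can be computed with np.histogram2d and displayed as an image instead of a pcolormesh. This is only a sketch: the data is synthetic, the Agg backend is an assumption so it runs headless, and it relies on imshow's default interpolation setting rather than on anything specific to this issue.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data.
rng = np.random.default_rng(0)
px = np.repeat(np.arange(2048), 100)
py = rng.normal(10 * px / 2048, 3.0)

counts, xedges, yedges = np.histogram2d(
    px, py, bins=[2048, 30], range=[[0, 2047], [0, 30]]
)

fig, ax = plt.subplots(figsize=(15, 7))
# histogram2d returns counts indexed as [x, y]; imshow expects [row (y), column (x)].
im = ax.imshow(
    counts.T,
    origin="lower",
    aspect="auto",
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
)
fig.canvas.draw()
```

Unlike the original call, this sketch does not mask empty bins (no cmin equivalent), so the colours will not match hist2d exactly.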

jklymak avatar Jul 28 '22 19:07 jklymak

@jklymak I literally wouldn't have published a few papers if I had seen the version from the new matplotlib. I don't know how this can be solved within the code, but I would strongly suggest going back to the previous rendering, since:

  1. The new way hurts the reproducibility of plots made with matplotlib.
  2. Very crucial features of the plot can be missed with the way the new plots are rendered.
  3. AFAIK anti-aliasing is there to make plots look nice. If I cared about that I would opt in to it, but a faithful representation of the underlying data should take precedence.

It doesn't help matplotlib's standing as a serious plotting library if information goes missing from a plot because of a rendering issue (which wasn't there before).

mmajewsk avatar Jul 28 '22 19:07 mmajewsk

My contention is that you had a rendering issue in both versions; it was just a different one in each.

jklymak avatar Jul 28 '22 19:07 jklymak

For instance, in the above, at least on my screen, the 3.5.1 version has more detail than the 3.3.4 version. Maybe you got them backwards, but regardless, you ideally will have two pixels for each bin, which for your plot means at least about 273 dpi (4096 dots / 15 inches). And practically you need more because your axes don't reach both sides of the figure, so round up to 300 dpi. Anything less and you will get aliasing.

If you use a pdf viewer it will anti-alias for you and do a better job of dealing with the singletons by averaging visual pixels. We do a version of that for imshow (https://matplotlib.org/stable/gallery/images_contours_and_fields/image_antialiasing.html) but we do not do that for pcolormesh rasters.
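The dpi estimate above can be written out as a small calculation. This is a sketch only: `min_dpi` and its `axes_fraction` parameter are hypothetical names, and the 0.9 used below is an illustrative value, not the actual axes width of the figure in this issue.

```python
def min_dpi(n_bins, fig_width_in, axes_fraction=1.0, pixels_per_bin=2):
    """Smallest dpi that gives each bin `pixels_per_bin` pixels across the axes.

    axes_fraction is the share of the figure width covered by the axes
    (hypothetical parameter; read it from your own figure layout).
    """
    axes_width_in = fig_width_in * axes_fraction
    return n_bins * pixels_per_bin / axes_width_in


# 2048 bins across a 15 inch wide figure, axes spanning the full width.
print(round(min_dpi(2048, 15.0), 1))       # 273.1
# Same figure, assuming the axes cover 90% of the width.
print(round(min_dpi(2048, 15.0, 0.9), 1))  # 303.4
```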

jklymak avatar Jul 28 '22 21:07 jklymak

Did we change something about drawing edges by default on pcolormesh?

tacaswell avatar Jul 28 '22 22:07 tacaswell

Maybe? All I remember us working on for pcolormesh was the shading, which shouldn't affect this...

jklymak avatar Jul 28 '22 23:07 jklymak

Snapping was changed, so maybe something related to that? https://github.com/matplotlib/matplotlib/pull/16090

greglucas avatar Jul 28 '22 23:07 greglucas

@mmajewsk try setting snap=False to get the old (still aliased) behaviour.

jklymak avatar Jul 29 '22 00:07 jklymak

I can confirm that snap=False gives the 3.3.4 behaviour.
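A minimal sketch of that workaround, passing snap=False through hist2d to the underlying mesh. The data is a vectorised, synthetic variant of the reproduction script above, and the Agg backend is an assumption so the example runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
size, cases, std = 2048, 100, 3

# Every 32nd column gets a wider, shifted distribution (the narrow "bands").
px = np.repeat(np.arange(size), cases)
m = 10 * px / size
py = np.where(px % 32 == 0, rng.normal(1.5 * m, std * 2), rng.normal(m, std))

fig, axe = plt.subplots(1, 1, figsize=(15, 7))
# Extra keyword arguments are forwarded to pcolormesh; snap=False turns off pixel snapping.
h, xedges, yedges, mesh = axe.hist2d(
    px, py, range=[[0, 2047], [0, 30]], bins=[2048, 30], cmin=1, snap=False
)
fig.canvas.draw()
```

The same setting can be applied after the fact with `mesh.set_snap(False)` on the returned mesh.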

The fundamental issue still remains that pcolormesh (or hist2d) with more bins than pixels is going to be aliased. I strongly recommend that folks who want raster output save at a fairly high dpi and then downsample the image using an external package. Theoretically Matplotlib could do that, but currently we do not. For it to work properly you need the whole image to be composed in high definition, and no matter what high definition you use, it will be too low for someone's plot.
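One way to sketch the "render at high dpi, then reduce" approach is to draw the figure at a multiple of the target dpi and block-average the pixel buffer with NumPy. This is an illustration under stated assumptions, not Matplotlib's own behaviour: the supersampling `factor`, the smaller figure and bin count, and the plain box filter are all hypothetical choices, and a dedicated image library would offer better resampling filters.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless Agg backend, which exposes buffer_rgba()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
px = np.repeat(np.arange(1024), 50)
py = rng.normal(10 * px / 1024, 3.0)

target_dpi = 100
factor = 4  # hypothetical supersampling factor

# Compose the whole figure at factor x the target resolution.
fig, ax = plt.subplots(figsize=(6, 3), dpi=target_dpi * factor)
ax.hist2d(px, py, range=[[0, 1023], [0, 30]], bins=[1024, 30], cmin=1)
fig.canvas.draw()
big = np.asarray(fig.canvas.buffer_rgba()).astype(np.float32)  # (height, width, 4)

# Average each factor x factor block of pixels (simple box filter).
hgt, wid = big.shape[0] // factor, big.shape[1] // factor
blocks = big[: hgt * factor, : wid * factor].reshape(hgt, factor, wid, factor, 4)
small = blocks.mean(axis=(1, 3)).round().astype(np.uint8)
```

The reduced array can then be written out with `plt.imsave("hist2d_downsampled.png", small)`.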

jklymak avatar Jul 29 '22 20:07 jklymak

I'm going to close this, because I don't think there is much we can do about this issue...

jklymak avatar Oct 26 '22 22:10 jklymak