seaborn
histplot 2D ignores hue_norm argument
Description
If I understand correctly, when plotting a 2D histogram with histplot, the hue_norm argument should accept either a pair of data values that map to the extremes of the colormap, or a normalization object from matplotlib.colors such as LogNorm.
However, it seems that the hue_norm argument is being ignored. Even when an object of an unreasonable type is provided, the same plot is produced, with no errors or complaints.
Versions
seaborn 0.11.1 matplotlib 3.4.1
Example
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
x = np.random.randn(1000)
y = np.random.randn(1000)
fig, axs = plt.subplots(ncols=4, figsize=(16,3))
sns.histplot(x=x, y=y, ax=axs[0], cbar=True)
sns.histplot(x=x, y=y, ax=axs[1], cbar=True, hue_norm=(6, 10))
sns.histplot(x=x, y=y, ax=axs[2], cbar=True, hue_norm=LogNorm())
sns.histplot(x=x, y=y, ax=axs[3], cbar=True, hue_norm='I would expect strings to crash')

If I understand correctly, when plotting a 2D histogram with histplot, the hue_norm argument should allow setting the values corresponding to data values corresponding to the extremes of the colormap, or a normalization object from matplotlib.colors such as LogNorm.
I'm curious how you came to this understanding. hue_norm modifies the hue mapping, but you're not assigning hue here. Perhaps there should be an exception if hue_norm is defined while hue isn't, though.
Also, hue_norm does do input checking if hue is defined:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-94-5c03245d2711> in <module>
11 #sns.histplot(x=x, y=y, ax=axs[1], cbar=True, hue_norm=(6, 10))
12 sns.histplot(x=x, y=y, ax=axs[2], cbar=True, norm=PowerNorm(2))
---> 13 sns.histplot(x=x, y=y, hue=[0, 1] * 500, hue_norm='I would expect strings to crash')
~/code/seaborn/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
1387 )
1388
-> 1389 p.map_hue(palette=palette, order=hue_order, norm=hue_norm)
1390
1391 if ax is None:
~/code/seaborn/seaborn/_core.py in map(cls, plotter, *args, **kwargs)
53 # This method is assigned the __init__ docstring
54 method_name = "_{}_map".format(cls.__name__[:-7].lower())
---> 55 setattr(plotter, method_name, cls(plotter, *args, **kwargs))
56 return plotter
57
~/code/seaborn/seaborn/_core.py in __init__(self, plotter, palette, order, norm)
109
110 data = pd.to_numeric(data)
--> 111 levels, lookup_table, norm, cmap = self.numeric_mapping(
112 data, palette, norm,
113 )
~/code/seaborn/seaborn/_core.py in numeric_mapping(self, data, palette, norm)
247 elif not isinstance(norm, mpl.colors.Normalize):
248 err = "``hue_norm`` must be None, tuple, or Normalize object."
--> 249 raise ValueError(err)
250
251 if not norm.scaled():
ValueError: ``hue_norm`` must be None, tuple, or Normalize object.
but the method that does hue mapping (and this check) only gets called if hue is assigned; doing input handling on the mapping-related parameters without the mapping being defined is harder because it has to be done in the body of every function.
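As a rough illustration of the exception suggested above (raising when hue_norm is given without hue), here is a minimal sketch. `check_hue_params` is a hypothetical helper, not part of seaborn, and it would still have to be called from the body of each plotting function, which is the cost mentioned above.

```python
# Hypothetical helper, not part of seaborn: complain when hue-mapping
# parameters are supplied but no hue variable is assigned.
def check_hue_params(hue=None, hue_order=None, hue_norm=None):
    if hue is not None:
        # The hue mapping will run and do its own validation.
        return
    supplied = [
        name
        for name, value in (("hue_order", hue_order), ("hue_norm", hue_norm))
        if value is not None
    ]
    if supplied:
        raise ValueError(
            f"{', '.join(supplied)} given without `hue`; they would be ignored."
        )
```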
Oops! OK, I see I really misunderstood the hue_norm parameter. In that case I would not worry about checking the validity of a parameter that will be ignored anyway; it is not worth the hassle. I was just expecting some change in the color scale, and the incorrect types were just "proving" that the parameter was being ignored... Thanks for the answer and sorry for the noise! 🙏
I'm curious how you came to this understanding.
Haha, I would say it was a combination of:
- I could not find a way to tune the color scale of 2D histograms. thresh, pthresh and pmax could be used to play with the limits, but I did not manage to set a logarithmic color scale (I also tried sns.histplot(x=x, y=y, cbar=True, norm=LogNorm()), which throws an error).
- I find it a bit hard to come up with a use case for hue_norm with histplot. The meaning of the parameter was much clearer in the example in scatterplot, where it is more natural to think of numeric types for hue dimensions.
And then, I guess that mixing both things with some wishful reading got me there... 😅
Not sure if there's any drawback in adding examples in the API documentation (e.g. build times?), but personally I find them really useful. So I'd advocate for adding an example at least on how to achieve point 1 (happy to prepare the PR). Regarding point 2, it depends on whether there are natural enough use cases or not.
For controlling the histogram color scale, you can use kwargs that are passed through to pcolormesh, so vmin, vmax, and norm. I believe these should override the reparameterization that histplot does.
I don't think hue_norm is especially useful in the histogram context (though perhaps more so for univariate histograms), but it is part of the core hue mapping API and currently all relevant mapping parameters are included in each function that allows that mapping.
In terms of tradeoffs for API examples, it does slow the doc build down, but the bigger question is whether adding additional examples of a specific application increases or decreases the overall signal-to-noise ratio of the documentation. Beyond the core functionality, each new example that's relevant to you is an example that's irrelevant to someone else, but makes it harder for them to find out how to do what they want. It can be hard to know exactly where to draw that line...
I don't think hue_norm is especially useful in the histogram context (though perhaps moreso for univariate histograms), but it is part of the core hue mapping API and currently all relevant mapping parameters are included in each function that allows that mapping.
I suspected that. Seems totally reasonable to me! 👌
[...] Beyond the core functionality, each new example that's relevant to you is an example that's irrelevant to someone else, but makes it harder for them to find out how to do what they want. It can be hard to know exactly where to draw that line...
Fair enough! I guess it would have saved me (and you, sorry for the noise again) some time. But I totally see your point, so I leave it to your judgement. I would not mind having an example of a 2D histogram with a log color scale, but I agree that it might be left for an answer on Stack Overflow.
For controlling the histogram color scale, you can use kwargs that are passed through to pcolormesh, so vmin, vmax, and norm. I believe these should override the reparameterization that histplot does.
You're right, sns.histplot(x=x, y=y, cbar=True, norm=LogNorm(), vmin=None, vmax=None) works.
The thing is that at first I had only tried sns.histplot(x=x, y=y, cbar=True, norm=LogNorm()) -using the same data from the first message- but it fails because it conflicts with vmin and vmax set by histplot, not sure why I didn't try to set the limits to None.
It also throws a MatplotlibDeprecationWarning about getting both norm and vmax/vmin parameters simultaneously, and threatens to become an error at v3.5.
In any case, unless you think something needs to be done about that norm and vmin/vmax conflict, I would say you can close the issue. Thanks a lot for the patience and clarity! 🙏 (and for the awesome library, of course! 😉 )
I'm open to making the process of passing a color norm for the histogram gradient easier. Also I guess we should deal with the upcoming matplotlib deprecation, although it feels like that will be slightly irksome (sometimes it's helpful for seaborn when matplotlib ignores duplicate kwarg specification, although it's more confusing for direct matplotlib users).
(Sorry for the silence, busy days)
The option I can think of would be to avoid passing vmin and vmax to pcolormesh when norm is present/not None. Then, it would be up to the user to manage the same/different color norm. The ugly part is that a user providing both vmin/vmax and norm at the same time would imply either raising an error (from seaborn or matplotlib) or ignoring one of the two. Also, I guess that would require adding norm as an explicit parameter in seaborn, as it would modify/interact with other explicit parameters.
Trying to set vmin/vmax to the Normalize subclass would work for LogNorm but is way less robust. Not all the subclasses in matplotlib.colors have these attributes.
Not convinced by any option... 🤔 Do you have any approach in mind?
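To make the first option concrete (drop vmin/vmax when a norm is given, and raise if both are supplied), here is a minimal sketch in plain Python. `resolve_color_kwargs` and its `default_vmax` argument are hypothetical names for illustration only; this is not how seaborn currently behaves, and the default vmin of 0 is an assumption.

```python
# Hypothetical helper, not part of seaborn: decide which color-scale
# keyword arguments would be forwarded to pcolormesh.
def resolve_color_kwargs(norm=None, vmin=None, vmax=None, default_vmax=None):
    if norm is not None:
        if vmin is not None or vmax is not None:
            # Both ways of specifying the limits were given: refuse to guess.
            raise ValueError("Pass either `norm` or `vmin`/`vmax`, not both.")
        # The norm object owns the limits, so vmin/vmax are not forwarded.
        return {"norm": norm}
    # No norm: fall back to explicit limits or the computed defaults.
    return {
        "vmin": 0 if vmin is None else vmin,
        "vmax": default_vmax if vmax is None else vmax,
    }
```

Raising on the conflicting case is only one of the two choices discussed; silently preferring `norm` would be a one-line change.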
I also ran into this issue (several times tbh). As a user it is really not obvious why sns.histplot(x=x, y=y, cbar=True, norm=LogNorm()) does not work but you need to set vmin and vmax to None, and sns.histplot(x=x, y=y, cbar=True, norm=LogNorm(), vmin=None, vmax=None) works.
From my point of view, you could either add a note or add an example to the documentation. This way you would avoid having to choose between the two options @mgab proposed. Fixing documentation is easier than fixing the code ;)