
histplot fails with log_scale

Open taranu opened this issue 3 years ago • 6 comments

I tried to heed the deprecation warning and replace a call to distplot where I had set hist_kws["log"] = True, but every attempt I've made to use log_scale=True in histplot with v0.11.1 fails. A minimal example is:

seaborn.histplot(list(range(20)), log_scale=True)

ValueError                                Traceback (most recent call last)
<ipython-input-4-6dc875eca8b6> in <module>
----> 1 seaborn.histplot(list(range(20)), log_scale=True)

~/.local/lib/python3.7/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1434             estimate_kws=estimate_kws,
   1435             line_kws=line_kws,
-> 1436             **kwargs,
   1437         )
   1438 

~/.local/lib/python3.7/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    435 
    436             # Do the histogram computation
--> 437             heights, edges = estimator(observations, weights=weights)
    438 
    439             # Rescale the smoothed curve to match the histogram

~/.local/lib/python3.7/site-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
    369         """Count the occurrances in each bin, maybe normalize."""
    370         if x2 is None:
--> 371             return self._eval_univariate(x1, weights)
    372         else:
    373             return self._eval_bivariate(x1, x2, weights)

~/.local/lib/python3.7/site-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
    346         bin_edges = self.bin_edges
    347         if bin_edges is None:
--> 348             bin_edges = self.define_bin_edges(x, weights=weights, cache=False)
    349 
    350         density = self.stat == "density"

~/.local/lib/python3.7/site-packages/seaborn/_statistics.py in define_bin_edges(self, x1, x2, weights, cache)
    264 
    265             bin_edges = self._define_bin_edges(
--> 266                 x1, weights, self.bins, self.binwidth, self.binrange, self.discrete,
    267             )
    268 

~/.local/lib/python3.7/site-packages/seaborn/_statistics.py in _define_bin_edges(self, x, weights, bins, binwidth, binrange, discrete)
    255         else:
    256             bin_edges = np.histogram_bin_edges(
--> 257                 x, bins, binrange, weights,
    258             )
    259         return bin_edges

<__array_function__ internals> in histogram_bin_edges(*args, **kwargs)

~/.local/lib/python3.7/site-packages/numpy/lib/histograms.py in histogram_bin_edges(a, bins, range, weights)
    666     """
    667     a, weights = _ravel_and_check_weights(a, weights)
--> 668     bin_edges, _ = _get_bin_edges(a, bins, range, weights)
    669     return bin_edges
    670 

~/.local/lib/python3.7/site-packages/numpy/lib/histograms.py in _get_bin_edges(a, bins, range, weights)
    394                             "bins is not supported for weighted data")
    395 
--> 396         first_edge, last_edge = _get_outer_edges(a, range)
    397 
    398         # truncate the range if needed

~/.local/lib/python3.7/site-packages/numpy/lib/histograms.py in _get_outer_edges(a, range)
    322         if not (np.isfinite(first_edge) and np.isfinite(last_edge)):
    323             raise ValueError(
--> 324                 "autodetected range of [{}, {}] is not finite".format(first_edge, last_edge))
    325 
    326     # expand empty range to avoid divide by zero

ValueError: autodetected range of [-inf, 1.2787536009528289] is not finite

Worse still, the error persists when I repeat the call with log_scale=False.

taranu avatar Jan 28 '21 22:01 taranu

log_scale in histplot does something different from log in plt.hist. In the former, it's the data axis that's log scaled (and the hist binning is done in log space). In the latter, it's just setting a log scale on the count axis of the plot. Your example is failing because you have a 0 in the input data, which is producing -infinity after taking the log.
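As a rough illustration of the failure described above, here is a NumPy-only sketch (not seaborn's actual code path). Taking log10 of data that contains 0 yields -inf, and np.histogram_bin_edges then refuses to autodetect a range. The bins="auto" setting is an assumption made here for the sketch.

```python
import numpy as np

x = np.arange(20)  # same values as list(range(20)) in the report; includes 0

# log10(0) is -inf; silence the divide-by-zero warning for this demo
with np.errstate(divide="ignore"):
    logged = np.log10(x)

print(logged.min(), logged.max())

# Binning in log space fails because the lower edge is not finite
try:
    np.histogram_bin_edges(logged, bins="auto")
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Dropping the 0 from the input (for example, list(range(1, 20))) gives a finite range in log space.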

I'm not really sure why log is a parameter of plt.hist. It might be a relic of the days when bar plots couldn't naturally be shown with a log y axis (since they technically start at 0) and needed special casing, but that's no longer an issue in matplotlib and so you can get the same result by modifying the axes:

ax = histplot(x)
ax.set_yscale("log")

mwaskom avatar Jan 29 '21 01:01 mwaskom

I see; thanks for clarifying. There does still seem to be an issue with the failure repeating in subsequent calls, though, as in:

import seaborn
x = list(range(20))
seaborn.histplot(x, log_scale=False)
try:
    seaborn.histplot(x, log_scale=True)
except ValueError as e:
    print(f'failed #2: {e}')
try:
    seaborn.histplot(x, log_scale=False)
except ValueError as e:
    print(f'failed #3: {e}')

Both the second and third calls fail with the same error. I thought perhaps it was somehow modifying the data in place, but that doesn't seem to be the case, so I don't see an obvious cause or user error here.

As an aside, I only recently learned about the useful shortcut of histplot(x).set(yscale="log") if you don't need the axis handle returned.

taranu avatar Jan 30 '21 00:01 taranu

What's happening here is that the "business end" of histplot only really cares about whether the matplotlib axis object has a log scale or not. The log_scale parameter's only proximal effect is to modify the axis, and then the transformations do or do not happen downstream from that. Also, log_scale=False means "don't set a log scale", not "set a scale that isn't log". The documentation for that parameter is not very clear about this.

So then you're presumably running those commands one after the other in a single Jupyter cell or at the command line (or wherever) such that they're all working with the same matplotlib Axes. The second command modifies its scale and the third one doesn't modify its scale, but still tries to log scale the data.
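A small matplotlib-only sketch of the shared-Axes behavior described above. It does not call seaborn, and the Agg backend is chosen only so that it runs without a display. It shows that a log scale set on the current Axes is still there for later pyplot-based calls, while a fresh figure starts out linear.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xscale("log")            # roughly what a log-scaled call leaves behind

same_axes = plt.gca() is ax     # later pyplot-based calls get this Axes again
scale_after = ax.get_xscale()   # the log scale is still set

fig2, ax2 = plt.subplots()      # a fresh figure starts with a linear scale
fresh_scale = ax2.get_xscale()

print(same_axes, scale_after, fresh_scale)
plt.close("all")
```

Since histplot accepts an ax argument (visible in the signature in the traceback), passing a fresh Axes to each call is one way to keep the calls independent.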

So that explains the slightly weird behavior, but the more important question is what should be done when the positional variable for a log scaled axis has non-positive values. Probably this should either raise with a clear error at the point where it can be detected (rather than letting some downstream computation barf in confusing ways), or those observations should be removed from the dataset and a warning should be issued. I guess I lean towards the latter.
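The two options above could be sketched roughly as follows. The drop_nonpositive helper and its strict flag are hypothetical names invented for this illustration; this is not seaborn's implementation.

```python
import warnings
import numpy as np


def drop_nonpositive(values, strict=False):
    """Hypothetical helper (not part of seaborn) for data bound for a log axis."""
    arr = np.asarray(values, dtype=float)
    bad = ~(arr > 0)  # zeros, negatives, and NaN (NaN > 0 is False)
    n_bad = int(bad.sum())
    if n_bad and strict:
        # Option 1: fail early with a clear message
        raise ValueError(f"{n_bad} value(s) cannot be shown on a log scale")
    if n_bad:
        # Option 2: filter the observations and tell the user
        warnings.warn(f"Removing {n_bad} value(s) that cannot be shown on a log scale")
    return arr[~bad]


# Record the warning so the demo can inspect it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = drop_nonpositive(range(20))

print(len(cleaned), len(caught))
```

Calling drop_nonpositive(x, strict=True) corresponds to the "raise at the point where it can be detected" option instead.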

mwaskom avatar Jan 30 '21 14:01 mwaskom

Actually I forgot ... log_scale also takes an (x, y) tuple, so you don't need to modify the axes; you could do

histplot(x, log_scale=(False, True))

to recreate the log=True parameter in plt.hist.

mwaskom avatar Feb 06 '21 15:02 mwaskom

Oh duh, I thought at one point that the error was probably because I was trying to re-plot on already broken axes but somehow forgot about it. Thanks for clarifying.

Personally, I would vote for raising as soon as possible if there are non-positive values with log set since the warning might get buried in a long log or suppressed altogether, but filtering with a warning seems equally reasonable.

taranu avatar Feb 09 '21 20:02 taranu

A bit late to the party, but there's a difference between log-transforming the variable and log-scaling the axis. E.g. there's nothing wrong with having a log scale plot for negative values (y ticks at -10, -100, -1000). Similarly, we could have a histogram based on negative, log-spaced bins (x = -1000, -100, -10, etc.). So this is perfectly sensible scaling, but clearly log transforming these negative values, which is what seaborn is doing, will be undefined.
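A NumPy-only sketch of the distinction above, using made-up sample values. Log-transforming negative data gives NaN, while log-spaced bin edges on the negative side are well defined and can be used to bin the same data.

```python
import numpy as np

data = np.array([-500.0, -50.0, -5.0, -2.0])  # made-up negative sample

# Log-transforming negative values is undefined: NumPy returns NaN
with np.errstate(invalid="ignore"):
    transformed = np.log10(data)

# Log-spaced bin edges on the negative side are well defined
edges = -np.logspace(3, 0, 4)   # -1000, -100, -10, -1 (increasing order)
counts, _ = np.histogram(data, bins=edges)

print(np.isnan(transformed).all(), counts)
```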

In this sense the current keyword should be 'log_transform' and not 'log_scale'.

drorcohengithub avatar Aug 13 '21 03:08 drorcohengithub

After a few PRs touching the scale machinery, mostly https://github.com/mwaskom/seaborn/pull/3488, the example given here no longer raises.

mwaskom avatar Sep 23 '23 23:09 mwaskom