Histplot errors on multiple date columns

Open user799595 opened this issue 2 years ago • 2 comments

Python version: 3.10.2
Seaborn version: 0.11.2

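import numpy as np
import pandas as pd
import seaborn as sns

# Per-date repeat counts, so each date appears a different number of times
shape = np.arange(100)*np.arange(100)[::-1]
df = pd.DataFrame({
    'start': np.repeat(pd.date_range(start='2005-01-01', periods=100, freq='D'), shape),
    'end': np.repeat(pd.date_range(start='2005-03-01', periods=100, freq='D'), shape)
})
sns.histplot(df, bins=10)  # wide-form call where every column is a datetime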

Expected result: [image]

Actual result:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [66], in <cell line: 1>()
----> 1 sns.histplot(df, bins=10)

File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/distributions.py:1462, in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1451 estimate_kws = dict(
   1452     stat=stat,
   1453     bins=bins,
   (...)
   1457     cumulative=cumulative,
   1458 )
   1460 if p.univariate:
-> 1462     p.plot_univariate_histogram(
   1463         multiple=multiple,
   1464         element=element,
   1465         fill=fill,
   1466         shrink=shrink,
   1467         common_norm=common_norm,
   1468         common_bins=common_bins,
   1469         kde=kde,
   1470         kde_kws=kde_kws,
   1471         color=color,
   1472         legend=legend,
   1473         estimate_kws=estimate_kws,
   1474         line_kws=line_kws,
   1475         **kwargs,
   1476     )
   1478 else:
   1480     p.plot_bivariate_histogram(
   1481         common_bins=common_bins,
   1482         common_norm=common_norm,
   (...)
   1492         **kwargs,
   1493     )

File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/distributions.py:428, in _DistributionPlotter.plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    418     densities = self._compute_univariate_density(
    419         self.data_variable,
    420         common_norm,
   (...)
    424         warn_singular=False,
    425     )
    427 # First pass through the data to compute the histograms
--> 428 for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
    429 
    430     # Prepare the relevant data
    431     key = tuple(sub_vars.items())
    432     sub_data = sub_data.dropna()

File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/_core.py:997, in VectorPlotter.iter_data(self, grouping_vars, reverse, from_comp_data)
    994 for var in grouping_vars:
    995     grouping_keys.append(self.var_levels.get(var, []))
--> 997 iter_keys = itertools.product(*grouping_keys)
    998 if reverse:
    999     iter_keys = reversed(list(iter_keys))

TypeError: 'NoneType' object is not iterable

user799595 • Apr 25 '22 17:04

Thanks for the reproducible report. All seaborn functions actually error out here, because the problem is in the common code path for handling "wide" data. The rule is that "wide" dataframes are first reduced to their numeric columns before melting, but that step does not handle column-less dataframes well (you can get the same result with, e.g., sns.histplot(tips[[]])).
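To see why the frame ends up with no columns, the numeric-only reduction can be approximated with plain pandas. This is only a sketch: `select_dtypes("number")` is an assumption standing in for seaborn's internal filtering, which may be implemented differently.

```python
import pandas as pd

# Two datetime columns and no numeric ones, like the frame in the report.
df = pd.DataFrame({
    "start": pd.date_range("2005-01-01", periods=3, freq="D"),
    "end": pd.date_range("2005-03-01", periods=3, freq="D"),
})

# Assumption: select_dtypes approximates the "reduce to numeric" step.
numeric_only = df.select_dtypes("number")

print(list(numeric_only.columns))  # no columns survive the filter
print(numeric_only.shape)          # the rows are kept, the columns are gone
```

The result has the same kind of shape as `tips[[]]`: an index with rows but nothing to plot.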

A nicer error message here would definitely be an improvement, especially if it could detect that the dataframe only becomes columnless after dropping non-numeric columns.
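One way such a check might look is sketched below. `check_wide_data` is a hypothetical helper, not part of seaborn, and the message wording is made up for illustration; it only shows how the two cases (no columns at all versus no columns left after dropping non-numeric ones) could be told apart.

```python
import pandas as pd


def check_wide_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not seaborn API): keep numeric columns or explain why none remain."""
    numeric = df.select_dtypes("number")
    if numeric.shape[1] > 0:
        return numeric
    if df.shape[1] == 0:
        # The frame was already columnless before any filtering.
        raise ValueError("Wide-form data has no columns to plot.")
    # The frame only became columnless after dropping non-numeric columns.
    dropped = ", ".join(f"{name!r} ({dtype})" for name, dtype in df.dtypes.items())
    raise ValueError(
        "Wide-form data has no numeric columns; "
        f"these columns were dropped: {dropped}."
    )
```

Called on the dataframe from the report, this would raise a ValueError naming 'start' and 'end' instead of the opaque TypeError.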

Then the harder question is "should this work". Wide dataframes have to be reduced to a consistent variable type. It would be more complex, but helpful here, to have some hierarchy (reduce to numeric, unless there are no numeric columns, then reduce to datetimes). I'm not against that, but a challenge is that some of the other functions downstream of this common code path don't (currently) handle datetime variables. They might have a type check that kicks in after the wide->long transformation, though; I'm not sure.
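The hierarchy described above could be sketched roughly as follows. `reduce_wide` is a hypothetical function written for illustration, not seaborn code, and it ignores the downstream question of which plotting functions can actually handle datetimes.

```python
import pandas as pd


def reduce_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reduction: numeric columns first, datetime columns as a fallback."""
    numeric = df.select_dtypes("number")
    if numeric.shape[1] > 0:
        return numeric
    # "datetime" selects timezone-naive datetime64 columns only.
    return df.select_dtypes("datetime")
```

With this rule, a frame that mixes numeric and datetime columns keeps only the numeric ones, while an all-datetime frame like the one in the report keeps both of its columns.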

In any case, you should be able to get the plot you want with

sns.histplot(df.melt(), hue="variable", x="value", bins=10)
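For reference, here is what `df.melt()` produces on a small stand-in for the dataframe above (the two-row frame is made up for illustration). With no arguments, pandas names the new columns "variable" and "value", which is why those names appear in the call.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.date_range("2005-01-01", periods=2, freq="D"),
    "end": pd.date_range("2005-03-01", periods=2, freq="D"),
})

# Wide -> long: one row per (original column, value) pair.
long_df = df.melt()

print(list(long_df.columns))         # ['variable', 'value']
print(long_df["variable"].tolist())  # ['start', 'start', 'end', 'end']
print(long_df["value"].dtype)        # the values stay datetimes
```

Because the long-form call names x and hue explicitly, it goes through the long-form path rather than the wide-form one that produced the error.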

mwaskom • Apr 26 '22 23:04

Thank you for the really nice explanation!

I agree that a better error message would be more user-friendly.

user799595 • Apr 29 '22 13:04