seaborn
Histplot errors on multiple date columns
Python version: 3.10.2
Seaborn version: 0.11.2
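import numpy as np
import pandas as pd
import seaborn as sns

# Per-date repeat counts, so each date appears a different number of times
shape = np.arange(100)*np.arange(100)[::-1]
df = pd.DataFrame({
'start': np.repeat(pd.date_range(start='2005-01-01', periods=100, freq='D'), shape),
'end': np.repeat(pd.date_range(start='2005-03-01', periods=100, freq='D'), shape)
})
sns.histplot(df, bins=10)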
Expected result
Actual result
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [66], in <cell line: 1>()
----> 1 sns.histplot(df, bins=10)
File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/distributions.py:1462, in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
1451 estimate_kws = dict(
1452 stat=stat,
1453 bins=bins,
(...)
1457 cumulative=cumulative,
1458 )
1460 if p.univariate:
-> 1462 p.plot_univariate_histogram(
1463 multiple=multiple,
1464 element=element,
1465 fill=fill,
1466 shrink=shrink,
1467 common_norm=common_norm,
1468 common_bins=common_bins,
1469 kde=kde,
1470 kde_kws=kde_kws,
1471 color=color,
1472 legend=legend,
1473 estimate_kws=estimate_kws,
1474 line_kws=line_kws,
1475 **kwargs,
1476 )
1478 else:
1480 p.plot_bivariate_histogram(
1481 common_bins=common_bins,
1482 common_norm=common_norm,
(...)
1492 **kwargs,
1493 )
File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/distributions.py:428, in _DistributionPlotter.plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
418 densities = self._compute_univariate_density(
419 self.data_variable,
420 common_norm,
(...)
424 warn_singular=False,
425 )
427 # First pass through the data to compute the histograms
--> 428 for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
429
430 # Prepare the relevant data
431 key = tuple(sub_vars.items())
432 sub_data = sub_data.dropna()
File ~/mambaforge/envs/main/lib/python3.10/site-packages/seaborn/_core.py:997, in VectorPlotter.iter_data(self, grouping_vars, reverse, from_comp_data)
994 for var in grouping_vars:
995 grouping_keys.append(self.var_levels.get(var, []))
--> 997 iter_keys = itertools.product(*grouping_keys)
998 if reverse:
999 iter_keys = reversed(list(iter_keys))
TypeError: 'NoneType' object is not iterable
Thanks for the reproducible report. All seaborn functions actually error out here, because the issue is in the common code path for handling "wide" data. The rule is that "wide" dataframes are first reduced to their numeric columns before melting, but that step does not handle column-less dataframes well (you can get the same result with, e.g., sns.histplot(tips[[]])).
A nicer error message here would definitely be an improvement, especially if it could detect that the dataframe only becomes columnless after dropping non-numeric columns.
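To make the failure mode concrete, here is a minimal sketch using pandas only. It approximates the "keep only numeric columns" step with `select_dtypes`, which is an assumption and not seaborn's actual implementation, and `check_wide_data` is a hypothetical helper showing what a friendlier check might look like.

```python
import pandas as pd

# Small stand-in for the dataframe in the report: two datetime columns only
df = pd.DataFrame({
    "start": pd.date_range(start="2005-01-01", periods=5, freq="D"),
    "end": pd.date_range(start="2005-03-01", periods=5, freq="D"),
})

# Approximation (assumption) of the numeric-only reduction applied to wide data
numeric_only = df.select_dtypes(include="number")
print(numeric_only.shape)  # rows are kept, but no columns survive


def check_wide_data(data):
    """Hypothetical pre-check: fail early with a descriptive message."""
    numeric = data.select_dtypes(include="number")
    # Only complain when columns existed but all of them were non-numeric
    if data.shape[1] > 0 and numeric.shape[1] == 0:
        dropped = ", ".join(
            f"{col} ({dtype})" for col, dtype in data.dtypes.items()
        )
        raise TypeError(
            "Wide-form data has no numeric columns left after dropping: "
            + dropped
        )
    return numeric
```

Calling `check_wide_data(df)` on the frame above raises a `TypeError` that names the dropped columns instead of failing later with `'NoneType' object is not iterable`.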
Then the harder question is "should this work". Wide dataframes have to be reduced to a consistent variable type. It would be more complex, but helpful here, to have some hierarchy (reduce to numeric, unless there are no numeric columns, then reduce to datetimes). I'm not against that, but a challenge is that some of the other functions that are downstream from this common code path don't (currently) handle datetime variables. But they might have a type check that kicks in after the wide->long transformation, I'm not sure.
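The hierarchy described above could look roughly like the following. This is only a sketch under stated assumptions: `reduce_wide` is a hypothetical function name, the type checks use pandas `select_dtypes` rather than seaborn's own variable-type logic, and timezone-aware datetimes are not considered.

```python
import pandas as pd


def reduce_wide(data):
    """Hypothetical reduction: prefer numeric columns, else fall back to datetimes."""
    numeric = data.select_dtypes(include="number")
    if numeric.shape[1] > 0:
        return numeric
    # No numeric columns at all: keep the (naive) datetime columns instead
    return data.select_dtypes(include="datetime")


dates_only = pd.DataFrame({
    "start": pd.date_range(start="2005-01-01", periods=5, freq="D"),
    "end": pd.date_range(start="2005-03-01", periods=5, freq="D"),
})
mixed = dates_only.assign(n=list(range(5)))

print(reduce_wide(dates_only).columns.tolist())
print(reduce_wide(mixed).columns.tolist())
```

With this rule a frame of only datetime columns keeps both of them, while a mixed frame still reduces to its numeric columns, so existing behavior for numeric data would not change.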
In any case, you should be able to get the plot you want with
sns.histplot(df.melt(), hue="variable", x="value", bins=10)
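For anyone wondering why the workaround uses "variable" and "value": those are the default column names that `DataFrame.melt()` produces when called without arguments. A small pandas-only sketch (the tiny five-row frame is a stand-in for the one in the report):

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.date_range(start="2005-01-01", periods=5, freq="D"),
    "end": pd.date_range(start="2005-03-01", periods=5, freq="D"),
})

# Wide -> long: one row per (original column, value) pair
long_df = df.melt()

print(long_df.columns.tolist())
print(long_df["variable"].unique().tolist())
print(len(long_df))
print(long_df.dtypes)
```

The long-form frame can then be passed to histplot with x="value" and hue="variable", which sidesteps the numeric-only reduction applied to wide data.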
Thank you for the really nice explanation!
I agree that a better error message would be more user friendly.