seaborn

specifying bins argument of sns.histplot as bin edges of a datetime type

Open · ybagdasa opened this issue 5 years ago · 4 comments

seaborn version: 0.11.0

I can produce a histogram of dates with no problem by passing a number of bins:

sns.histplot(data=df['visit_date'], bins=20)

However, I cannot seem to specify the bin edges as a datetime type:

sns.histplot(data=df['visit_date'], bins=np.arange("2000", "2020", dtype="datetime64[D]"))

In [57]: sns.histplot(data=df['visit_date'],bins= np.arange("2000", "2020", dtype="datetime64[D]"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-e11b4de76ee6> in <module>
----> 1 sns.histplot(data=df['visit_date'],bins= np.arange("2000", "2020", dtype="datetime64[D]"))

/data2/yelena/miniconda3/lib/python3.7/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1433             estimate_kws=estimate_kws,
   1434             line_kws=line_kws,
-> 1435             **kwargs,
   1436         )
   1437 

/data2/yelena/miniconda3/lib/python3.7/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    434 
    435             # Do the histogram computation
--> 436             heights, edges = estimator(observations, weights=weights)
    437 
    438             # Rescale the smoothed curve to match the histogram

/data2/yelena/miniconda3/lib/python3.7/site-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
    369         """Count the occurrances in each bin, maybe normalize."""
    370         if x2 is None:
--> 371             return self._eval_univariate(x1, weights)
    372         else:
    373             return self._eval_bivariate(x1, x2, weights)

/data2/yelena/miniconda3/lib/python3.7/site-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
    350         density = self.stat == "density"
    351         hist, _ = np.histogram(
--> 352             x, bin_edges, weights=weights, density=density,
    353         )
    354 

<__array_function__ internals> in histogram(*args, **kwargs)

/data2/yelena/miniconda3/lib/python3.7/site-packages/numpy/lib/histograms.py in histogram(a, bins, range, normed, weights, density)
    876             for i in _range(0, len(a), BLOCK):
    877                 sa = np.sort(a[i:i+BLOCK])
--> 878                 cum_n += _search_sorted_inclusive(sa, bin_edges)
    879         else:
    880             zero = np.zeros(1, dtype=ntype)

/data2/yelena/miniconda3/lib/python3.7/site-packages/numpy/lib/histograms.py in _search_sorted_inclusive(a, v)
    459     """
    460     return np.concatenate((
--> 461         a.searchsorted(v[:-1], 'left'),
    462         a.searchsorted(v[-1:], 'right')
    463     ))

TypeError: invalid type promotion

ybagdasa, Dec 04 '20

Please turn this into a reproducible example (a simple fake dataset should suffice), thanks.

mwaskom, Dec 04 '20

Reproducible snippet:

import numpy as np
import pandas as pd
import seaborn as sns

# Ten random daily dates between 2012 and 2019
dffake = pd.DataFrame(pd.date_range(start='2012-01-01', end='2019-01-01', freq='D'), columns=['date']).sample(10)
# Weekly bin edges covering the same span
bins = pd.date_range(start='2012-01-01', end='2019-01-01', freq='7D')

Each of the following calls fails:

sns.histplot(data=dffake.date, bins=bins)
sns.histplot(data=dffake.date, bins=bins.astype('datetime64[ns]'))
sns.histplot(data=dffake.date, bins=np.array(bins.astype('datetime64[ns]')))
sns.histplot(data=dffake.date, bins=bins.to_pydatetime())

ybagdasa, Jan 27 '21

Thanks!

This happens because, by the time the histogram is computed, the datetime data have been converted to numeric values, but bins is passed straight through to numpy. So you end up with numeric data and datetime bin edges, and numpy cannot compare the two.
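A minimal sketch that reproduces the mismatch with NumPy alone, assuming (as described above) that the dates have already become plain floats while the edges keep their datetime64 dtype. The values in numeric_dates are arbitrary stand-ins, and the exact error message varies between NumPy versions (newer releases raise DTypePromotionError, a subclass of TypeError):

```python
import numpy as np

# Stand-in for the data seaborn holds: dates already converted to floats.
numeric_dates = np.array([15000.0, 15010.0, 15020.0])

# Bin edges as the user passed them, still datetime64.
datetime_edges = np.array(["2011-01-01", "2011-02-01", "2011-03-01"], dtype="datetime64[D]")


def try_histogram(data, edges):
    """Return the TypeError NumPy raises for incompatible dtypes, or None on success."""
    try:
        np.histogram(data, bins=edges)
    except TypeError as err:
        return err
    return None


error = try_histogram(numeric_dates, datetime_edges)
print(type(error).__name__, error)

# Once data and edges are both numeric, the same call succeeds.
print(try_histogram(numeric_dates, np.array([14990.0, 15015.0, 15030.0])))  # None
```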

In principle, this is not difficult to solve, but it is fiddly: bins is a very flexible argument, and most specifications (e.g. a number or a string) should not be converted at all.
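One possible approach is to convert bins only when its values are datetimes and leave every other specification alone. The sketch below is a hypothetical helper (convert_datetime_bins is not a seaborn function); it uses pandas.api.types.infer_dtype for the check and matplotlib's date2num, the same function as the workaround in this thread, for the conversion:

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd


def convert_datetime_bins(bins):
    """Hypothetical helper: convert datetime-like bin edges to matplotlib
    date numbers and pass every other bins specification through unchanged."""
    # A bin count (e.g. 20) or a rule name (e.g. "auto") must not be converted.
    if isinstance(bins, str) or np.ndim(bins) == 0:
        return bins
    # Convert only sequences whose values are datetimes.
    if pd.api.types.infer_dtype(bins) in ("datetime64", "datetime"):
        return mdates.date2num(bins)
    # Numeric edges are returned as-is.
    return bins


edges = pd.date_range("2012-01-01", "2012-01-22", freq="7D")
print(convert_datetime_bins(20))      # 20
print(convert_datetime_bins("auto"))  # auto
print(np.diff(convert_datetime_bins(edges)))  # [7. 7. 7.]
```

Even this small version has to decide which input types count as "datetime-like", which is the annoyance mentioned above.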

BTW, I imagine that we'll run into the same problem with binwidth and binrange.
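Assuming seaborn bins datetime data as matplotlib date numbers (floats measured in days, which is what the date2num workaround in this thread implies), the same user-space trick should extend to binwidth and binrange. A minimal sketch of the unit conversion; the variable names are illustrative and the histplot call is left as a comment:

```python
import matplotlib.dates as mdates
import pandas as pd

# Assumption: seaborn works in matplotlib date numbers (float days),
# so the width and range must be expressed in those units too.
binwidth = pd.Timedelta(weeks=1) / pd.Timedelta(days=1)  # one week as a float
binrange = tuple(mdates.date2num(pd.to_datetime(["2012-01-01", "2019-01-01"])))

print(binwidth)                   # 7.0
print(binrange[1] - binrange[0])  # 2557.0 (days between the two dates)

# e.g. sns.histplot(data=dffake.date, binwidth=binwidth, binrange=binrange)
```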

Fortunately, it's easy to work around in user space:

import matplotlib.dates as mdates
sns.histplot(data=dffake.date, bins=mdates.date2num(bins))
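To check that the workaround puts data and edges in the same units, the sketch below does the conversion by hand and calls np.histogram directly. It assumes, based on the explanation earlier in this thread, that seaborn ends up making an equivalent call; the four dates are made-up values chosen so the counts are easy to verify:

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Made-up dates and weekly edges (Jan 1, 8, 15, 22).
dates = pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-10", "2012-01-20"])
edges = pd.date_range(start="2012-01-01", end="2012-01-22", freq="7D")

# Convert both to matplotlib date numbers (float days) before binning.
counts, numeric_edges = np.histogram(mdates.date2num(dates), bins=mdates.date2num(edges))
print(counts)  # [2 1 1]
```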

mwaskom, Jan 27 '21

Ah, thank you for the explanation and the easy workaround. The solution has been difficult to track down. Thanks!

ybagdasa, Jan 27 '21

While the workaround here isn't extremely obvious, I think it's pretty simple once you know what to do, and it looks like supporting bins-with-original-units would be rather complex. So I think I'm going to close with no action for now.

mwaskom, Sep 14 '22