
Add datetime axis capability to categorical plots

Open bonnland opened this issue 4 years ago • 5 comments

Is your feature request related to a problem? Please describe.

This issue was first raised here: https://discourse.holoviz.org/t/workaround-for-date-based-histogram-tick-labels/788

I have time series data, and would like to bin the data according to different time resolutions. When I try linking axes and controlling axis tick labels, I find it difficult to do so:

import numpy as np
import pandas as pd

import hvplot.pandas
import holoviews as hv
from bokeh.models.formatters import DatetimeTickFormatter

# Create data frame
days = pd.date_range(start=pd.to_datetime('2001-01-01'), end=pd.to_datetime('2001-12-31'))
ncols = 5
columns = np.arange(ncols) + 1
data = np.random.rand(365, ncols)
df = pd.DataFrame(data, index=days, columns=columns)
df.index.name = 'time'

# Determine number of days where row mean is above 0.5
mean_above_midpoint = (df.agg('mean', axis=1) > 0.5).astype(int)
df['days_above_midpoint'] = mean_above_midpoint
df_weekly = df.resample('W').mean()
df_weekly['days_above_midpoint'] = df['days_above_midpoint'].resample('W').sum()
df_weekly.head()

formatter = DatetimeTickFormatter(months='%b')
columnLabels = [str(x) for x in columns]

lines = df_weekly.hvplot.line(x='time', y=columnLabels, title='Measurements', xformatter=formatter)
hist  = df_weekly.hvplot.bar(x='time', y='days_above_midpoint', title='Days Above Midpoint', xformatter=formatter)

(lines + hist).cols(1)

Screen Shot 2020-06-09 at 10 55 30 AM

Describe the solution you'd like

Most categorical plots (histogram, box+whisker, violin) have utility when the bins are discrete time intervals, rather than categories. It would be nice if these plots allowed a datetime axis with all of the related linking abilities to other time series plots.

I am aware now of a workaround using Holoviews, but it's not something I could have figured out on my own.

Thanks for your consideration!

bonnland avatar Jun 11 '20 16:06 bonnland

I think this is worth discussing and may be related to the issue in https://github.com/holoviz/hvplot/issues/490

jlstevens avatar Aug 03 '20 16:08 jlstevens

I'm trying to follow the work-around as listed in the discourse, but my use case is with stacked data. Histograms in holoviews don't seem to support stacked data as far as I can tell, so the current workaround doesn't work unfortunately.

pollackscience avatar Feb 23 '21 16:02 pollackscience

I mostly plot financial timeseries data. Users are very used to having a line/curve plot on top and some bar plot below. The x-axis in both places is datetime.

The inability to easily support this use case with linked axes is a MAJOR drawback and means I most often have to resort to other plotting libraries.

Below is something created using HighCharts. The users are just so used to seeing their data like this across a lot of external systems that trying to argue they could also see the same in other ways is just not reasonable.

image

image

MarcSkovMadsen avatar Aug 21 '22 16:08 MarcSkovMadsen

This issue actually lists a few different issues.

First, yes bar plots do not support numerical data. @philippjfr wrote on Discourse:

"So this is an unfortunate quirk of the Bars element in HoloViews which is always categorical"

Do you still stand by that? It seems to me that Bars should always be categorical.

The workaround offered on Discourse was the following:

lines = df_weekly.hvplot.line(x='time', y=columnLabels, title='Measurements', xformatter=formatter)
hist  = hv.Histogram(df_weekly.hvplot.bar(x='time', y='days_above_midpoint')).opts(title='Days Above Midpoint', xformatter=formatter, width=700)

(lines + hist).cols(1)

And indeed using the Histogram element is a valid workaround, and it feels like the correct way to represent data that is aggregated over time intervals. This is actually the classic way to plot a hyetograph, which represents the evolution of rainfall intensity over time: image

@philippjfr what do you think about adding an x parameter to df.hvplot.hist that would build the histogram not using the histogram operation of HoloViews but by directly calling hv.Histogram((df['x'], df['y']))? Something I'm unsure about is that with this approach the bins are centered around the x values, while in some cases (or all?) you'd want the x values to match the right bin edges, as in the plot above (the physical measurements were made at t=10,20,etc.).
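
The alignment question can be made concrete with plain pandas. The following is a minimal sketch, not hvplot's or HoloViews' implementation: the helper names `centered_edges` and `right_aligned_edges` are hypothetical, and both assume a regularly spaced index. It builds the N+1 bin edges explicitly, which hv.Histogram accepts in the form `(edges, values)`. Note that `resample('W')` labels each weekly bin by its right edge by default, so the right-aligned variant matches how df_weekly was built in the original example.

```python
import numpy as np
import pandas as pd


def centered_edges(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: N+1 edges with each timestamp at its bin center.

    Assumes the index is regularly spaced.
    """
    step = index[1] - index[0]
    shifted = index - step / 2
    # Append the final right edge after the last (shifted) timestamp.
    return shifted.insert(len(shifted), index[-1] + step / 2)


def right_aligned_edges(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: N+1 edges with each timestamp as its bin's right edge.

    Assumes the index is regularly spaced.
    """
    step = index[1] - index[0]
    # Prepend the left edge of the first bin.
    return index.insert(0, index[0] - step)


# Four weekly bins labelled by their right edge (Sundays), as resample('W') does.
weeks = pd.date_range('2001-01-07', periods=4, freq='W')
values = np.array([3, 5, 2, 4])  # made-up counts, for illustration only

centered = centered_edges(weeks)
right = right_aligned_edges(weeks)

print(centered)
print(right)
```

With holoviews installed, either set of edges could then be passed on as `hv.Histogram((right.values, values))`, so the choice of alignment stays with the caller rather than being inferred from the x values.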

@pollackscience is asking for stacked histograms, that's a feature request for HoloViews.

Lastly, @MarcSkovMadsen on the first plot you shared, what are the intervals between the bins where the values are 0? Do they have any meaning?

maximlt avatar Oct 14 '22 09:10 maximlt

Pinging @jlstevens to get your opinion on that:

"Something I'm unsure about is that using this approach the bins are centered around the x values, while in some cases (or all?) you'd want the x values to match with the right bin edges, as in the plot above (the physical measurements were made at t=10,20,etc.)."

maximlt avatar Oct 14 '22 17:10 maximlt