
Large timeseries

Open hoxbro opened this issue 7 months ago • 8 comments

This adds a notebook that explains the different ways of working with large time-series datasets with holoviz

hoxbro avatar Nov 22 '23 12:11 hoxbro

Reviewing the published page https://holoviz-dev.github.io/hvplot/user_guide/visualizing_large_timeseries.html rather than the source code:

visualizing large timeseries

Title of notebook needs to be capitalized to match others and have a reasonable title. hvPlot is inherently a plotting library, so "visualizing" seems redundant. Just "Large_Timeseries", maybe?

  1. Datashader rasterizing

When pages are built for our website, the build uses the default Datashader image size. That default is intentionally low, so that interactive use doesn't generate a large image only to throw it away when a RangeXY callback updates the plot to the actual display resolution. Here, because the callback is never invoked, the image is rendered at a very low resolution, which looks bad on the website. I think the images can be improved by including a cell like this early in the notebook:

from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width=1200
ResampleOperation2D.height=500

[The example above needs rasterize, plus instant inspection. Also needs to illustrate what happens when very large numbers of traces overlap.]

Presumably this note should be omitted, and an issue opened instead.

  2. Minimap

The example shows no data by default; presumably we should put some sort of initial range in there that causes data to display when exported to HTML?

The instructions also don't seem to match the plot; there's no grey box visible, and once you pan to find one, it's not a small rectangle but something larger than the plot, which doesn't seem right. Plus panning and zooming in the bottom plot make it very easy to get lost; I would think that the minimap should not have any y axis panning and no zooming, just x panning. It should be hard to shoot yourself in the foot or get lost.

While this guide is useful on its own, I believe it needs to be significantly reworked before being integrated in the docs. It feels to me like it's been written as a standalone guide. Actually, I believe it could be pretty easily turned into a nice blog post!

Yes, this was a standalone guide, and we decided that hvplot was where it should end up. I agree it will make a nice blog post when we are done, but it should also have a permanent home in our docs so that people can figure out the best way to deal with their large timeseries data.

The guide should be more integrated into the docs, probably moving some of its content to the Time Series Data guide.

I could be convinced otherwise, but my first guess would be that the Time Series guide should lose its LTTB section and instead it should have a section at the end suggesting that people look at this separate guide if they have large timeseries or want to look at many of them together. It's a lot of content already and I don't think it's relevant to people with small timeseries.

The guide focuses too much on Bokeh; hvPlot supports Matplotlib and Plotly too.

That's a general tension in the hvPlot docs that I believe remains unresolved -- how do we show how the backends differ, as well as how the various data sources differ? I don't think this one is particularly different in that respect, but if it is, it can have an explicit statement that these examples focus on Bokeh but in some cases similar functionality is available for the other backends.

"The old way: Bokeh's custom Canvas rendering" and "WebGL: new baseline for timeseries plotting": I'm not sure we should mention how things used to be. Ideally, we'd have a guide specific to Bokeh, like "Plotting with Bokeh" in HoloViews' docs, that mentions WebGL.

Maybe don't say it's the old way, then, but just mention that it's an option and that it's not recommended any more.

Datashader rasterizing: I just realized that explaining Datashader is difficult. I don't know how many notebook users will understand this sentence: "Datashader works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points."

I probably wrote that; any suggestions on how to make it clearer?
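One way to make the quoted sentence concrete is a toy example. The sketch below is not how Datashader is implemented (Datashader draws line segments and is heavily optimized); it only illustrates the idea with NumPy's `histogram2d`: however many data points there are, what gets sent to the browser is a fixed-size grid. The function name `rasterize_points` and the sample data are made up for illustration.

```python
import numpy as np

def rasterize_points(x, y, width, height):
    """Aggregate points into a (height, width) grid of per-pixel counts."""
    counts, _, _ = np.histogram2d(
        y, x,
        bins=(height, width),
        range=((y.min(), y.max()), (x.min(), x.max())),
    )
    return counts

# A synthetic random-walk "timeseries" with many more samples than pixels.
rng = np.random.default_rng(0)
n = 200_000
x = np.linspace(0.0, 1.0, n)
y = np.cumsum(rng.standard_normal(n))

# The "frame buffer": its size depends on the plot size, not on n.
image = rasterize_points(x, y, width=1200, height=500)
print(image.shape, image.size, n)
```

Doubling `n` doubles the work done on the server but leaves `image.shape` unchanged, which is the point the sentence is trying to make.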

We're also defining Datashader in multiple places in hvPlot's docs. Ideally again, we could have a "Large data" guide that would be the only place where we would define and explain Datashader, and link to it from other places.

Sounds good. I hear you volunteering to write that! :-)

I also find the guide doesn't go into enough detail on anti-aliasing. I bet most users aren't familiar with it and need more explanation.

I think we can put in a link that explains it.

Minimap: What should be the main reference place to introduce the RangeToolLink in hvPlot's docs? 

Good question! It seems to me that the minimap is primarily useful for large timeseries, and so to me it belongs here, in the large timeseries notebook.

@Hoxbro, can you link to the issues that detail the remaining warts and areas for improvement in this notebook? I think you mentioned that they existed but I don't see how to get to them from here.

jbednar avatar Dec 01 '23 02:12 jbednar

I'm starting to address the points raised above. I've collected the tasks in a board.

droumis avatar Dec 12 '23 00:12 droumis

I've added it, along with an explainer admonition, but it's a bit awkward to add the following to the notebook. Ideally, we could either run this in the CI workflow somehow or use a hidden cell (not sure how).

from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width=1200
ResampleOperation2D.height=500

droumis avatar Dec 13 '23 20:12 droumis

@droumis I made a couple of small changes to attempt to hide the cell we were talking about the other day (the one setting the resampling dimensions). This is usually supported by MyST-NB by adding the hide-cell tag to the cell and I was happy to see that nbsite doesn't affect that. I changed the config on the clean_notebook hook to ignore the tags key in the cell metadata. The cell is correctly hidden on the site :)

image

I added a small comment to make it clear there's something special with this cell, it's not so obvious otherwise when you work from JupyterLab/Notebook.
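For anyone working from the raw `.ipynb`, here is a rough sketch of what such a tagged cell could look like in the notebook JSON, built with Python's `json` module. The `hide-cell` tag and the three `ResampleOperation2D` lines come from this thread; the wording of the comment line and the other cell fields shown are assumptions, not the exact content of the notebook.

```python
import json

# Hypothetical notebook cell; only the tag and the three resampling
# lines are taken from the discussion above.
cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {"tags": ["hide-cell"]},  # MyST-NB hides cells with this tag
    "outputs": [],
    "source": [
        "# Hidden on the website: sets the Datashader image size for the static docs build\n",
        "from holoviews.operation.resample import ResampleOperation2D\n",
        "ResampleOperation2D.width=1200\n",
        "ResampleOperation2D.height=500",
    ],
}

print(json.dumps(cell, indent=1))
```

The tag lives in the cell metadata, which is why the clean_notebook hook has to leave the `tags` key alone.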


I'm planning to release hvPlot 0.9.1 today. How do you feel about this PR? It seems to me it's in a much better state and could go in as is. I'm even fine with having some sections marked as WIP. It's up to you to tell me what you think; there's no urgency to merge this either, at least from my side.

maximlt avatar Dec 21 '23 10:12 maximlt

re: hidden cell, that's great to see, @maximlt! I think that will help us in several other places across holoviz docs.

re: merging now, let's wait for Bokeh 3.4 and the next HoloViews release, as this notebook requires Bokeh #13603 and benefits from HoloViews #6030. I'm also actively working on the things marked WIP. I'd also really like to see auto-ranging for multiple lines fixed before this is released, which @jlstevens will hopefully have time to address in early January.

droumis avatar Dec 21 '23 13:12 droumis

I processed some real spike waveform data to create a new Datashader section on plotting many lines across multiple categories. As far as I could figure out, until we resolve the relevant data format issues in HoloViews, the simplest approach for hvPlot is to add NaN separators to a dataframe, so I did that step before uploading the data and just explained it in the notebook. It's up on the dev website.

image
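For readers unfamiliar with the NaN-separator trick, here is a minimal sketch of the preprocessing step, assuming each trace starts out as its own small dataframe. The column names `time` and `amplitude`, the helper `join_with_nan_separators`, and the synthetic data are all hypothetical; the real waveform data was prepared before upload and is not reproduced here.

```python
import numpy as np
import pandas as pd

def join_with_nan_separators(traces):
    """Stack traces into one dataframe, with an all-NaN row after each trace."""
    columns = list(traces[0].columns)
    blocks = []
    for trace in traces:
        blocks.append(trace[columns].to_numpy(dtype=float))
        # The NaN row tells the line renderer to lift the pen here.
        blocks.append(np.full((1, len(columns)), np.nan))
    return pd.DataFrame(np.vstack(blocks), columns=columns)

# Three short synthetic traces standing in for waveforms.
t = np.arange(5, dtype=float)
traces = [pd.DataFrame({"time": t, "amplitude": np.sin(t + k)}) for k in range(3)]

combined = join_with_nan_separators(traces)
print(combined)
```

Plotted as a single line, the combined frame shows each trace as a separate segment, because the line is broken at every NaN row instead of connecting the end of one trace to the start of the next.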

Unless there are any further comments, I think we are just waiting on Bokeh 3.4 and the next HoloViews release to merge this PR. If autoranging, Datashader inspections, or this NaN issue gets resolved before then, great, but those can also be follow-ups.

droumis avatar Dec 28 '23 19:12 droumis

@droumis, that new plot looks great! So nice to see that after years of just imagining it. :-)

Are the new issues you found when doing that now part of https://github.com/orgs/holoviz/projects/14/views/2 ? If not please add them there. We've come up with a nice, comprehensive set of issues to address, now we just need to address them!

jbednar avatar Jan 07 '24 00:01 jbednar

  • [ ] Need to document that hvPlot can take advantage of tsdownsample when it's installed (https://github.com/holoviz/holoviews/pull/6059)
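
As background for that documentation task, the LTTB (Largest-Triangle-Three-Buckets) downsampling mentioned earlier in this thread can be sketched in plain Python. This is an illustrative, unoptimized version and not the code used by HoloViews or tsdownsample; the function name `lttb_indices` is made up here.

```python
def lttb_indices(xs, ys, n_out):
    """Return the indices of the points kept by LTTB downsampling."""
    n = len(xs)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(range(n))  # nothing to downsample

    # The first and last points are always kept; the rest is split into buckets.
    bucket_size = (n - 2) / (n_out - 2)
    selected = [0]
    for i in range(n_out - 2):
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        next_end = min(int((i + 2) * bucket_size) + 1, n)

        # The average of the next bucket is the third vertex of the triangle.
        count = next_end - end
        avg_x = sum(xs[end:next_end]) / count
        avg_y = sum(ys[end:next_end]) / count

        # Keep the point forming the largest triangle with the previously
        # kept point and the next bucket's average.
        ax, ay = xs[selected[-1]], ys[selected[-1]]
        best, best_area = start, -1.0
        for j in range(start, end):
            area = abs((ax - avg_x) * (ys[j] - ay) - (ax - xs[j]) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area
        selected.append(best)

    selected.append(n - 1)
    return selected

# A flat signal with a single spike: the spike should survive downsampling.
xs = list(range(100))
ys = [0.0] * 100
ys[50] = 10.0
kept = lttb_indices(xs, ys, 10)
print(kept)
```

The spike at index 50 is retained because it forms the largest triangle in its bucket, which is why LTTB tends to preserve the visual shape of a series better than taking every nth point.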

maximlt avatar Jan 23 '24 16:01 maximlt

superseded by https://github.com/holoviz/hvplot/pull/1302

droumis avatar Apr 08 '24 18:04 droumis