Parallel Coordinate Diagram

Open js3711 opened this issue 8 years ago • 4 comments

I have a use case for Datashader to produce a Parallel Coordinate Diagram (https://en.wikipedia.org/wiki/Parallel_coordinates) for ~100 million independent N-dimensional vectors. My current dataset stores each of the N dimensions as a separate column in a DataFrame. I was hoping the interface would allow for easy reordering of the columns.

I have attempted to implement a Parallel Coordinate glyph very similar to the line glyph, but my implementation is not running as efficiently as I'd like.

I have a couple of questions:

  • Am I missing an easy way to generate a Parallel Coordinate Diagram with the existing Datashader interface? (hoping to have efficiency similar to N * line efficiency)
  • Are there any recommendations to help improve efficiency? See the Line2 glyph in the attached file, glyphs.txt, for my first attempt.

Feel free to follow up with questions.

Thank you for your help, Justin

js3711 • Feb 07 '17 02:02

I haven't tried it, but I would have thought you could use the existing Line support to do a parallel-coordinates plot, by transposing and padding your data into a tidy format like the one in the timeseries notebook, with each multiline separated by np.NaN. That should be fast to render, but it's true that such a representation would not facilitate reordering the columns. So it may be appropriate to write special-purpose code for parallel coordinates organized as you describe, though I would have hoped it would require far less duplication of code than in the attached file.
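As a rough illustration of the NaN-separated layout described above, here is a minimal sketch. The helper names `to_nan_separated` and `render` are hypothetical, and the input is assumed to be a 2-D array with one row per vector and one column per dimension. Each row becomes one polyline across the axes, followed by a NaN entry that breaks the line. No per-axis rescaling is done here.

```python
import numpy as np

def to_nan_separated(values, order):
    """Flatten rows into one long NaN-separated polyline list.

    values: (n_rows, n_dims) array-like, one row per vector.
    order:  sequence of column indices giving the left-to-right axis order.
    Returns flat x and y arrays; every row contributes len(order) vertices
    followed by one NaN that separates it from the next row.
    """
    values = np.asarray(values, dtype=float)
    order = list(order)
    n_rows, k = values.shape[0], len(order)
    x = np.full((n_rows, k + 1), np.nan)
    y = np.full((n_rows, k + 1), np.nan)
    x[:, :k] = np.arange(k)        # axis positions 0..k-1 for every row
    y[:, :k] = values[:, order]    # values in the requested axis order
    return x.ravel(), y.ravel()

def render(x, y, width=800, height=400):
    """Aggregate the NaN-separated lines (assumes pandas and datashader are installed)."""
    import pandas as pd
    import datashader as ds
    df = pd.DataFrame({"x": x, "y": y})
    canvas = ds.Canvas(plot_width=width, plot_height=height)
    return canvas.line(df, "x", "y", agg=ds.count())
```

Note that changing the axis order means rebuilding the flattened arrays, which is the reordering cost mentioned above.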

jbednar • Feb 07 '17 13:02

One of the requirements would definitely be to support column reordering without performing data manipulation. I agree about the code duplication; I wanted to get something working efficiently before refactoring.

js3711 • Feb 10 '17 13:02

(...) transposing and padding your data into a tidy format like the one in the timeseries notebook, with each multiline separated by np.NaN. (...)

@jbednar, is there a way to split lines (by adding separating np.nan values) that also works with grouped plots? For example:

    start = plot_data.time.min()
    end = plot_data.time.max()

    ds = hv.Dataset(plot_data, ['time', 'variable'], ['value'])
    grouped = ds.select(variable=[..., ..., ...], time=(start, end)).to(hv.Curve, 'time', 'value')
    datashade(grouped)

tmikolajczyk • May 04 '20 14:05

When that comment was written in 2017, the only way to have separate lines was to put NaNs between them in one continuous list. Datashader now supports many different ways to provide multiple lines, so you can usually choose one that is efficient both for rendering and for whatever else you want to do with the lines, such as having one column per line. See https://datashader.org/user_guide/Timeseries.html#Plotting-large-numbers-of-time-series-together for an example, and look at the docstring for datashader.Canvas.line for the details.
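For the parallel-coordinates case specifically, one of those layouts looks like a natural fit: with `axis=1`, `Canvas.line` treats each DataFrame row as its own line, `y` lists the columns that supply the vertices, and `x` can be a literal array of positions shared by every row. The sketch below is untested against a 100-million-row dataset; `parallel_line_kwargs` and `render_parallel` are hypothetical helper names, and it assumes the columns have already been scaled to comparable ranges.

```python
import numpy as np

def parallel_line_kwargs(order):
    """Build Canvas.line keyword arguments for one polyline per DataFrame row.

    order: column labels in the left-to-right order the axes should appear.
    """
    order = list(order)
    return {
        "x": np.arange(len(order)),  # shared axis positions 0..N-1
        "y": order,                  # one vertex per column, in this order
        "axis": 1,                   # each row is a separate line
    }

def render_parallel(df, order, width=800, height=400):
    """Aggregate one line per row of df (assumes datashader is installed)."""
    import datashader as ds
    canvas = ds.Canvas(plot_width=width, plot_height=height)
    return canvas.line(df, agg=ds.count(), **parallel_line_kwargs(order))
```

Reordering the axes then only means passing a different `order` list, for example `render_parallel(df, ["c", "a", "b"])`, and the DataFrame itself is left unchanged.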

HoloViews will generally choose an appropriate representation already, and if what you are asking here is why you get an error in a specific case, please file that as an issue on HoloViews with the error message or incorrect output so that we can see what needs to be done.

Meanwhile, this issue is about good support for large parallel coordinate plots, which if nothing else needs a good example somewhere, so I'll leave it open for that purpose.

jbednar • May 04 '20 16:05