[FEATURE] MultiScatter

Open gmerritt123 opened this issue 3 years ago • 16 comments

Problem description

For me, this applies primarily to linking spatial and transient data. At each location in plan view, many datapoints exist over time. Sometimes I want to plot these transient datapoints as lines, and MultiLine is beautifully equipped to do this:

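from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# one record per location; Time and Value hold one sub-list per location (equal lengths within a record)
data = {'Location': ['loc1','loc2'],'Northing':[0,4],'Easting':[3,4],'Time':[[0,1,2],[2,3,4]],'Value':[[4,3,6],[1,3,2]]}
src = ColumnDataSource(data)
plan_view = figure()
plan_view.scatter(x='Easting',y='Northing',source=src)
time_plot = figure()
time_plot.multi_line(xs='Time',ys='Value',source=src)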

... And so on. CustomJS then acts on user-defined selections, and HoverTool lets the user link the space and time components visually. It's awesome.

But sometimes the transient data is discontinuous enough to warrant plotting as points, not connected lines. For these I have an entirely different setup using Scatter and a lot of annoying, suboptimal CustomJS that basically performs an inner join (through alasql) to link two separate CDSs (one for the spatial data, one for the transient data). It works, but it complicates setup and is noticeably laggier for bigger datasets.
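One way to avoid a join at interaction time is to precompute the link between the two sources. The plain-Python sketch below (helper names `flatten_nested` and `child_rows_for` are hypothetical, and this is not the alasql approach or any Bokeh API) flattens MultiLine-style nested columns into one row per datapoint plus a `parent` column, so the rows belonging to selected locations can be found by a simple lookup. The resulting dict could be passed to a flat `ColumnDataSource` for Scatter.

```python
def flatten_nested(data, nested_keys, id_key):
    """Flatten nested (ragged) columns into flat columns, one row per datapoint.

    Adds a 'parent' column holding the index of the record each row came from,
    and repeats the record's ID so tooltips can show it without a lookup.
    """
    flat = {key: [] for key in nested_keys}
    flat["parent"] = []
    flat[id_key] = []
    for parent, ident in enumerate(data[id_key]):
        lengths = {len(data[key][parent]) for key in nested_keys}
        if len(lengths) != 1:
            raise ValueError(f"record {parent}: nested columns differ in length")
        n = lengths.pop()
        for key in nested_keys:
            flat[key].extend(data[key][parent])
        flat["parent"].extend([parent] * n)
        flat[id_key].extend([ident] * n)
    return flat


def child_rows_for(flat, selected_parents):
    """Return the flat row indices whose parent record is selected."""
    wanted = set(selected_parents)
    return [row for row, parent in enumerate(flat["parent"]) if parent in wanted]


# Sample data shaped like the MultiLine example (equal-length sub-lists per record)
data = {
    "Location": ["loc1", "loc2"],
    "Time": [[0, 1, 2], [2, 3, 4]],
    "Value": [[4, 3, 6], [1, 3, 2]],
}
flat = flatten_nested(data, ["Time", "Value"], "Location")
print(flat["Location"])           # ['loc1', 'loc1', 'loc1', 'loc2', 'loc2', 'loc2']
print(child_rows_for(flat, [1]))  # [3, 4, 5]
```

The lookup is a linear scan over all rows; storing per-parent offsets instead would make it proportional to the size of the selection.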

Feature description

What I dream of is a dead-on clone of MultiLine, but instead of rendering lines, it just renders markers. That's basically it. I browsed the source code etc. to see if I could do this myself, and it seems WAY out of my league. I would be extremely, extremely happy to be guided on this, and of course if someone is willing to simply do it for me... well, I'd be extremely happy about that too.

Potential alternatives

As mentioned in problem description, I have my own workaround but it's very suboptimal and over-complicated.

Additional information

No response

gmerritt123 • Sep 10 '22 18:09

I'm a bit skeptical about this off the bat, since multi_line is an oddball among the collection of glyphs, and I am not sure I want to have more oddball glyphs to have to deal with. I also don't understand from the description why you wouldn't just use multiple scatter calls for the different subsets, if the subsets are known up front. What would really help here is a complete minimal example that shows how things are done currently, along with a proposed new version to compare that with.

bryevdv • Sep 11 '22 03:09

I suppose another consideration is that multi-line lacks a way to have "line plus marker", and this would afford that. cc @bokeh/dev for thoughts.

bryevdv • Sep 11 '22 03:09

"line plus marker"

There is a mechanism for this called glyph decorations, but its scope is currently limited to a few glyphs and doesn't include MultiLine.

mattpap • Sep 11 '22 08:09

Thanks for getting back to me :-D Here is an MRE that illustrates the super simple MultiLine setup, followed by @bryevdv's suggestion of making multiple scatter calls/renderers to accomplish the same thing with markers. There is significant added complexity and almost a doubling (!) of file size between the two.

import numpy as np
from bokeh.plotting import figure, show, save
from bokeh.models import Scatter, CustomJS, MultiLine, ColumnDataSource
from bokeh.layouts import row

data = {'ID':list(range(100))
        ,'Northing':np.random.random(100)*1000
        ,'Easting':np.random.random(100)*1000
        ,'Time':[list(range(1000)) for i in range(100)]
        ,'Value':[np.random.random(1000)+i for i in range(100)]}

src = ColumnDataSource(data)

pv = figure(tools='lasso_select',output_backend='webgl')
tf = figure(tools=[],output_backend='webgl')
pvr = pv.scatter(x='Easting',y='Northing',source=src)

###MULTILINE SETUP
#multiline is awesome for linking one to many/many to one
# tfr = tf.multi_line(xs='Time',ys='Value',source=src)

# save(row([pv,tf]),'test.html') #filesize ~1.4 mb
###

###JURY-RIGGED MULTISCATTER SETUP
# but if I want to do the same thing with scatter...
# I need to set up all this extra stuff to make a CDS and renderer for each "set"
# Then I need to write my own CustomJS to set up the selection linkage
# As an additional hurdle, I need to override selection/nonselection glyph properties,
# because src.selected.indices = [] actually results in the selection glyph being used for all indices and I don't know how to override that

#Additionally (not doing this in this MRE but another point/added complexity)
#if there is any "meta" associated with each set, I'd be stuck duplicating records or writing more complex CustomJS
# for example say each location in src has a singular elevation associated with it, and on hover of the time plot, I want elevation to show up
# I'd need to either make 1000 entries of the same elevation for each "set" in the time plot
# or I'd need to write some CustomJS that would look up the elevation from the location source and put that value into the tooltip

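def buildScatterDict(src,id_field='ID',x_field='Time',y_field='Value'):
    # one CDS per record, keyed by the record ID as a string so that the keys
    # match the String(...) comparison done in the CustomJS callback
    scatterDict = {}
    for i in range(len(src.data[id_field])):
        scatterDict[str(src.data[id_field][i])] = ColumnDataSource({x_field:src.data[x_field][i]
                                              ,y_field:src.data[y_field][i]})
    return scatterDict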

sc_dict = buildScatterDict(src,'ID','Time','Value')

tfr_dict = {k:tf.scatter(x='Time',y='Value',source=sc_dict[k]) for k in sc_dict.keys()}

cb = CustomJS(args=dict(src=src,sc_dict=sc_dict,tfr_dict=tfr_dict)
              ,code='''
              //collect the selected IDs
              const sel_ids = src.selected.indices.map(i=>String(src.data['ID'][i]))
              //for those selected ids, select all indices for their sc_dict source
              for (const [k, v] of Object.entries(sc_dict)){
                      if (sel_ids.includes(k)){
                              v.selected.indices = [...Array(v.data['Time'].length).keys()]
                              tfr_dict[k].glyph.fill_alpha = {'value':1.0}
                              tfr_dict[k].glyph.line_alpha = {'value':1.0}
                              }
                      else{
                          v.selected.indices = []
                          tfr_dict[k].glyph.fill_alpha = {'value':0.1}
                          tfr_dict[k].glyph.line_alpha = {'value':0.1}
                          }
                      v.selected.change.emit()
                      console.log(k)
                      console.log(sc_dict[k].selected.indices)
                      }
              ''')
src.selected.js_on_change('indices',cb)
save(row([pv,tf]),'test.html') #filesize ~3 mb

I also noticed that without the WebGL backend the Scatter setup lags massively, while the MultiLine setup runs fast with either 'canvas' or 'webgl'. Finally, the MRE above only handles the selection linkage; I'd have to set up the same thing for hover as well...

To clarify/refine the request a bit: I would be totally happy with just getting the means to do "line plus marker" for multiline (is there an example somewhere for, say, a Line glyph?). It would accomplish my use case (and probably many others too), (probably?) minimize dev effort, and minimize the number of "oddball" glyphs 😂.

gmerritt123 • Sep 11 '22 13:09

There is a mechanism for this called glyph decorations, but its scope is currently limited to a few glyphs and doesn't include MultiLine.

Without knowing more about design and plans, it's hard to say whether it would apply. For this use case:

  • must be drivable from the same data source as the multi-line
  • must be able to have different marker visuals per sub-line of the multi-line

If decorations don't / won't afford those features, then I don't think they are relevant.

"line plus marker" for multiline

Currently it is a multi-line plus a separate marker for each sub-line. My comment is about there being value in having a parallel "multi-line + multi-marker, driven from the same CDS" to match "single-line + single-marker, driven from the same CDS" that currently is possible.

I would tentatively mark this as a feature, but what I would prefer up front is some plan to improve the story around the "ragged" arrays that multi-glyphs use in general. The handling and representation of those is still rather ad hoc.

bryevdv • Sep 11 '22 15:09

Currently it is a multi-line plus a separate marker for each sub-line. My comment is about there being value in having a parallel "multi-line + multi-marker, driven from the same CDS" to match "single-line + single-marker, driven from the same CDS" that currently is possible.

I think we're on the same page. There is significant value in it IMO. The one nuance I'm considering (and I may either be ahead of or behind you on this) is whether to allow formatting individual markers within each CDS record based on another nested list... I don't think it's necessary and would involve extending things beyond what MultiLine is currently capable of:

from bokeh.models import ColumnDataSource
src = ColumnDataSource(data={'xs':[[1,2,3],[1,2,3]],'ys':[[1,2,3],[1,2,3]],'fill_color':['blue','red']}) #YEA: one color per sub-list
src = ColumnDataSource(data={'xs':[[1,2,3],[1,2,3]],'ys':[[1,2,3],[1,2,3]],'fill_color':[['blue','blue','purple'],['red','red','purple']]}) #NAY: one color per marker
p.multi_scatter(xs='xs',ys='ys',fill_color='fill_color',source=src) # proposed API; multi_scatter does not exist in Bokeh

I also don't think glyph decorations would work for my use case (and many others), because I want the markers to participate in hit testing.

I would prefer up front is some plan to improve the story around the "ragged" arrays that multi-glyphs use, in general. The handling and representation of those is still rather ad-hoc.

As a (humbly) "advanced" user but non-dev, I can say that this has confused the hell out of me in several instances when writing CustomJS, ImageRGBA manipulation/slicing being the most vivid one. I have nothing to offer other than to say that a well-documented, standardized implementation wherever "raggedness" becomes necessary (i.e. the "multi-glyphs") would be awesome :-)

gmerritt123 • Sep 11 '22 17:09

I also noticed that without webgl backend the Scatter setup lags massively while the MultiLine runs fast with either 'canvas' or 'webgl'.

There isn't yet a WebGL implementation of MultiLine, so it drops back to using Canvas here.

I would tentatively mark this as a feature, but what I would prefer up front is some plan to improve the story around the "ragged" arrays that multi-glyphs use in general. The handling and representation of those is still rather ad hoc.

Awkward Array is very promising, and work is underway on AwkwardPandas (https://github.com/intake/awkward-pandas) so that you can have an Awkward Array as a pandas column. It is efficient, and I can see it becoming popular. I suspect we could serialize an Awkward Array through to our TypeScript RaggedArray really efficiently.
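Part of why such serialization can be cheap is the layout: Awkward Array's list layout stores a ragged array as one flat `content` buffer plus an `offsets` buffer, where sub-array i is `content[offsets[i]:offsets[i+1]]`. The stdlib sketch below (function names are hypothetical; this is neither Awkward's nor BokehJS's code) encodes and decodes that layout, showing that any number of sub-arrays becomes just two flat typed buffers.

```python
from array import array


def to_offsets(ragged):
    """Encode a list of sub-lists as (offsets, content) flat typed buffers."""
    offsets = array("q", [0])  # 64-bit integer offsets, always starting at 0
    content = array("d")       # all values concatenated as doubles
    for sub in ragged:
        content.extend(float(v) for v in sub)
        offsets.append(len(content))
    return offsets, content


def from_offsets(offsets, content):
    """Decode (offsets, content) back into a list of sub-lists."""
    return [list(content[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


offsets, content = to_offsets([[4, 3, 6], [], [1, 3]])
print(list(offsets))                   # [0, 3, 3, 5]
print(from_offsets(offsets, content))  # [[4.0, 3.0, 6.0], [], [1.0, 3.0]]
```

Empty sub-arrays cost nothing beyond a repeated offset, and slicing out one sub-array needs no scan of the others.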

The relationship between a hypothetical MultiScatter and Scatter is different to that between MultiLine and Line. The visual properties of Scatter are already vectorised, one per marker. The MultiScatter API would therefore be identical to Scatter's, except that x and y would be nested sequences rather than flat sequences. If we support ragged/awkward arrays here, we could probably allow Scatter to accept either sequences or nested sequences for the coordinates, with the vectorised visual properties applying to the top-level coordinate sequences. If that were considered a good thing and didn't introduce a whole load of problems that I haven't considered, then this could be generalised across many glyphs to provide Multi* functionality without a proliferation of very similar classes.
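To make "vectorised visual properties apply to the top-level sequences" concrete, here is a minimal plain-Python sketch (the helper name `broadcast_per_record` is hypothetical, not Bokeh API). It flattens nested coordinates and repeats each per-record value once per marker in that record, which is effectively what a renderer would have to do when drawing nested coordinates with top-level properties.

```python
def broadcast_per_record(nested_xs, nested_ys, per_record):
    """Flatten nested coordinates and repeat each per-record value per marker."""
    if not (len(nested_xs) == len(nested_ys) == len(per_record)):
        raise ValueError("top-level lengths must match")
    xs, ys, props = [], [], []
    for sub_x, sub_y, value in zip(nested_xs, nested_ys, per_record):
        if len(sub_x) != len(sub_y):
            raise ValueError("each x sub-list must match its y sub-list")
        xs.extend(sub_x)
        ys.extend(sub_y)
        props.extend([value] * len(sub_x))  # one top-level value -> every marker in the record
    return xs, ys, props


xs, ys, colors = broadcast_per_record([[1, 2, 3], [1, 2]], [[1, 2, 3], [4, 5]], ["blue", "red"])
print(colors)  # ['blue', 'blue', 'blue', 'red', 'red']
```

With flat sequences the same properties would simply map one value per marker, so the two cases differ only in whether this expansion step is needed.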

ianthomas23 • Sep 12 '22 10:09

@ianthomas23 All of that sounds right; the actual oddball after all this potential work would be Line, which is not vectorizable at all (because of its single connected nature). Otherwise, glyphs are mostly all top-level vectorizable.

I'm definitely interested in the UX aspect and any conveniences that we might afford to users, but I really had in mind making the serialization less hacky. E.g. one idea is that we could define a column to hold a ragged array type:

{
  __type__: "ragged-array"

  0: <list or typed array>
  1: <list or typed array>
  ...
  n: <list or typed array>
}
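Taking that off-the-cuff shape literally, a column could be encoded and decoded as in the sketch below. This only illustrates the musing above and is not a format BokehJS understands; the function names are hypothetical. Since JSON object keys must be strings, the indices become "0", "1", and so on.

```python
import json


def encode_ragged(column):
    """Encode a list of sub-lists into the hypothetical 'ragged-array' object."""
    encoded = {"__type__": "ragged-array"}
    for i, sub in enumerate(column):
        encoded[str(i)] = list(sub)
    return encoded


def decode_ragged(obj):
    """Decode the hypothetical object back into a list of sub-lists, in index order."""
    if obj.get("__type__") != "ragged-array":
        raise ValueError("not a ragged-array object")
    indices = sorted(int(k) for k in obj if k != "__type__")
    return [obj[str(i)] for i in indices]


payload = json.dumps(encode_ragged([[0, 1, 2], [2, 3]]))
print(payload)                              # {"__type__": "ragged-array", "0": [0, 1, 2], "1": [2, 3]}
print(decode_ragged(json.loads(payload)))  # [[0, 1, 2], [2, 3]]
```

Decoding sorts the numeric keys explicitly rather than relying on object key order.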

This is just an off-the-cuff musing; I am sure @mattpap will have lots of input here.

bryevdv • Sep 12 '22 15:09

Noting that a tangential use-case came up on the Discourse:

https://discourse.bokeh.org/t/customjs-for-selected-indicies-after-region-selection-in-multi-line-model/9895/6

They are looking for box-select on multi-line but I think they would be content with a box-select on the vertices of a multi-line. We could consider adding a "mode" to a box-select on multi-line for "just vertices" (vs the more complicated segment checking) but alternatively just adding a multi-scatter with box select would also afford that.
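A "just vertices" mode reduces to a point-in-rectangle test over each sub-line's vertices, with no segment intersection checks. A minimal plain-Python sketch (the function `box_select_vertices` is hypothetical) returning a mapping from sub-line index to selected vertex indices, which resembles the mapping Bokeh stores in `Selection.multiline_indices`:

```python
def box_select_vertices(xs, ys, x0, x1, y0, y1):
    """Return {sub-line index: [vertex indices inside the box]} for ragged xs/ys.

    Only vertices are tested; a segment crossing the box without a vertex
    inside it is not selected.
    """
    left, right = sorted((x0, x1))
    bottom, top = sorted((y0, y1))
    selected = {}
    for line, (sub_x, sub_y) in enumerate(zip(xs, ys)):
        hits = [i for i, (x, y) in enumerate(zip(sub_x, sub_y))
                if left <= x <= right and bottom <= y <= top]
        if hits:
            selected[line] = hits
    return selected


xs = [[0, 1, 2, 3], [2, 3, 4]]
ys = [[4, 3, 6, 5], [1, 3, 2]]
print(box_select_vertices(xs, ys, 1, 3, 2, 4))  # {0: [1], 1: [1]}
```

The top-level selected indices would then simply be the keys of the returned dict.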

bryevdv • Jan 03 '23 19:01

I do think one corner we painted ourselves into is having, very specifically, multiline_indices. It would have been better (more future-proof) if that had been called multi_indices instead.

bryevdv • Jan 03 '23 19:01

Well, as mentioned earlier, tentatively marking this as a feature for an unspecified 3.x release, mostly to elevate visibility and discussion at some point in the next few releases.

bryevdv • Jan 03 '23 20:01

They are looking for box-select on multi-line but I think they would be content with a box-select on the vertices of a multi-line.

Yes, this is correct. This is what we need.

ngetter • Jan 03 '23 20:01

@ianthomas23 All of that sounds right; the actual oddball after all this potential work would be Line, which is not vectorizable at all (because of its single connected nature). Otherwise, glyphs are mostly all top-level vectorizable.

I think that people do want a vectorizable single fully connected line, with a different color for each line segment (e.g. a trajectory with velocity for that portion of the journey). So if I'm understanding this discussion correctly, I wouldn't think Line would need to be an oddball here, just that for Line the vectorization goes by line segment while for MultiLine it goes by connected segment.

jbednar • Jan 04 '23 21:01

@jbednar I think that is a separate issue, and that there is in fact an old GH issue for it somewhere (possibly closed). People do want "colormap along a line" but that's not really related to this directly.

I do agree with @ianthomas23 that Line is the oddball, specifically because of its connected topology. And I would not choose to consider it "vectorized by line segment", because, e.g., there are not as many segments as there are points in the data source. That's also the reason why "color for every segment" does not fit in very well. Expanding the notion of "vectorized" to include this would really, really muddy the waters IMO.
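The count mismatch is easy to see concretely: n points define only n - 1 segments, so a per-point column cannot map one-to-one onto segments. A common workaround, sketched below in plain Python (the helper `line_to_segments` is hypothetical), converts a line into explicit segment endpoints, e.g. for Bokeh's `segment` glyph with its `x0`, `y0`, `x1`, `y1` coordinates. It assumes one value per segment taken from each segment's start point, which means the last point's value is dropped.

```python
def line_to_segments(x, y, values):
    """Split a polyline into n - 1 segments, giving each its start point's value."""
    if not (len(x) == len(y) == len(values)):
        raise ValueError("x, y and values must have equal length")
    return {
        "x0": x[:-1],
        "y0": y[:-1],
        "x1": x[1:],
        "y1": y[1:],
        "value": values[:-1],  # one value per segment; the last point's value is unused
    }


segs = line_to_segments([0, 1, 2, 3], [0, 1, 0, 1], ["red", "green", "blue", "black"])
print(len(segs["x0"]))  # 3
```

Any choice here (start point, end point, average) is a convention imposed on the data, which is part of why a per-segment color does not fit the one-value-per-row model.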

bryevdv • Jan 04 '23 21:01

There is a mechanism for this called glyph decorations, but its scope is currently limited to a few glyphs and doesn't include MultiLine.

@mattpap This has come up again in the forums recently; can you give an update on the current status? I don't think any of the more recent decoration work changed this state of affairs, but I wanted to confirm. If so, can you give some idea of what a future API might look like for decorating a multi-line with a scatter?

bryevdv • Feb 29 '24 16:02

Copied from https://discourse.bokeh.org/t/line-plot-using-view/11290/12

For example, multi_scatter would be useful for box plots: the boxes are created with patches, the whiskers with multi_line, and the outliers with multi_scatter. All glyphs use the same source ("list of lists") for xs and ys.

The x-axis is dodged (I mean https://docs.bokeh.org/en/latest/docs/user_guide/basic/bars.html#visual-offset).
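To make the box plot case concrete, the sketch below builds one record per group with ragged columns for the box outline, a whisker line, and the outliers, so that patches, multi_line and a hypothetical multi_scatter could all share one source. The helper name and the 1.5 × IQR whisker rule are assumptions for illustration; quartiles come from `statistics.quantiles` with its default method.

```python
import statistics


def boxplot_record(center, values, width=0.6):
    """Return one source record: box outline, whisker line and outliers for a group."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    outliers = [v for v in values if v < lo_fence or v > hi_fence]
    half = width / 2
    return {
        # box as a patch outline from Q1 to Q3 (patches close the shape automatically)
        "box_xs": [center - half, center + half, center + half, center - half],
        "box_ys": [q1, q1, q3, q3],
        # one vertical whisker from the lowest to the highest non-outlier value;
        # drawn underneath the box, it reads as a lower and an upper whisker
        "whisker_xs": [center, center],
        "whisker_ys": [min(inside), max(inside)],
        "outlier_xs": [center] * len(outliers),
        "outlier_ys": outliers,
        "median": q2,
    }


groups = {"a": [1, 2, 3, 4, 5, 6, 7, 30], "b": [2, 3, 3, 4, 4, 5]}
records = [boxplot_record(i, vals) for i, vals in enumerate(groups.values())]
# Transpose records into columns: each *_xs / *_ys column is a list of lists
source_data = {key: [r[key] for r in records] for key in records[0]}
print(source_data["outlier_ys"])  # [[30], []]
```

Drawing the whisker renderer before the box renderer keeps the box on top, so a single whisker line per record is enough.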

dinya • Mar 05 '24 05:03