cocalc
jupyter: more efficiently support bokeh data messages (was: increase threshold of "Fetch additional output")
I'm not sure about the exact logic and overall drawbacks, but I've just had a rather awkward experience while testing plotting with bokeh in cocalc's Jupyter notebooks. I had to press the "Fetch additional output" button whenever more than a few dots were displayed. That happened for the last plot in the example; the other ones showed up immediately. Whatever that threshold value is, maybe we should 10x it?
Oh, another thought: since this could be problematic regarding syncing and sending too much data around to all clients, maybe that threshold should be different depending on whether this is my active cell or a cell that is not under my immediate attention?
The correct solution is to understand how large output from bokeh is encoded, then special-case how it is served, similar to how images in outputs are served via http.
What is your test example to reproduce the problem?
The notebook I linked to, those examples are straight from the documentation. The last cell is the one with more output, i.e. above that threshold.
Bokeh embeds the entire description of the plot as data messages in the output. All of this is therefore recorded as part of our realtime sync, diffed, stored in the database, etc., and is generally very heavy. This is not a problem at all with output that is rendered as PNGs or SVGs, because the underlying data is stored in a sqlite database in the project. The mechanism for doing this is pretty generic and could, I think, just as easily handle the application/vnd.bokehjs_exec.v0+json and application/javascript messages that bokeh produces.
This would keep those out of our realtime collab history and database, keeping it lightweight. They automatically get put back in the ipynb file on save, of course, and stripped from it on load, just like with images.
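To make the idea concrete, here is a minimal sketch of the strip-on-load / restore-on-save round trip described above. It is not CoCalc's actual implementation: the table layout, the `blob_sha1` placeholder key, and the function names `open_store`, `strip_outputs`, and `restore_outputs` are all hypothetical, and the notebook is assumed to be a plain dict shaped like a parsed ipynb file.

```python
import copy
import hashlib
import json
import sqlite3

# Assumption: which MIME types get offloaded. The two Bokeh types are the
# ones named in this thread; image/png stands in for the existing image case.
HEAVY_MIME_TYPES = {
    "image/png",
    "application/vnd.bokehjs_exec.v0+json",
    "application/javascript",
}


def open_store(path=":memory:"):
    """Open a (hypothetical) content-addressed blob store backed by sqlite."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs (sha1 TEXT PRIMARY KEY, payload TEXT)"
    )
    return db


def strip_outputs(notebook, db):
    """Return a copy of the notebook with heavy payloads replaced by hashes."""
    light = copy.deepcopy(notebook)
    for cell in light.get("cells", []):
        for output in cell.get("outputs", []):
            data = output.get("data", {})
            for mime in list(data):
                if mime not in HEAVY_MIME_TYPES:
                    continue
                payload = json.dumps(data[mime])
                sha1 = hashlib.sha1(payload.encode("utf-8")).hexdigest()
                db.execute(
                    "INSERT OR IGNORE INTO blobs VALUES (?, ?)", (sha1, payload)
                )
                # Only this small reference stays in the synced document.
                data[mime] = {"blob_sha1": sha1}
    db.commit()
    return light


def restore_outputs(light, db):
    """Return a copy with hash references expanded back to full payloads."""
    full = copy.deepcopy(light)
    for cell in full.get("cells", []):
        for output in cell.get("outputs", []):
            data = output.get("data", {})
            for mime, value in data.items():
                is_ref = isinstance(value, dict) and set(value) == {"blob_sha1"}
                if mime in HEAVY_MIME_TYPES and is_ref:
                    row = db.execute(
                        "SELECT payload FROM blobs WHERE sha1 = ?",
                        (value["blob_sha1"],),
                    ).fetchone()
                    if row is not None:
                        data[mime] = json.loads(row[0])
    return full
```

In this sketch, `strip_outputs` would run when a notebook is loaded and `restore_outputs` when it is saved, so the synced document only ever carries short hashes while the ipynb file on disk stays complete.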
I'm just describing this here for posterity. I'll likely be the one to implement this, hence assigning this ticket to me. Good Bokeh support of course fits with our dev roadmap!
I just realized that a far better approach would be to support https://github.com/bokeh/jupyter_bokeh
This uses ipywidgets to display bokeh plots instead of big output messages full of data and custom javascript that gets evaled. It should be vastly superior because:
- it won't lose state when you scroll around (currently bokeh does, due to virtualization/windowing of jupyter, e.g., try zooming in, scrolling the plot out of view, then scrolling back). With widgets, the state can be synced to the backend and also shared between all viewers, so it isn't lost. Of course, we could also fix this by somehow doing all the bokeh stuff in an iframe, since we save those when scrolling.
- From the examples, it looks like using jupyter_bokeh widgets means you can combine bokeh with ipywidgets nicely, which is of course a big advantage. I haven't tested this.
- Obvious: the way widgets work is that the data about what is being plotted is not part of the ipynb document or output messages at all; it travels over a separate "comm channel". That completely solves the "Show more output" problem that motivated this issue, in a very elegant way.
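For reference, this is roughly what the widget-based approach looks like from the user's side. It is a sketch that assumes the `bokeh` and `jupyter_bokeh` packages are installed in the kernel; `make_bokeh_widget` is a hypothetical helper name used only for illustration, and, as noted above, the combination has not been tested in cocalc.

```python
def make_bokeh_widget():
    """Wrap a small Bokeh scatter plot in an ipywidget.

    The imports live inside the function so that merely defining this
    helper does not require bokeh or jupyter_bokeh to be installed.
    """
    from bokeh.plotting import figure
    from jupyter_bokeh.widgets import BokehModel

    plot = figure(title="scatter via jupyter_bokeh")
    plot.scatter([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=10)

    # BokehModel is an ipywidget, so the plot description is sent over the
    # widget comm channel rather than stored as a large output message.
    return BokehModel(plot)
```

In a notebook cell, evaluating `make_bokeh_widget()` as the last expression should display the plot as a widget, which can then be placed in ipywidgets containers alongside other widgets.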