Would like some advice on a new project: Matplotgl
Dear ipympl community,
I don't know if this is the right place to post, but I couldn't find a discussions tab on this repo.
I have been using Matplotlib and ipympl for many years now.
I recently started a new project which aims to address some of the issues of ipympl, mainly to do with performance of user interactivity.
The project is called Matplotgl (repo: https://github.com/scipp/matplotgl/, docs: https://scipp.github.io/matplotgl/). It tries to be an (almost) drop-in replacement for Matplotlib, but uses Pythreejs for rendering the graphics (note that it is currently only available via pip).
Thanks to the WebGL rendering provided by Pythreejs, it easily handles 500,000 scatter points, or large (4096x4096) images, with smooth panning and zooming.
I have to underline that at this point, it is still extremely early days, and not much of the Matplotlib API works. But we do support some basic examples, such as:
import matplotgl.pyplot as plt # Note 'matplotgl' instead of 'matplotlib'
import numpy as np
fig, ax = plt.subplots()
xx, yy = np.random.normal(size=(2, 300_000))
scat = ax.scatter(xx, yy)
ax.set_xlabel('X position [m]')
ax.set_ylabel('Y position [m]')
ax.set_title("A large scatter plot")
fig
A figure in matplotgl is constructed using the following elements:
I would like to know if there is any interest in the community for such a project, if you think the approach is viable, and if there are any good ideas as to how we could support a large part of the matplotlib API without having to duplicate it all.
If you are interested, please play with it, report bugs, and you may even want to contribute? There are many things that still need to be implemented ;-) Many thanks!
(and apologies again for spamming the issue list, I wasn't sure how else to reach out; feel free to close the issue if you think this should be posted somewhere else)
Why not. I have also been using Matplotlib/ipympl for a long time. I love it, but the main problem for me is how inefficient Matplotlib is at interacting with very large objects.
One silly question (I don't know much about three.js or OpenGL): would it really be faster, and allow smooth handling of, say, selecting one of your 300k scatter points? Or zooming in a graph with 4M vectors or even 10M vectors (more typical for me)? I'm willing to give it a try if I find some time (but probably not before Christmas, sorry).
Nice contribution! Thanks for sharing this cool project, I'm a ThreeJS lover 😄
Small note on performance, there may be some tradeoffs to consider.
For huge datasets, it may actually be slower to send the data buffers through the internet connection for the client to render the data using ThreeJS, e.g. when using Jupyter through a deployed JupyterHub. You pay the cost of sending the data. Even locally, that cost may not be negligible (you really need to make sure to use ipywidgets's binary buffer feature).
In ipympl the data transfer cost is the same whatever the size of the data (you always send a compressed image diff over the wire).
So the speed comparison would be:
- rendering: OpenGL server side is more performant than client-side ThreeJS (WebGL) -> using something like https://github.com/karlwessel/mplopengl for the rendering and streaming the data using ipympl could be cool (actually not possible as-is but that could be an idea to pursue).
- data transfer: if the size of the image that ipympl streams is bigger than the dataset to render, it may be faster to send the dataset over the wire and let ThreeJS handle the rendering. Otherwise the threejs solution may actually be slower.
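A rough way to reason about the data-transfer side of this trade-off is sketched below. It only counts bytes; latency, compression of the dataset, and rendering time are ignored. The function names, the 4-bytes-per-value (float32) encoding, the 100 kB frame size, and the idea of multiplying by the number of streamed frames are all assumptions made for illustration, not measurements of ipympl or Matplotgl.

```python
def dataset_bytes(n_points, coords_per_point=2, bytes_per_value=4):
    """Raw size of a scatter dataset sent once to the client.

    Assumes float32 coordinates (4 bytes each) and no compression.
    """
    return n_points * coords_per_point * bytes_per_value


def cheaper_to_send_dataset(n_points, frame_bytes, n_frames=1):
    """Compare a one-off dataset upload with streaming rendered frames.

    frame_bytes is a hypothetical size of one compressed image diff;
    n_frames is how many frames an interaction (pan/zoom) would stream.
    """
    return dataset_bytes(n_points) < frame_bytes * n_frames


# 300,000 points with x and y as float32: 2_400_000 bytes
size = dataset_bytes(300_000)

# With an assumed 100 kB frame, a single redraw is cheaper as an image...
single_redraw = cheaper_to_send_dataset(300_000, frame_bytes=100_000, n_frames=1)

# ...but 30 frames of panning (3 MB) cost more than sending the data once.
long_pan = cheaper_to_send_dataset(300_000, frame_bytes=100_000, n_frames=30)
```

Plugging in sizes measured on an actual deployment would show which side of the break-even point a given dataset falls on.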
Thanks for the feedback. Interesting point about JupyterHub... I guess there isn't a single silver bullet that works everywhere.
you really need to make sure to use ipywidgets's binary buffer feature
I don't know what that is or how to use it. Can you expand a little? 🙏
ipywidgets's binary buffer feature
Sorry I haven't been really specific, it's not an ipywidgets feature but a Jupyter communication feature.
Basically all Jupyter messages (sent from the Python kernel to the JavaScript client and the other way around) are JSON-based. JSON is great for its clarity but performance isn't there. In particular, sending huge datasets through JSON is far from efficient. Even worse, jupyter-server, which should be just a pass-through of messages between kernel<->client, is not a pass-through: it reformats JSON messages and makes copies.
So to work around these issues, they introduced the ability to send raw binary buffers alongside the JSON messages. This makes sending the dataset from the kernel to the client more performant.
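To give a feel for the difference, here is a small sketch that compares the size of the same values encoded as JSON text and as a raw binary buffer. It uses only the standard library and does not touch Jupyter at all; the function name `payload_sizes` and the choice of 32-bit floats are assumptions for illustration.

```python
import json
import random
from array import array


def payload_sizes(n_points, seed=0):
    """Return (json_bytes, binary_bytes) for n_points random values."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n_points)]

    # JSON route: every float becomes a decimal string plus separators.
    json_bytes = len(json.dumps(values).encode("utf-8"))

    # Binary route: a packed buffer of C floats, no text encoding.
    binary_bytes = len(array("f", values).tobytes())
    return json_bytes, binary_bytes


json_size, binary_size = payload_sizes(100_000)
print(f"JSON: {json_size} bytes, binary: {binary_size} bytes")
```

In ipywidgets, custom messages can carry such buffers through the `buffers` argument of `Widget.send`, so the JSON part of the message only needs to hold small metadata such as dtype and shape.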
I looked briefly and I see that pythreejs depends on ipydatawidgets which is a place that implemented support for using binary buffers. So you should be good hopefully!
@martinRenou do you happen to know what the plans are for developing/maintaining Pythreejs? The latest release was in 2022, and I know that the main developer (Vidartf) does not really have the time to work on it anymore... Thanks
I have the same info that you have. I haven't interacted with Vidar in some time.