jupyter_server
Support GraphQL
See https://github.com/jupyterlab/jupyterlab/issues/11789
Problem
JupyterLab constantly polls the server to retrieve information about:
- files and directories (contents API)
- running terminals (terminals API)
- running notebooks (sessions API)
- running kernels (kernels API)
It's not optimal because it might:
- make useless requests (when there is no update)
- make requests long after the update happened
- get back more information than needed
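To make the first two costs concrete, here is a toy simulation, not JupyterLab's actual polling code. The function name, the tick-based clock and the sample numbers are all made up for illustration. It counts polls that found nothing new and records how long each update waited before a poll picked it up.

```python
def simulate_polling(update_ticks, poll_interval, total_ticks):
    """Toy model of fixed-interval polling (all names here are illustrative).

    Returns (wasted_polls, delays), where `delays` lists how many ticks each
    update waited before a poll noticed it.
    """
    updates = sorted(update_ticks)
    wasted = 0
    delays = []
    last_poll = 0
    for now in range(poll_interval, total_ticks + 1, poll_interval):
        # Updates that happened since the previous poll.
        seen = [t for t in updates if last_poll < t <= now]
        if seen:
            delays.extend(now - t for t in seen)
        else:
            wasted += 1  # a request that returned nothing new
        last_poll = now
    return wasted, delays


# Two updates (at ticks 3 and 14), polled every 5 ticks for 20 ticks.
wasted, delays = simulate_polling([3, 14], poll_interval=5, total_ticks=20)
print(wasted, delays)
```

Shortening the interval reduces the delays but increases the number of wasted requests, which is the trade-off a server-pushed notification would avoid.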
Proposed Solution
Would it make sense to support a GraphQL API? I remember there has been some work on this already, but I can't find it.
AFAIK @saulshanabrook did all of that work on his rtc branch
@davidbrochart The demo code I was working on is in https://github.com/saulshanabrook/rtc/tree/graphql/packages/jupyter-graphql
It worked for a subset of the Jupyter server and used subscriptions to push the analysis of kernel messages to the server, keeping the state there.
I stopped working on it due to pressure to prioritize a working RTC implementation.
Having the client pull that information is really something we should move away from, in favor of something more event-based and pushed from the server.
If GraphQL can bring that, it would be wonderful. This should be an addition to all the existing APIs, not a replacement, to ensure backwards compatibility.
> I stopped working on it due to pressure to prioritize a working RTC implementation.
@saulshanabrook Yeah, we had discussed that. My understanding is that GraphQL still makes sense even without the RTC aspects, which are now implemented via CRDT. But from what I see, not all RTC needs should/could be covered by CRDT, which is focused on pure documents. E.g. the RTC event "Open a notebook" could be fulfilled by GraphQL...? (just thinking out loud)
Thanks @blink1073, @saulshanabrook and @echarles for the feedback. I think GraphQL is not only helpful for handling notifications pushed from the server (and removing the polling from the client), but also to give clients more flexibility as to which information they request. If it can be useful for RTC, that would be another reason to support it, but I'm not sure how. I will look at Saul's work to try and have a better understanding. Maybe Jupyverse would be a good place to start experimenting with a GraphQL API, because FastAPI makes it easy to use any ASGI-compatible GraphQL library. I don't know if it's as easy with Tornado.
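As a rough illustration of the "only the information they request" point, here is a plain-Python sketch of field selection. It does not use a GraphQL library. The `select` helper, the sample session record and the query shown in the comment are assumptions made up for this example, not an existing Jupyter schema.

```python
def select(data, selection):
    """Return only the fields of `data` named in `selection`.

    `selection` maps a field name to True (leaf) or to a nested selection.
    This mimics what a GraphQL selection set does; it is not a GraphQL engine.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result


# Hypothetical session record, loosely modelled on a sessions listing.
sessions = [
    {
        "id": "s1",
        "path": "a.ipynb",
        "type": "notebook",
        "kernel": {"id": "k1", "name": "python3", "execution_state": "idle"},
    }
]

# Roughly what a query like
#   { sessions { path kernel { execution_state } } }
# would ask for, under a hypothetical schema.
wanted = {"path": True, "kernel": {"execution_state": True}}
trimmed = select(sessions, wanted)
print(trimmed)
```

With a REST endpoint the client receives the whole record and discards most of it. With a selection like this the trimming happens on the server.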
> Maybe Jupyverse would be a good place to start experimenting with a GraphQL API, because FastAPI makes it easy to use any ASGI-compatible GraphQL library. I don't know if it's as easy with Tornado.
As @saulshanabrook pointed out, GraphQL on Tornado is already implemented in https://github.com/saulshanabrook/rtc/tree/graphql/packages/jupyter-graphql
I would favor experimenting on top of the existing Jupyter Server instead of Jupyverse to deliver value as soon as possible to existing Jupyter Server frontends.
Yeah the implementation I was working on works as an extension on top of Jupyter Server, which allows clients to connect either with the existing endpoints or by using GraphQL. It uses the same in memory data structures as the server, to allow both simultaneously.
See for example https://github.com/saulshanabrook/rtc/blob/graphql/packages/jupyter-graphql/jupyter_graphql/jupyter_server_extension.py which adds a Jupyter Server extension for GraphQL as well as the GraphQL playground.
The Services class is what takes the Jupyter Server services and adds listeners to maintain its own data structures.
I gave a demo of the working code in an RTC meeting a while ago: https://youtu.be/fRlVawMDVMk?t=608
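For readers who want the general shape of the "listeners feeding subscriptions" idea without opening the repository, here is a minimal asyncio sketch. It is not the code from the linked extension. `KernelRegistry`, `subscribe` and the event format are hypothetical names chosen for this example.

```python
import asyncio


class KernelRegistry:
    """Hypothetical in-memory store that REST handlers and subscriptions share."""

    def __init__(self):
        self.kernels = {}
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def set_state(self, kernel_id, state):
        # Update the shared structure, then notify every listener.
        self.kernels[kernel_id] = state
        for callback in self._listeners:
            callback(kernel_id, state)


async def subscribe(registry, n_events):
    """Yield (kernel_id, state) pairs as they happen, instead of polling."""
    queue = asyncio.Queue()
    registry.add_listener(lambda kid, state: queue.put_nowait((kid, state)))
    for _ in range(n_events):
        yield await queue.get()


async def main():
    registry = KernelRegistry()
    received = []

    async def consumer():
        async for event in subscribe(registry, 2):
            received.append(event)

    task = asyncio.create_task(consumer())
    await asyncio.sleep(0)  # let the consumer register its listener
    registry.set_state("k1", "busy")
    registry.set_state("k1", "idle")
    await task
    return received, registry.kernels


received, kernels = asyncio.run(main())
print(received, kernels)
```

The same `registry.kernels` dict could back an existing REST handler, so both access styles read one source of truth while subscribers get pushed each change.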
Hey, y'all! Hooray GraphQL!
With another jupyter-graphql, we got up to some fairly interesting demos. I particularly liked:
- integration with graphql-voyager: having an accurate, well-typed schema that happens to generate interactive documentation is :heart_on_fire:.
- wrap nbconvert in a subscription so you could emit a live-updating view of a rendered notebook.
If I were doing it again, I would not use the graphene ORM magic, but instead ariadne, as @saulshanabrook did, or tartiflette... whichever seemed more robust/maintained/extensible. As they are both schema-driven, it would be relatively straightforward to do a bakeoff. And the schema part is the big win, as it mostly avoids things like #518. Indeed, the types that come out of GraphQL are about as expressive as TypeScript, and beyond JSON schema... certainly robust enough to generate either... or a bunch of other things.
At the time, extensible GraphQL schema wasn't really A Thing, but now that schema federation is better defined, I'd probably lean towards that. The magic here would be the ability to reuse core Jupyter types on top of other GraphQL-enabled apps such as GitLab or Dagster.
In addition, there's also some of @rgbkrk's work on some node-based stuff.
> ASGI-compatible

It's great for Python to define a semi-formalized thing, and indeed, I feel like adopting the ASGI model would be a step forward rather than requiring Tornado or FastAPI... but the long con of Jupyter infrastructure can't be Python-only. Getting things like #518 under control so folk could really explore alternate high-performance (or lower-resource) implementations would pay off handsomely.
> easy with Tornado.
It's entirely possible to shoehorn an ASGI app in-loop with tornado. I think this is critical for an extensible system that can also take advantage of all of the existing (and future) services a jupyter server + extensions might provide.
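A sketch of why this is feasible: an ASGI app is just an async callable taking `scope`, `receive` and `send`, and Tornado (since 5.0) runs on the asyncio event loop, so such a callable can be awaited from a coroutine on that same loop. The example below deliberately does not import Tornado. `call_asgi` is a hypothetical stand-in for the bridge a handler would need, and a real bridge would also have to map request headers and bodies and stream the response.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI HTTP app: replies with a fixed plain-text body."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


async def call_asgi(asgi_app, path):
    """Hypothetical bridge: drive one GET request through an ASGI app
    on whatever event loop is currently running."""
    sent = []

    async def receive():
        # An empty request body, delivered in a single message.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }
    await asgi_app(scope, receive, send)
    status = sent[0]["status"]
    body = b"".join(
        m.get("body", b"") for m in sent if m["type"] == "http.response.body"
    )
    return status, body


status, body = asyncio.run(call_asgi(app, "/graphql"))
print(status, body)
```

In an in-loop setup, the `await asgi_app(...)` call would sit inside a request handler's coroutine, with `send` writing to the handler's response instead of collecting messages in a list.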