
An event system for Jupyter

Open afshin opened this issue 2 years ago • 16 comments

This is a draft document, please feel free to comment and help.

We have (at least) two concurrent efforts that overlap but are not full implementations of a generic event system for Jupyter in themselves: jupyter-telemetry and jupyterlab-notifications.

A synthesis of these extensions with generic endpoints (i.e., not specifically designed and named for telemetry or notifications) would yield a flexible general-purpose event bus for jupyter-server-based applications.

cc: @andrii-i @3coins

Architecture of events API

REST Endpoints

  • POST /api/events - create new events
  • GET /api/events/schemas - query/list registered schemas (maybe -- needs discussion)
  • POST /api/events/schemas - register schemas (maybe -- needs discussion)
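
As a rough illustration of the POST /api/events endpoint, a client could build a request like the one below. This is only a sketch: the body layout (schema_id, version, data), the example schema ID, and the helper name build_event_request are assumptions made for illustration, not a format settled in this proposal.

```python
import json
import urllib.request


def build_event_request(base_url, token, schema_id, version, data):
    """Build (but do not send) a POST request for /api/events."""
    # Assumed body layout; the proposal does not fix the payload format.
    body = json.dumps({"schema_id": schema_id, "version": version, "data": data})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/events",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
        method="POST",
    )


request = build_event_request(
    "http://localhost:8888",
    "<token>",
    "https://example.org/notification",  # hypothetical schema ID
    1,
    {"message": "Kernel restarted"},
)
# urllib.request.urlopen(request) would send it to a running server.
```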

WebSocket endpoints (WebsocketHandler)

  • /api/events/subscribe - fire hose of all events -- perhaps accept filters? (see open question below)
  • /api/events/subscribe/notification -- subscribe to events of type notification

Open Question: Should the WebSocket handler support making a request for multiple filters to be applied instead of just the one proposed in the URL scheme above?
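
One possible answer to the open question, sketched under assumptions: keep the path form from the URL scheme above and additionally accept a repeated query parameter. The parameter name "type" and both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

SUBSCRIBE_PREFIX = "/api/events/subscribe"


def requested_filters(url):
    """Collect the event types requested through the path and the query string."""
    parts = urlsplit(url)
    filters = set()
    # Path form from the proposal: /api/events/subscribe/<type>
    if parts.path.startswith(SUBSCRIBE_PREFIX):
        tail = parts.path[len(SUBSCRIBE_PREFIX):].strip("/")
        if tail:
            filters.add(tail)
    # Hypothetical extension: ?type=a&type=b
    filters.update(parse_qs(parts.query).get("type", []))
    return filters


def matches(event_type, filters):
    """An empty filter set means the fire hose: every event matches."""
    return not filters or event_type in filters
```

With this shape, /api/events/subscribe stays the fire hose, and a single connection can ask for several event types at once.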

Depends on jupyter_events package

  • exports EventLogger object (formerly EventLog in jupyter_telemetry)

Case Study: JupyterLab Notifications

Server-side functionality

  • Subscribes to all notification events that pass through the event bus
  • Adds each notification as a row in a SQLite database on the server with a key for the recipient identity as well as an ID
    • notification events with multiple recipients can be de-normalized here and written as multiple rows
  • REST API

    • GET /api/notifications - retrieve a list of all notifications that authenticated user can see
    • GET /api/notifications/{ID} - retrieve a specific notification
    • DELETE /api/notifications/{ID} - delete a specific notification
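
The server-side storage described above could look roughly like this. It is a minimal sketch using an in-memory SQLite database; the table layout, column names, and function names are assumptions, not the schema used by jupyterlab-notifications.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per (notification, recipient)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " recipient TEXT NOT NULL,"
        " message TEXT NOT NULL)"
    )
    return conn


def store_notification(conn, recipients, message):
    # De-normalize: an event with several recipients becomes several rows.
    with conn:
        conn.executemany(
            "INSERT INTO notifications (recipient, message) VALUES (?, ?)",
            [(recipient, message) for recipient in recipients],
        )


def list_notifications(conn, user):
    # Backs GET /api/notifications: only rows the user may see.
    rows = conn.execute(
        "SELECT id, message FROM notifications WHERE recipient = ? ORDER BY id",
        (user,),
    )
    return rows.fetchall()


def delete_notification(conn, user, notification_id):
    # Backs DELETE /api/notifications/{ID}; returns True if a row was removed.
    with conn:
        cursor = conn.execute(
            "DELETE FROM notifications WHERE id = ? AND recipient = ?",
            (notification_id, user),
        )
    return cursor.rowcount == 1
```

Scoping every query by recipient means one user cannot read or delete another user's row, even when both rows came from the same event.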

Client-side functionality

  • Subscribe to the /api/events/subscribe/notification WebSocket
    • Throttle its incoming messages at some reasonable rate (on the order of 0.5-1 seconds)
  • Treat incoming messages from the events API as a notifier only -- check the /api/notifications endpoint for the actual list of messages
  • Render the badge and the notification center UI inside JupyterLab/Notebook
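
The throttling step can be sketched as follows. A real client would live in the JupyterLab front end; Python is used here only to show the logic. The class name and the injectable clock are assumptions for illustration.

```python
import time


class Throttle:
    """Allow at most one refresh per `interval` seconds."""

    def __init__(self, interval=0.5, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = None  # time of the last allowed refresh

    def should_refresh(self):
        """Return True if enough time has passed since the last refresh."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

Each incoming WebSocket message would call should_refresh() and re-query /api/notifications only when it returns True. A production version would probably also schedule one trailing refresh so that the last message in a burst is not missed.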

JupyterLab 4 extension

  • A Token (e.g., INotifications or IEvents) that exposes an IDataConnector for event CRUD and an ISignal for event subscription
  • A visual UI for an event notification center

Jupyter Notebook 7 extension

  • The Token from the JupyterLab extension
  • A version of the JupyterLab UI for notifications

afshin avatar Apr 07 '22 15:04 afshin

@afshin Should we just add this to the server or need a new server extension package? Is anyone assigned to this task?

3coins avatar Apr 07 '22 15:04 3coins

@3coins, this should be in the core server.

Currently, no one is specifically assigned. I'd like to see the user interface portion of this landing in JupyterLab and I am happy to work on any part of the stack that helps get us there.

I think that the work on this already done in the telemetry space might be farther along than the server extension from the notifications extension, so grafting those handlers into jupyter-server might be the best way of bringing this into core.

What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.

afshin avatar Apr 07 '22 15:04 afshin

@afshin

What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.

Agree, let me know if you want to have an offline discussion including anyone else who wants to work on this; personally, I would like to get some experience on the server side, but happy to work on any part of the stack. Is there an expected time frame to get these changes done?

3coins avatar Apr 07 '22 16:04 3coins

I think Zach is rounding up interested folks (including you) for a conversation.

We are targeting jupyter-server v2 and jupyterlab v4 (so late June, early July).

afshin avatar Apr 07 '22 16:04 afshin

@afshin / @Zsailer: please include me in any meeting that might happen related to this. I am interested in learning more about this area and contributing any way I can.

rahul26goyal avatar May 01 '22 10:05 rahul26goyal

As discussed in the server meeting on 5/5/2022, here is an initial list of tasks for the event notification system. This list is by no means final, feel free to add comments or feedback.

  1. Event Bus - #820

    • A central event bus to relay events
    • /api/events/subscribe - Websocket for subscribing to events
    • A default handler for consuming events
  2. REST API Endpoints

    • POST /api/events - REST API to create new events
    • GET /api/events/schemas - REST API to query/list registered schemas (Optional)
  3. Event buffer

    • A queue/buffer to store undelivered event messages
  4. JupyterLab 4 Event Client (jupyterlab-events)

    • Reuse jupyterlab-telemetry repo, either rename or copy to jupyterlab-events
    • Remove server endpoints, any redundant server code
    • Update client handlers to use the rest api endpoints
    • Add websocket handler to enable subscription to events
  5. Add Default events in server

    • Add default events e.g., content handler, kernel events in jupyter server
  6. JupyterLab 4 Updates

    • Add jupyterlab-events as dependency inside JupyterLab
    • Subscribe to default events
  7. Event Notification UI (JupyterLab)

    • UI updates for event notification
  8. Jupyter Notebook 7 Updates

    • Add jupyterlab-events as dependency
    • Subscribe to default events
    • Can we reuse event notification UI from JupyterLab?
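
For task 3, the event buffer could be as simple as a bounded queue. This is a sketch only: the class name, the default size, and the policy of dropping the oldest events when full are assumptions.

```python
from collections import deque


class EventBuffer:
    """Hold undelivered events, discarding the oldest once full."""

    def __init__(self, maxlen=1000):
        self._events = deque(maxlen=maxlen)

    def add(self, event):
        # With maxlen set, deque silently drops the oldest entry when full.
        self._events.append(event)

    def drain(self):
        """Return all buffered events in arrival order and empty the buffer."""
        events = list(self._events)
        self._events.clear()
        return events
```

The event bus would call add() while no subscriber is connected and drain() once one connects.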

3coins avatar May 05 '22 20:05 3coins

Here is a document we can collaboratively edit so that the front-matter of this issue can have a canonical version that we edit once it is ready: https://hackmd.io/q4Rkq2BaS1SIXvyzt8j1yA

afshin avatar May 13 '22 16:05 afshin

Since the event system is a new service that we are just starting to develop, how about making it as backend-agnostic as possible? By that I mean that most of the logic should be usable in both jupyter-server and jupyverse. But it is currently very tied to jupyter-server, Tornado and traitlets, which we don't want to depend on in jupyverse.

davidbrochart avatar May 31 '22 15:05 davidbrochart

Thanks for bringing this up, @davidbrochart! I think we're going to see this question/conversation come up multiple times moving forward as we continue pushing Jupyter Server forward, while trying to bring jupyverse to the front.

Let me start by saying—technically, the event system is backend agnostic. We just defined a REST + websocket API for posting/subscribing to events. These are schema/protocol driven. Jupyverse can/should create an implementation of this API. I don't think there is anything tied specifically to Tornado here. Any server implementation will always have to write some server-library-specific code to make it work. Consider, if we started this in jupyverse, how would we port it to jupyter_server? We would have to re-implement the handlers in Tornado and drop the FastAPI specific logic.

That said, under the hood, we depend on jupyter_telemetry (hopefully, switching to jupyter_events soon) and you are correct—jupyter_telemetry/events depends on traitlets.

That's because we needed the Event System API to be configurable. I don't see a way around using traitlets for this without switching to some other backwards compatible, backend-agnostic, config-based library. For example, it looks to me that jupyverse/FPS is implementing its own (non-backend agnostic) configuration system, fps.config. While I believe FPS offers a much cleaner way to handle config, it's not backwards compatible with Jupyter Server. This might be a place we can improve.

Unfortunately, at this time, I don't see a single solution that would work for both. And while I see jupyverse as our future (it's awesome!), I don't think we should block jupyter_server from making advancements using the older dependencies at this point in time.

Do you have ideas how to reconcile this?

Zsailer avatar May 31 '22 17:05 Zsailer

You're right Zach, jupyverse also has implemented specific logic for configuration, and I guess depending on FastAPI makes it kind of specific to this framework too. I'm thinking about some low-level logic (functions, classes...) that would be called from either a Tornado handler or a FastAPI router, with all configuration already resolved at this point, and passed as generic arguments.

davidbrochart avatar May 31 '22 17:05 davidbrochart

"backwards compatible, backend-agnostic, config-based library"

To me, this is the "holy grail".

We could probably get pretty close by

  1. writing logic that translates traitlets config into a pydantic BaseModel.
  2. handling traits/fields that "observe" other traits/fields.

Zsailer avatar May 31 '22 17:05 Zsailer

I meant something more simple, like this GET handler calls this get method. If we can have the logic in the get method in a separate package, that's a great step towards backend agnosticism.
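
A sketch of what that could look like: the logic lives in a plain function with no Tornado or FastAPI imports, and each backend writes a thin handler around it. The function name, the body layout, and the status codes are assumptions for illustration.

```python
import json


def create_event(known_schemas, body):
    """Framework-free logic for POST /api/events.

    Returns a (status_code, payload) pair that any web framework can render.
    """
    try:
        event = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"message": "body is not valid JSON"}
    if not isinstance(event, dict) or event.get("schema_id") not in known_schemas:
        return 400, {"message": "unknown schema"}
    # A real implementation would validate and emit the event here.
    return 204, None
```

A Tornado handler's post method or a FastAPI route would then only read the request body, call create_event, and translate the returned pair into its own response type.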

davidbrochart avatar May 31 '22 18:05 davidbrochart

Is an event intended to notify the user visually? If so, will we distinguish between read and unread notifications, high-priority and low-priority notifications, etc.? I'm also curious about whether notifications might be transmitted via other means, such as e-mail or SMS.

JasonWeill avatar Jun 15 '22 16:06 JasonWeill

@jweill-aws the "case study" above is about notifications and the idea is that it becomes an extension's job to manage its state. In the case of notifications, the extension will write events it cares about from the event bus into a SQL database and it will be the job of the client to call DELETE to remove those items from the database (i.e., make them "read").

afshin avatar Jun 15 '22 17:06 afshin

In https://github.com/jupyter/jupyter_events/pull/2, we've been discussing the handling of sensitive data in the event system. I'm confident that these are already "solved problems" in other systems, so I need some help gathering information about how to properly do it here.

In https://github.com/jupyter/jupyter_events/pull/2, I added a required field to every schema, "redactionPolices", that is used to describe the sensitivity of every event property. The event logger can be configured to redact sensitive policies from all data in all events. This data is redacted before the event is ever emitted. This provides a simple way to ensure that sensitive data is never persisted.
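
To make the idea concrete, redaction before emission could work roughly like this. This is not the code from the PR: the function name, the mapping from property name to policies, and the policy names in the comments are all hypothetical.

```python
def redact(data, property_policies, redacted):
    """Return a copy of `data` without properties under a redacted policy.

    property_policies maps each property name to its list of policies
    (hypothetical layout); `redacted` lists the policies to strip.
    """
    blocked = set(redacted)
    return {
        name: value
        for name, value in data.items()
        # Properties with no declared policy are dropped too, to fail closed.
        if name in property_policies and not blocked & set(property_policies[name])
    }


# Hypothetical policy names, for illustration only.
policies = {"path": ["user-identifiable"], "action": ["unrestricted"]}
event = {"path": "/home/a.ipynb", "action": "save", "extra": 1}
safe_event = redact(event, policies, ["user-identifiable"])
```

Because the filter runs before the event is emitted, nothing downstream (log files, the websocket, subscribers) ever sees the removed properties.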

On the other hand, if a client (e.g. JupyterLab) builds features that depend on the event system, and these features depend on receiving all of the data, redacted events/data break these features. This makes the event system unusable for these features in data-conscious (i.e. most) environments.

To make the event system useful, we need a secure way to handle sensitive data in transit, specifically when moving between Jupyter Server and its clients. Today, the event bus added in #820 shuttles raw events to the client across the websocket. Any authenticated websocket client can connect to this websocket and "see" all event data; this obviously isn't a secure approach.

This is where I need some help. What are some known patterns for handling sensitive data in transit from server to client? If we encrypt the data in the server, how do we securely decrypt it in something like JupyterLab?

Zsailer avatar Jul 22 '22 17:07 Zsailer

The basic "plumbing" for Jupyter server's event system landed here: https://github.com/jupyter-server/jupyter_server/pull/862

We've started logging some events from the contents here: https://github.com/jupyter-server/jupyter_server/pull/954

Zsailer avatar Sep 01 '22 14:09 Zsailer