jupyter_server
[Telemetry] Kernels Event Schemas
Inspired by #231, I'd like to start a discussion on drafting an event schema for the kernels API.
Telemetry is set to land in #233. Once this plumbing is in place, we can begin logging events from all services in the Jupyter Server.
We can start drafting a schema for each event, drawing on the knowledge and needs of people interested in collecting this data. This issue is meant for discussing and drafting an event schema for kernels. I'll post an initial draft here soon.
To audit the code that users are running, we need to be able to determine which users ran certain code and when they ran it. We should have 4 properties in the schema:
- The username of the user who owns the notebook. This might be easier in some deployments than others. For example, on a large shared server it is probably easy to determine. In other cases where the notebook/jupyterlab servers are containerized they may all have the same username (e.g. jovyan). However, it will be up to the administrators to make sure that this is configured properly. This field should be marked as PII.
- The actual code from the cells being run. This should be relatively straightforward to get since it is contained in the messaging protocol. I think this field should be marked as PII since people might include identifying information in their code.
- A timestamp for when the code was run.
- The name of the notebook where the code came from. I'm not sure how hard this is to get, but it is useful context when looking at the audit data.
And as @kevin-bates mentioned in #231, it would be useful to include extra context from the kernel as well for cross-referencing.
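To make the four properties above concrete, here is a rough sketch of what such a schema could look like. Everything in it is an assumption for discussion: the `$id` is a placeholder, the field names (`username`, `code`, `timestamp`, `notebook_name`, `kernel_id`) and the `pii` annotation are hypothetical, and the helpers `make_event` and `redact_pii` are not part of any existing Jupyter API. Marking `notebook_name` as non-PII is also just a guess.

```python
from datetime import datetime, timezone

# Hypothetical draft schema. The "$id", field names, and the "pii" flag are
# illustrative assumptions, not an agreed-upon telemetry format.
KERNEL_EXECUTE_SCHEMA = {
    "$id": "https://example.org/events/kernel-execute",  # placeholder URI
    "version": 1,
    "title": "Kernel code execution",
    "type": "object",
    "properties": {
        "username": {"type": "string", "pii": True},
        "code": {"type": "string", "pii": True},
        "timestamp": {"type": "string", "format": "date-time", "pii": False},
        "notebook_name": {"type": "string", "pii": False},
        # Extra kernel context for cross-referencing (assumed field).
        "kernel_id": {"type": "string", "pii": False},
    },
    "required": ["username", "code", "timestamp"],
}


def make_event(username, code, notebook_name=None, kernel_id=None, when=None):
    """Build an event dict; optional fields are omitted when unknown."""
    when = when or datetime.now(timezone.utc)
    event = {"username": username, "code": code, "timestamp": when.isoformat()}
    if notebook_name is not None:
        event["notebook_name"] = notebook_name
    if kernel_id is not None:
        event["kernel_id"] = kernel_id
    return event


def redact_pii(event, schema=KERNEL_EXECUTE_SCHEMA):
    """Return a copy of the event with PII-marked fields masked."""
    props = schema["properties"]
    return {
        key: "<redacted>" if props.get(key, {}).get("pii") else value
        for key, value in event.items()
    }
```

A deployment that does not want to store PII could pass each event through something like `redact_pii` before writing it to a sink, while an audit deployment would keep the raw event.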
Any progress on this?