jupyter-resource-usage
Report kernel metrics
This was briefly mentioned in https://github.com/yuvipanda/nbresuse/pull/22#issuecomment-588158223. Opening a new issue for better tracking.
It would indeed be really useful to track cpu and memory usage per kernel. The frontend could then query this data and display a more granular view on the resources being used.
This sounds like it should be doable when the kernels are local to the notebook server. But it might be slightly more complicated in the case of remote kernels.
cc @Tommassino who has a branch with such changes: https://github.com/Tommassino/nbresuse/tree/kernel-metrics
Maybe we could check if this could be merged into nbresuse?
I agree, this would be a really great feature to have in NBResuse.
For now I think it should be sufficient to state explicitly that the feature is only intended to track usage for "local" kernels. That's the most common use case, even when using JupyterHub where the server is remote but the kernels are local to the server. As long as we're upfront about the limitation, I don't imagine it should be too much of an issue.
If we get this set up for local kernels, then afterwards maybe we can try to get feedback from the people working on Enterprise Gateway about what they would think regarding support for remote kernels and/or how they think it could/would best be implemented. One step at a time though.
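As a rough illustration of what tracking usage for local kernels could involve, here is a minimal sketch. It assumes the kernel's process ID is known to the server and that the third-party psutil package is installed. The names `KernelUsage`, `usage_for_process_tree`, and `local_kernel_usage` are hypothetical and are not taken from nbresuse or from the branch linked above.

```python
from dataclasses import dataclass


@dataclass
class KernelUsage:
    """Hypothetical container for one kernel's aggregated metrics."""
    kernel_id: str
    rss_bytes: int
    cpu_percent: float


def usage_for_process_tree(kernel_id, processes):
    """Sum memory and CPU over psutil.Process-like objects.

    `processes` should hold the kernel process and its children, so that
    subprocesses spawned by the kernel are counted too.
    """
    rss_bytes = 0
    cpu_percent = 0.0
    for proc in processes:
        try:
            rss = proc.memory_info().rss
            cpu = proc.cpu_percent(interval=None)
        except Exception:
            # A process may exit or deny access between listing and
            # sampling (psutil.NoSuchProcess, psutil.AccessDenied); skip it.
            continue
        rss_bytes += rss
        cpu_percent += cpu
    return KernelUsage(kernel_id, rss_bytes, cpu_percent)


def local_kernel_usage(kernel_id, pid):
    """Assumption: the kernel runs on the same host and `pid` is its PID."""
    import psutil  # third-party; imported lazily so the helper above stays dependency-free

    root = psutil.Process(pid)
    return usage_for_process_tree(kernel_id, [root] + root.children(recursive=True))
```

This only works when the kernel process is visible to the server, which is why remote kernels would need a different mechanism. Note also that psutil's `cpu_percent(interval=None)` reports usage since the previous call, so the first sample for a process is not meaningful.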
[Cross-posting from #13 at the request of @jtpio]
I'd suggest per-kernel metrics be the default. I came upon nbresuse hoping for more granular information than what top can already tell me. When per-notebook metrics are available, I feel total server usage is not a very useful default, for these reasons:
- Total process memory usage is already easily found in top or ps or any system monitoring utility, which is arguably the natural first "go to" place for memory information.
- Internal Jupyter- and notebook-specific resource usage cannot be found with general tools, and Jupyter is the first and obvious place to report such info.
- If your goal is to monitor bumping up against system limits:
- This is dependent on the usage of other processes / users
- Usage of Jupyter, other processes, and system limits are effectively reported by familiar tools
- You would still want to know which notebook is hitting the limits and/or which notebook would give the biggest savings if shut down.
- It is confusing and perhaps misleading for all notebooks to report the same memory usage: this is a giveaway that nbresuse is either wrong or multiple-counting, and decreases confidence in it. Total usage would be better reported in a general server location like the File | Open page, rather than repeated in notebook-specific locations.
- It is difficult to monitor the change in usage for individual cells or pieces of code.
Ideally users could configure the metrics they would like to see, but I think per-notebook metrics make the most sense as the default.
top or ps is the default for experienced users, but Jupyter is often used by beginners or people less familiar with the command line. It may not be obvious that a notebook has failed due to memory since often it'll hang or crash instead of printing an out-of-memory error.
Example use case: https://github.com/jupyterhub/binderhub/issues/1097
Is it possible to show both metrics by default, or do you think that's too confusing?
Clicking to toggle between notebook and server usage seems like a win/win. I'd be happy with server usage being the default as long as individual notebooks remember my setting so I don't have to change it every time. In fact we might want to throw total system usage into the rotation too.
It seems like the bias is toward isolated single-user cloud environments, where all of the resources belong to you. Even if that is the common case, just keep in mind that being able to finger a single notebook as "the" reason you've run out of memory is in fact a very special case.
Even in that case, you can mysteriously run out of memory long before you hit the system total, because other processes use memory, too. Imagine a user on an 8 GB system with 1 GB used by OS processes. The user's notebook mysteriously crashes at 7 GB used, even though it says they have 8 available.
It seems like what you want is a better memory limit value: instead of the limit being total system memory, report the limit as notebook/server usage plus free memory. That way it dynamically accounts for the usage of other processes, and is what you actually have to worry about hitting. In our example, if the user was using 5 GB, with 2 GB free, we would show 5 / 7 GB used. This would probably get rid of a lot of the need to manually set limits as well.
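The arithmetic behind that suggestion can be sketched as follows. The function names are hypothetical, and the inputs are assumed to come from whatever source the extension already uses for usage and free memory (for example, psutil's `virtual_memory().available`).

```python
GIB = 1024 ** 3


def effective_memory_limit(usage_bytes: int, free_bytes: int) -> int:
    """Limit the notebook/server can actually reach: its own usage plus free memory."""
    return usage_bytes + free_bytes


def format_usage(usage_bytes: int, free_bytes: int) -> str:
    """Render usage against the dynamic limit, e.g. for a status bar."""
    limit = effective_memory_limit(usage_bytes, free_bytes)
    return f"{usage_bytes / GIB:.0f} / {limit / GIB:.0f} GB used"


# The example above: 5 GB used by the notebook, 2 GB free on the system.
print(format_usage(5 * GIB, 2 * GIB))  # 5 / 7 GB used
```

Because the limit is recomputed from current free memory, it shrinks as other processes grow, without anyone having to configure a limit by hand.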
I would also find this feature useful - to have per kernel or per notebook resource usage.
We have been working on this in https://github.com/Quansight/jupyterlab-kernel-usage - take a look and let us know what you think.
Nice thanks @mlucool for sharing.
Do you see jupyterlab-kernel-usage living in its own (separate) extension? Or maybe it could be integrated here as part of the jupyter-resource-usage extension?
cc @echarles
In the short term, we think it makes sense to live in its own extension as we are still perfecting the experience. Down the line, it could make sense to integrate it back into this one or possibly into core.
@mlucool Does jupyterlab-kernel-usage work with Python kernels only or does it support kernels of other languages as well? I gave it a try (with JupyterHub instead of JupyterLab), and kernels other than Python couldn't be started.
@dclong jupyterlab-kernel-usage only works with ipython kernels https://github.com/ipython/ipykernel.
There is the idea to normalize the resource usage request/reply via a JEP (Jupyter Enhancement Proposal) so that any other kernel could also implement the defined protocol, but that will be a long road.
The jupyterlab extension will work with ipykernel in any deployment (jupyterhub...).
We are planning a new release next week. Please open any issue or feature request on https://github.com/Quansight/jupyterlab-kernel-usage/issues
Closing as https://github.com/jupyter-server/jupyter-resource-usage/pull/163 has now been merged.