jupyter-resource-usage
jupyter-resource-usage copied to clipboard
Feature to change default metrics for memory calculation from RSS to PSS
Problem
jupyter-resource-usage
exaggerates usage probably due to copy-on-write. Shifting the methodology from RSS to PSS can help solve the problem. See the below reproducer -
import multiprocessing
import time
import psutil
def foo():
time.sleep(10000)
parents_original_memory = bytearray(int(1e9)) # 1GB
for i in range(10):
multiprocessing.Process(target=foo).start()
def get_memory_info(type):
process_metric_value = lambda process: getattr(process.memory_full_info(), type)
current_process = psutil.Process()
all_processes = [current_process] + current_process.children(recursive=True)
return (
f"{sum([process_metric_value(process) for process in all_processes]) / 1e9} GB"
)
print("RSS: ", get_memory_info("rss"))
print("PSS: ", get_memory_info("pss"))
Output is -
RSS: 11.590012928 GB
PSS: 1.082778624 GB
jupyter-resource-usage
reports RSS
in JupyterLab, but PSS
seems more accurate.
Proposed Solution
- Changing RSS to PSS. But, as per this comment, PSS is a linux-only feature.
- Another solution can be - doing memory calculation via both methods - RSS & PSS (if supported by OS) and let users easily override / change the default metric - through some configuration or by introducing a settings page for the extension.
Additional context
Found similar issue - issues#16
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Thanks @nishikantparmariam for the suggestion :+1:
Changing RSS to PSS. But, as per this comment, PSS is a linux-only feature.
Looks like a similar change landed in ipykernel
: https://github.com/ipython/ipykernel/pull/948
Maybe it would make sense to return pss
when available, and default to rss
otherwise (like it is now)?
While keeping rss
as the field in the response to avoid breaking other consumers of the API if possible:
https://github.com/jupyter-server/jupyter-resource-usage/blob/559710d106743160180abfefb065512fd1ac137f/jupyter_resource_usage/api.py#L48
Maybe it would make sense to return pss when available, and default to rss otherwise (like it is now)? While keeping rss as the field in the response to avoid breaking other consumers of the API if possible:
This sounds good!
Just a clarification, I am guessing with the new change, by default the frontend will also use pss
and if not available fallback to rss
otherwise?
If not, we can make this configurable through settings by adding a boolean say use_pss_if_available
(we can think of better name) which defaults to False (current behavior) and can be set to True.
Just a clarification, I am guessing with the new change, by default the frontend will also use
pss
and if not available fallback torss
otherwise?
Right, and this would be handled on the server directly similar to https://github.com/ipython/ipykernel/pull/948.
Actually after looking more closely at the code, there seems to be a way to configure which metrics to use already:
https://github.com/jupyter-server/jupyter-resource-usage/blob/78d2e1f6f47bc469c983eb94e8085918e1a8e3ec/jupyter_resource_usage/config.py#L40-L43
Although this does not seem to be used by the API handler at the moment since rss
is accessed directly here:
https://github.com/jupyter-server/jupyter-resource-usage/blob/78d2e1f6f47bc469c983eb94e8085918e1a8e3ec/jupyter_resource_usage/api.py#L33
Maybe the API handler should then also read process_memory_metrics
from the config to know which metrics to report.
@nishikantparmariam let me know if you would like to take a stab at this and open a draft PR so we can have a look at it? Otherwise I'll have a look. Thanks!
FYI @nishikantparmariam the latest version of jupyter-resource-usage
now includes the former extension for kernel usage (originally developed in https://github.com/Quansight/jupyterlab-kernel-usage): https://github.com/jupyter-server/jupyter-resource-usage/releases/tag/v0.7.0
It was added in https://github.com/jupyter-server/jupyter-resource-usage/pull/164.
Which means that the values in the right panel should correspond to what ipykernel
returns: https://github.com/ipython/ipykernel/pull/948
Were you able to try the new version?
Were you able to try the new version?
@jtpio Sorry for the delayed response. Yes, I tried 0.7.0 and it looks mostly good. It solves a major blocker - Jupyterlab freezing when kernel is busy, but shows no details (https://github.com/jupyter-server/jupyter-resource-usage/pull/164#issuecomment-1362875085) for that kernel.
Which means that the values in the right panel should correspond to what ipykernel returns: https://github.com/ipython/ipykernel/pull/948
Yes, the values in right panel seems to use PSS since they come from ipykernel. Still the memory shown in bottom (this issue) is calculated by RSS which may create confusion. I think it should be ok, but cc: @mlucool for thoughts.
I still think PSS everywhere is correct (when possible)
Just opened https://github.com/jupyter-server/jupyter-resource-usage/pull/171