jupyter-resource-usage icon indicating copy to clipboard operation
jupyter-resource-usage copied to clipboard

Feature to change default metrics for memory calculation from RSS to PSS

Open nishikantparmariam opened this issue 2 years ago • 1 comments

Problem

jupyter-resource-usage exaggerates usage probably due to copy-on-write. Shifting the methodology from RSS to PSS can help solve the problem. See the below reproducer -

import multiprocessing
import time
import psutil

def foo():
    time.sleep(10000)

parents_original_memory = bytearray(int(1e9)) # 1GB

for i in range(10):
    multiprocessing.Process(target=foo).start()

def get_memory_info(type):
    process_metric_value = lambda process: getattr(process.memory_full_info(), type)
    current_process = psutil.Process()
    all_processes = [current_process] + current_process.children(recursive=True)
    return (
        f"{sum([process_metric_value(process) for process in all_processes]) / 1e9} GB"
    )

print("RSS: ", get_memory_info("rss"))
print("PSS: ", get_memory_info("pss"))

Output is -

RSS: 11.590012928 GB 
PSS: 1.082778624 GB

jupyter-resource-usage reports RSS in JupyterLab, but PSS seems more accurate.

Proposed Solution

  • Changing RSS to PSS. But, as per this comment, PSS is a linux-only feature.
  • Another solution can be - doing memory calculation via both methods - RSS & PSS (if supported by OS) and let users easily override / change the default metric - through some configuration or by introducing a settings page for the extension.

Additional context

Found similar issue - issues#16

nishikantparmariam avatar May 23 '22 09:05 nishikantparmariam

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar May 23 '22 09:05 welcome[bot]

Thanks @nishikantparmariam for the suggestion :+1:

Changing RSS to PSS. But, as per this comment, PSS is a linux-only feature.

Looks like a similar change landed in ipykernel: https://github.com/ipython/ipykernel/pull/948

Maybe it would make sense to return pss when available, and default to rss otherwise (like it is now)?

While keeping rss as the field in the response to avoid breaking other consumers of the API if possible:

https://github.com/jupyter-server/jupyter-resource-usage/blob/559710d106743160180abfefb065512fd1ac137f/jupyter_resource_usage/api.py#L48

jtpio avatar Nov 16 '22 16:11 jtpio

Maybe it would make sense to return pss when available, and default to rss otherwise (like it is now)? While keeping rss as the field in the response to avoid breaking other consumers of the API if possible:

This sounds good!

Just a clarification, I am guessing with the new change, by default the frontend will also use pss and if not available fallback to rss otherwise?

If not, we can make this configurable through settings by adding a boolean say use_pss_if_available (we can think of better name) which defaults to False (current behavior) and can be set to True.

nishikantparmariam avatar Nov 22 '22 19:11 nishikantparmariam

Just a clarification, I am guessing with the new change, by default the frontend will also use pss and if not available fallback to rss otherwise?

Right, and this would be handled on the server directly similar to https://github.com/ipython/ipykernel/pull/948.

jtpio avatar Nov 22 '22 20:11 jtpio

Actually after looking more closely at the code, there seems to be a way to configure which metrics to use already:

https://github.com/jupyter-server/jupyter-resource-usage/blob/78d2e1f6f47bc469c983eb94e8085918e1a8e3ec/jupyter_resource_usage/config.py#L40-L43

Although this does not seem to be used by the API handler at the moment since rss is accessed directly here:

https://github.com/jupyter-server/jupyter-resource-usage/blob/78d2e1f6f47bc469c983eb94e8085918e1a8e3ec/jupyter_resource_usage/api.py#L33

Maybe the API handler should then also read process_memory_metrics from the config to know which metrics to report.

jtpio avatar Nov 23 '22 08:11 jtpio

@nishikantparmariam let me know if you would like to take a stab at this and open a draft PR so we can have a look at it? Otherwise I'll have a look. Thanks!

jtpio avatar Nov 23 '22 08:11 jtpio

FYI @nishikantparmariam the latest version of jupyter-resource-usage now includes the former extension for kernel usage (originally developed in https://github.com/Quansight/jupyterlab-kernel-usage): https://github.com/jupyter-server/jupyter-resource-usage/releases/tag/v0.7.0

It was added in https://github.com/jupyter-server/jupyter-resource-usage/pull/164.

Which means that the values in the right panel should correspond to what ipykernel returns: https://github.com/ipython/ipykernel/pull/948

image

Were you able to try the new version?

jtpio avatar Jan 17 '23 17:01 jtpio

Were you able to try the new version?

@jtpio Sorry for the delayed response. Yes, I tried 0.7.0 and it looks mostly good. It solves a major blocker - Jupyterlab freezing when kernel is busy, but shows no details (https://github.com/jupyter-server/jupyter-resource-usage/pull/164#issuecomment-1362875085) for that kernel.

Which means that the values in the right panel should correspond to what ipykernel returns: https://github.com/ipython/ipykernel/pull/948

Yes, the values in right panel seems to use PSS since they come from ipykernel. Still the memory shown in bottom (this issue) is calculated by RSS which may create confusion. I think it should be ok, but cc: @mlucool for thoughts.

nishikantparmariam avatar Jan 23 '23 07:01 nishikantparmariam

I still think PSS everywhere is correct (when possible)

mlucool avatar Jan 23 '23 15:01 mlucool

Just opened https://github.com/jupyter-server/jupyter-resource-usage/pull/171

jtpio avatar Jan 23 '23 16:01 jtpio