Linux/PSI: Add cgroup variants for PSI meters
I would like to contribute new "cgroup variants" of the PSI meters for htop. For this I would need some quick feedback on the design:
Background:
The Linux kernel provides per-cgroup variants (documentation) of pressure stall information. These are of particular interest for containers (or services) running in a cgroup with reduced access to system resources.
For example, let's say I start an lxc container with a very small CPU quota of 10% of one core and run some CPU-intensive task. I then observe the following within the container:
# tail -n +1 /proc/pressure/cpu /sys/fs/cgroup/cpu.pressure
==> /proc/pressure/cpu <==
some avg10=10.78 avg60=2.27 avg300=1.08 total=435591433
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
==> /sys/fs/cgroup/cpu.pressure <==
some avg10=14.26 avg60=3.23 avg300=1.38 total=16380126
full avg10=14.26 avg60=3.23 avg300=1.38 total=16372809
Here, arguably I do observe that something is amiss in the (system wide) /proc/pressure/cpu file. But much more relevant for the container is the information provided in /sys/fs/cgroup/cpu.pressure.
It would thus be nice to have a variant of the PSI meters in htop that can display PSI information from /sys/fs/cgroup/(cpu|io|memory|irq).pressure (see discussion below) instead of the one found at /proc/pressure/(cpu|io|memory|irq).
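Both the system-wide and the per-cgroup files share the same line format, so a single parser could serve either source. The following is a minimal sketch in Python, chosen for brevity (htop itself is written in C); the function name `parse_psi` is hypothetical and not part of htop.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a PSI file into {kind: {field: value}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the kind ("some" or "full"), the rest are key=value pairs.
        kind, fields = parts[0], parts[1:]
        values: Dict[str, float] = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is an integer counter; the avg* fields are decimal values.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Sample taken from the cgroup cpu.pressure output shown above.
sample = (
    "some avg10=14.26 avg60=3.23 avg300=1.38 total=16380126\n"
    "full avg10=14.26 avg60=3.23 avg300=1.38 total=16372809\n"
)
psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["full"]["total"])  # prints: 14.26 16372809
```

Because the parser keys on the first token of each line, it also handles files that only contain a "full" line, such as irq.pressure.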
Questions:
Two solutions come to mind:
- A natural candidate (for containers at least) would be to query /sys/fs/cgroup/(cpu|io|memory|irq).pressure. This works as long as an lxc guest creates a new cgroup namespace instance and mounts /sys/fs/cgroup accordingly.
- I think a more flexible and robust approach would instead be to query the cgroup htop is currently running in and then use this to show all relevant information:
  # cat /proc/self/cgroup
  0::/user.slice/user-0.slice/[email protected]/app.slice/tmux.service
  # tail -n +1 /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/app.slice/tmux.service/(cpu|io|memory|irq).pressure
  ==> /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/app.slice/tmux.service/cpu.pressure <==
  some avg10=0.04 avg60=0.22 avg300=0.70 total=12782400
  full avg10=0.04 avg60=0.22 avg300=0.70 total=12779692
  ==> /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/app.slice/tmux.service/io.pressure <==
  some avg10=0.00 avg60=0.00 avg300=0.00 total=19145
  full avg10=0.00 avg60=0.00 avg300=0.00 total=19145
  ==> /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/app.slice/tmux.service/irq.pressure <==
  full avg10=0.00 avg60=0.00 avg300=0.00 total=76071
  ==> /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/app.slice/tmux.service/memory.pressure <==
  some avg10=0.00 avg60=0.00 avg300=0.00 total=0
  full avg10=0.00 avg60=0.00 avg300=0.00 total=0
  I would prefer the second approach for its flexibility: htop running in some service unit/cgroup with resource restrictions will show relevant PSI information for that unit/cgroup.
- Related question to the second approach: is there a programmatic "htop way" of figuring out what cgroup htop is running in and where the (cgroupv2) mountpoint is located?
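The second approach boils down to two steps: read the cgroup v2 entry (the line starting with `0::`) from /proc/self/cgroup, then join it onto the cgroup mountpoint. A rough Python sketch of that logic follows. The helper names and the example service path are hypothetical, and the mountpoint is simply passed in as a parameter here rather than discovered.

```python
import os
from typing import List, Optional

RESOURCES = ("cpu", "io", "memory", "irq")


def cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/<pid>/cgroup contents, or None."""
    for line in proc_cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controller-list:path";
        # the cgroup v2 entry has ID 0 and an empty controller list.
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    return None


def pressure_files(mountpoint: str, cgroup_path: str) -> List[str]:
    """Build the candidate *.pressure paths for one cgroup."""
    base = os.path.join(mountpoint, cgroup_path.lstrip("/"))
    return [os.path.join(base, f"{res}.pressure") for res in RESOURCES]


# Hypothetical example entry; a real one would come from /proc/self/cgroup.
path = cgroup_v2_path("0::/system.slice/example.service\n")
if path is not None:
    for name in pressure_files("/sys/fs/cgroup", path):
        print(name)
```

If /proc/self/cgroup reports `0::/`, as it typically does for a process at the root of its own cgroup namespace, this yields /sys/fs/cgroup/cpu.pressure and friends, so the first approach would fall out as a special case of the second. Not every file need exist on every kernel, so a caller would still have to tolerate missing files.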
> I do observe that something is amiss in the (system wide) /proc/pressure/cpu file.
This is expected: it's a backwards-compatibility line. The kernel no longer updates 'full' CPU PSI values system-wide, only 'some' values.
> htop running in some service unit/cgroup with resource restrictions will show relevant PSI information for that unit/cgroup
I don't love it. :| There may be hundreds or thousands of active cgroups, and only being able to display values for the one cgroup that htop runs in is not a great user experience IMO. A better way of doing this, perhaps, would be to make cgroups first-class citizens alongside processes, and then have optional Screens dedicated to displaying cgroups and their information (i.e. in separate tabs from processes).
It's a non-trivial change though. This is the direction @smalinux has been headed in #1102 using PCP (that's even more of a general solution, but you could seek ideas from there for a hard-coded Linux cgroups screen).
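To give a feel for what a dedicated cgroups screen would have to enumerate, here is a small Python sketch that walks a cgroup v2 tree and lists every cgroup exposing a cpu.pressure file. This is only an illustration under the assumption that the hierarchy is mounted at /sys/fs/cgroup; the function name is hypothetical and it is unrelated to the PCP-based work in #1102.

```python
import os
from typing import Iterator


def iter_cgroups_with_psi(root: str = "/sys/fs/cgroup") -> Iterator[str]:
    """Yield cgroup paths (relative to root, with a leading '/') that have cpu.pressure."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit children in a deterministic order
        if "cpu.pressure" in filenames:
            rel = os.path.relpath(dirpath, root)
            yield "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    for cgroup in iter_cgroups_with_psi():
        print(cgroup)
```

On a busy system this can return a long list, which is the scaling concern raised above: a screen built on it would need sorting and filtering much like the process list has.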
> figuring out what cgroup htop is running in [...]
The htop cgroups code lives in linux/CGroupUtils.c, so I'd start reading there. The functionality you seek is not implemented though; that code is primarily concerned with formatting the name of the cgroup associated with a process.
The cgroup information on Linux is read as part of the process list refresh and stored alongside the other information for each process. You could internally just look up that process object via the PID of the current process. To do this you need access to htop's ProcessList structure.
Apart from this there's still just plain reading of /proc/self/cgroup available …
But as @natoscott mentioned, per-cgroup PSI values are better suited for a dedicated screen/tab.
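For the plain-file route, the remaining piece of the original question is where the cgroup v2 hierarchy is mounted. One option is to scan /proc/self/mountinfo for an entry whose filesystem type is `cgroup2`. The Python sketch below assumes the standard mountinfo layout (mount point as the fifth field, filesystem type as the first field after the ` - ` separator); the function name and the sample line are hypothetical, and octal escapes such as `\040` in mount points are not decoded.

```python
from typing import Optional


def find_cgroup2_mount(mountinfo_text: str) -> Optional[str]:
    """Return the mount point of the first cgroup2 filesystem, or None."""
    for line in mountinfo_text.splitlines():
        # Optional fields vary in number, so split on the " - " separator first.
        head, sep, tail = line.partition(" - ")
        if not sep:
            continue
        head_fields = head.split()
        tail_fields = tail.split()
        if len(head_fields) >= 5 and tail_fields and tail_fields[0] == "cgroup2":
            return head_fields[4]  # fifth field is the mount point
    return None


# Hypothetical sample line in the mountinfo format.
sample = (
    "30 23 0:26 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime "
    "shared:4 - cgroup2 cgroup2 rw,nsdelegate\n"
)
print(find_cgroup2_mount(sample))  # prints: /sys/fs/cgroup
```

Returning None when no cgroup2 entry is found would let a meter fall back to the system-wide /proc/pressure files on hosts that only have cgroup v1.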