htop
Exclude processes running in a container (e.g. Docker)
I'm using htop on one of my servers, which also runs various Docker containers. In that case, htop can be very confusing because of the large number of processes.
It would be really helpful to have a switch that shows only processes not running in a container (i.e. running directly on the host).
Not ideal, but you can display the CCGROUP column which should provide the container name. In Tree View you can then collapse all processes below the init process of each container.
Hi. I have never contributed to htop before. I would like to start with this issue.
To tackle this issue, one approach I have thought of would be to check whether /docker is part of the cgroup path in the /proc/[pid]/cgroup file.
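As a rough illustration of that check, here is a minimal sketch in C. It assumes the documented `hierarchy-ID:controller-list:cgroup-path` layout of each line in /proc/[pid]/cgroup; the function names `cgroupLinePath` and `cgroupLineIsDocker` are made up for this example and are not part of htop. It only recognises paths that start with `/docker/`, so it would miss setups that name their cgroups differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the cgroup path of one /proc/[pid]/cgroup line, i.e. the text after
 * the second ':' in "hierarchy-ID:controller-list:cgroup-path", or NULL if
 * the line does not have that shape. */
const char* cgroupLinePath(const char* line) {
   const char* first = strchr(line, ':');
   if (!first)
      return NULL;

   const char* second = strchr(first + 1, ':');
   if (!second)
      return NULL;

   return second + 1;
}

/* Hypothetical helper: true if the cgroup path starts with "/docker/". */
bool cgroupLineIsDocker(const char* line) {
   const char* path = cgroupLinePath(line);
   return path != NULL && strncmp(path, "/docker/", 8) == 0;
}
```

Calling `cgroupLineIsDocker` on each line read from the file and stopping at the first match would be enough for this naive test.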
I have started going through the source code. Based on what I have understood so far, I think ProcessList keeps track of information that is not specific to any one process and also stores the list of processes to be displayed, while Vector* processes is a vector of all the processes themselves.
Also, the implementation of ProcessList_new is platform-specific.
Please correct me if I am wrong. I think the function of interest here is ProcessList_scan or a related function. It looks like this is where process-specific information is collected.
Also, does adding a new switch mean adding a flag that can be used as htop --new-flag, or does it mean adding a hot-key, similar to the function keys, that can be used once htop is running?
"Adding a new switch" refers to adding a new setting to the ProcessList->settings structure, which in turn needs to be persisted and restored as part of the overall program configuration in Settings.c.
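To make "persisted and restored" a little more concrete, here is a self-contained sketch. Everything in it is hypothetical: `DemoSettings`, the `hideContainerProcesses` field and the `hide_container_processes` key are invented for illustration and are not htop's real structure or configuration keys. It assumes a simple `key=value` line format for the configuration file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the settings structure; not htop's real Settings. */
typedef struct DemoSettings_ {
   bool hideContainerProcesses; /* hypothetical new switch */
} DemoSettings;

/* Restore: apply one "key=value" configuration line to the settings.
 * Returns true if the key was recognised. */
bool DemoSettings_readLine(DemoSettings* s, const char* line) {
   static const char key[] = "hide_container_processes=";
   const size_t keyLen = sizeof(key) - 1;

   if (strncmp(line, key, keyLen) != 0)
      return false;

   s->hideContainerProcesses = atoi(line + keyLen) != 0;
   return true;
}

/* Persist: write the switch back in the same "key=value" form. */
int DemoSettings_writeLine(const DemoSettings* s, char* buf, size_t size) {
   return snprintf(buf, size, "hide_container_processes=%d", s->hideContainerProcesses ? 1 : 0);
}
```

The point of the sketch is only the round trip: whatever is written on exit must be understood again on the next start.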
Also, not all containers use Docker. I, for example, use many containers built with libvirt using either LXC or Xen for isolation. Other options such as systemd-nspawn (compressed cgroup abbreviation "SNC" for "systemd nspawn container") are in wide use, and many more exist.
Apart from this your grasp of the situation is correct …
FWIW: The implementation of ProcessList_scan is platform-specific too.
"Adding a new switch" refers to having a new setting in the ProcessList->settings structure added, which in turn needs to be persisted and restored as part of the overall program configuration in Settings.c.
Alright, got it. Thanks.
FWIW: The implementation of ProcessList_scan is platform-specific too.
Is this platform-specific because it depends on other platform-specific functions such as ProcessList_goThroughEntries, or is there another reason to identify it as such? For example, ProcessList_goThroughEntries has been defined in several different platform-specific files such as FreeBSDProcessList.c. So I know this is platform-specific.
The ProcessList_scan function itself is only defined once, but it depends heavily on other functions such as ProcessList_goThroughEntries to gather all the information you need. For containerized processes you'll likely have to look into the platform-specific implementations, as e.g. both CGROUP (Linux) and JAIL (*BSD) depend on the platform, but either can indicate that a process is running inside a (restricted) container.
Alright. I'll investigate further before I actually start implementing. Thanks.
Hi. I read up on containers, namespaces and cgroups and learnt that in the /proc/[pid]/status file, there are a few fields (such as NStgid and NSpid) that might help in identifying processes running in a container.
According to the proc(5) man page:
NStgid: Thread group ID (i.e., PID) in each of the PID namespaces of which [pid] is a member. The leftmost entry shows the value with respect to the PID namespace of the process that mounted this procfs (or the root namespace if mounted by the kernel), followed by the value in successively nested inner namespaces.
NSpid: Thread ID in each of the PID namespaces of which [pid] is a member. The fields are ordered as for NStgid.
I did some tinkering on my computer and found that for processes that run directly on the host, there is only one number which is associated with the host's namespace. For processes that I ran in a docker container, they had two numbers in said fields, one associated with the host's namespace and the other associated with the child namespace.
lsns -t pid outputs a concise list of the PID namespaces present on the host.
Can htop use this information to keep track of all containerized processes?
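The NStgid observation above could be turned into a check along these lines. This is only a sketch: `nstgidDepth` is a hypothetical name, and it expects to be handed a single line already read from /proc/[pid]/status. A result greater than 1 means the process sits in a nested PID namespace, which does not by itself prove that it runs in a container.

```c
#include <ctype.h>
#include <string.h>

/* Count the PID entries on an "NStgid:" line from /proc/[pid]/status.
 * Returns -1 if the line is not an NStgid line. One entry means the process
 * lives in the PID namespace of the procfs mount; more than one entry means
 * it lives in a nested PID namespace. */
int nstgidDepth(const char* line) {
   static const char prefix[] = "NStgid:";
   if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
      return -1;

   const char* p = line + sizeof(prefix) - 1;
   int count = 0;
   for (;;) {
      /* Skip the tabs or spaces separating the entries. */
      while (isspace((unsigned char)*p))
         p++;
      if (!isdigit((unsigned char)*p))
         break;

      /* Consume one number. */
      while (isdigit((unsigned char)*p))
         p++;
      count++;
   }
   return count;
}
```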
Hi. I am sorry for the long period of silence. I have already started working on the pid namespace idea. However, I had a few questions before continuing with the implementation.
My idea is basically to mark and hide all the processes running in a PID namespace that is different from the namespace of the host's init process. I believe that is how containers work under the hood. However, there might be other processes running in a separate PID namespace that were not spawned in a container. (On my computer, for example, "brave" runs in a separate namespace. In this case, "brave" will get hidden too.)
The other idea is to hard-code the possible cgroup names (e.g. /docker/) in a hashmap, and then compare every row in /proc/[pid]/cgroup with the values in the hashmap. However, this would involve maintaining a list of every cgroup name used by every container system, and this list might have to be changed repeatedly in the future.
I was wondering in which direction I should continue. Or is there a third way that might be better suited to identifying processes running in a container?
Don't rely on the cgroup naming, as these names have changed over time and there are at least 4 schemes I'm aware of on Linux (2 for LXC, 1 for NSpawn, another for Docker) for containers; for other platforms there are more. Other software may even use further schemes. Thus, if at all, such a function would need to be extensible (and fast) when filtering by cgroup naming scheme, and quite robust against noise. There's some code in CGroupUtils for parsing/shortening cgroup names, and even just filtering for these patterns inevitably becomes a mess.
On that note: Can you provide me with an example of a full cgroup name for a process running inside Docker (as seen by the host)? Is there any indication of the Docker container that is worth keeping when shortening?
But back to your issue: If you really want to filter for containers you'd have to find a metric to determine if something is running inside a container. Note though that htop itself might be running inside a container, as asked for in e.g. #162 …
Also: What about a (platform independent) column for virtualization giving the type of virtualization and the name/ID of the container, if necessary with nesting? Thus running NSpawn inside an LXC might yield L:foo N:bar, yet a process running on the host directly might simply give -.
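One way to picture the proposed column is the formatting sketch below. The `ContainerLevel` type and `formatContainerColumn` function are hypothetical and exist only for this example; how the nesting levels would actually be detected on each platform is left open.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one nesting level, outermost first. */
typedef struct ContainerLevel_ {
   char type;        /* e.g. 'L' for LXC, 'N' for NSpawn, 'D' for Docker */
   const char* name; /* container name or ID */
} ContainerLevel;

/* Format the levels as "L:foo N:bar", or "-" when there are none.
 * The output is truncated if it does not fit into buf. */
void formatContainerColumn(const ContainerLevel* levels, size_t count, char* buf, size_t size) {
   if (size == 0)
      return;

   if (count == 0) {
      snprintf(buf, size, "-");
      return;
   }

   size_t used = 0;
   buf[0] = '\0';
   for (size_t i = 0; i < count && used < size - 1; i++) {
      int n = snprintf(buf + used, size - used, "%s%c:%s", i ? " " : "", levels[i].type, levels[i].name);
      if (n < 0)
         break;
      used += (size_t)n;
   }
}
```

A process on the host would pass `count == 0` and get `-`; NSpawn inside LXC would pass two levels.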
On that note: Can you provide me with an example of a full cgroup name for a process running inside docker (as seen by the host)?
I am running the getting-started image that I got from the tutorial on Docker's website. Here's the cgroup output as seen by the host:
12:rdma:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
11:pids:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
10:blkio:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
9:memory:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
8:perf_event:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
7:cpuset:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
6:hugetlb:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
5:devices:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
4:freezer:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
3:cpu,cpuacct:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
2:net_cls,net_prio:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
1:name=systemd:/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
0::/docker/ff90ed5efd657539ce42198e0d61661d62f2a5d8de9439e2aeb046fb1ef4e0af
The ff90ed5efd65 part is the container ID (its first 12 characters; the full ID is the whole 64-character string).
Here's some info related to my host machine:
Linux 5.4.0-122-generic x86_64 GNU/Linux
OS version: Ubuntu 20.04.4 LTS
Is there any indication of the Docker container that is worth keeping when shortening?
I am sorry, I haven't really understood the question. I don't think the cgroup name can be shortened, since the /docker/ part gives us the container application's name and the rest gives us the container ID.
If you really want to filter for containers you'd have to find a metric to determine if something is running inside a container.
I was thinking of using metrics such as NSpid, NSpgid, NStgid and/or NSsid. That's based on what I have understood here: containers create a new PID namespace, which is then used to run containerised processes. So these containerised processes are not impacted by other processes outside the container but can still be viewed from outside the container.
The issue is that there might be other processes that also run in a separate PID namespace. Or we might create our own namespace using unshare to isolate certain processes. In that case, these processes will also have multiple NS(p/s/tg/pg)id values. So I was wondering whether there's a way to differentiate, for example, a Docker process from our own containerised process.
Also: What about a (platform independent) column for virtualization giving the type of virtualization and the name/ID of the container, if necessary with nesting? Thus running NSpawn inside an LXC might yield L:foo N:bar, yet a process running on the host directly might simply give -.
It would be nice to have this. It could probably be done by parsing, for example, the NSpgid line in /proc/[pid]/status. The name of the init process in each nested namespace might help in this task. Or maybe the output of readlink /proc/[pid]/ns/pid might help.
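For the readlink idea, a minimal sketch could look like the following. It assumes the link target has the form `pid:[<inode>]`, as described in namespaces(7); the helper names `parsePidNsLink` and `readPidNs` are invented for this example. Two processes share a PID namespace when the inode numbers are equal. Reading the link of another user's process may fail without sufficient privileges, in which case the sketch returns 0.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a namespace link target such as "pid:[4026531836]" and return the
 * inode number, or 0 if the text does not have that shape. */
unsigned long parsePidNsLink(const char* target) {
   unsigned long inode = 0;
   if (sscanf(target, "pid:[%lu]", &inode) != 1)
      return 0;
   return inode;
}

/* Read /proc/<pidText>/ns/pid (pidText may be "self") and return the
 * namespace inode, or 0 on error (e.g. insufficient permissions). */
unsigned long readPidNs(const char* pidText) {
   char path[64];
   char target[64];

   snprintf(path, sizeof(path), "/proc/%s/ns/pid", pidText);

   /* readlink() does not NUL-terminate, so leave room for the terminator. */
   ssize_t len = readlink(path, target, sizeof(target) - 1);
   if (len < 0)
      return 0;

   target[len] = '\0';
   return parsePidNsLink(target);
}
```

Comparing `readPidNs("1")` with the value for another PID would tell whether that process shares the namespace of the host's init process, with the same caveat as before that a different namespace does not necessarily mean a container.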
I must confess that I am still trying to wrap my head around these concepts.
Okay, seems easy enough … Background for the question is the mentioned implementation in linux/CGroupUtils.c.
Is there any indication of the Docker container that is worth keeping when shortening?
I am sorry, I haven't really understood the question. I don't think the cgroup name can be shortened, since the /docker/ part gives us the container application's name and the rest gives us the container ID.
Based on your answer above, I deduced that keeping (part of) the container ID (similar to what git does for commit hashes) should be worthwhile.
Thus the CCGROUP column (the shortened version of CGROUP) may read /[D:ff90ed5e]/foo for this container.
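A sketch of that shortening, for illustration only: it is not the code in linux/CGroupUtils.c, and `shortenDockerCgroup` is a hypothetical name. It assumes the `/docker/<64 hex digits>` layout from the example output above and keeps the first 8 digits of the ID, as in /[D:ff90ed5e]/foo.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Rewrite "/docker/<64 hex digits><rest>" as "/[D:<first 8 digits>]<rest>".
 * Returns false (leaving buf untouched) if the path does not match. */
bool shortenDockerCgroup(const char* path, char* buf, size_t size) {
   static const char prefix[] = "/docker/";
   const size_t prefixLen = sizeof(prefix) - 1;
   const size_t idLen = 64;

   if (strncmp(path, prefix, prefixLen) != 0)
      return false;

   /* The ID must consist of exactly 64 hex digits; the loop stops at the
    * terminating NUL of a shorter string because NUL is not a hex digit. */
   const char* id = path + prefixLen;
   for (size_t i = 0; i < idLen; i++) {
      if (!isxdigit((unsigned char)id[i]))
         return false;
   }

   /* After the ID, only the end of the path or a sub-path may follow. */
   const char* rest = id + idLen;
   if (*rest != '\0' && *rest != '/')
      return false;

   snprintf(buf, size, "/[D:%.8s]%s", id, rest);
   return true;
}
```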
Based on your answer above, I deduced that keeping (part of) the container ID (similar to what git does for commit hashes) should be worthwhile.
Oh right, I didn't think of that. Currently, htop doesn't shorten the /docker/ name in CCGROUP. I think I will submit a PR that does that.
Based on the CCGROUP filtering (in CGroupUtils.c) it should be easy to build the Linux version of the CONTAINER column (just drop every label that's not a container indicator). For all the *BSDs that support jails a first version using J:<jailname> should suffice.
If you want to take this up, this would be a different PR though.
Yeah, I would like to take this up since this is somewhat related to containers as well.