
Offline/unplugged CPUs are showing in container metrics when using `cgroup1`

Status: Open · simondeziel opened this issue 1 year ago · 4 comments

In a VM whose guest uses cgroup1 and that has a single CPU (implied by the default limits.cpu=1), the instances running inside the guest apparently see the VM's other CPU cores as well, even though those cores are only "hotpluggable" and not online:

```
sdeziel@sdeziel-lemur:~$ nproc
12
$ lxc exec v1 -- nproc
1
root@v1:~# nproc
1
root@v1:~# lxc query /1.0/metrics | grep ^lxd_cpu_seconds
lxd_cpu_seconds_total{cpu="0",mode="system",name="a1",project="default",type="container"} 0.150751482
lxd_cpu_seconds_total{cpu="0",mode="user",name="a1",project="default",type="container"} 0.043823309
lxd_cpu_seconds_total{cpu="2",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="user",name="a1",project="default",type="container"} 0
```
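The listing above is consistent with the guest having 12 possible CPUs (0-11) of which only CPU 0 is online. Linux publishes the online set in /sys/devices/system/cpu/online using the kernel's "cpulist" syntax (for example `0` or `0-3,5`). What follows is a minimal Go sketch that parses that syntax into a set. `parseCPUList` is a hypothetical helper written for illustration, not an LXD function, and the strings in `main` are illustrative values rather than data read from the VM.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses the kernel "cpulist" format used by files such as
// /sys/devices/system/cpu/online (e.g. "0", "0-3" or "0-3,5,7-8") and
// returns the set of CPU IDs it lists.
// Hypothetical helper for illustration; not part of LXD.
func parseCPUList(s string) (map[int]bool, error) {
	cpus := make(map[int]bool)
	s = strings.TrimSpace(s) // sysfs files end with a newline
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU %q: %w", lo, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(hi)
			if err != nil {
				return nil, fmt.Errorf("invalid CPU %q: %w", hi, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			cpus[cpu] = true
		}
	}
	return cpus, nil
}

func main() {
	// Illustrative values: in the VM from this report, the possible set
	// would presumably be "0-11" while the online set is "0".
	possible, _ := parseCPUList("0-11\n")
	online, _ := parseCPUList("0\n")
	fmt.Println(len(possible), len(online)) // 12 1
}
```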

Here's how to reproduce:

```
lxc launch ubuntu-daily:22.04 --vm v1
lxc exec v1 -- sed -i 's/console=ttyS0"/console=ttyS0 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub.d/50-cloudimg-settings.cfg
lxc exec v1 -- update-grub
lxc restart v1
lxc exec v1 -- lxd init --auto
lxc exec v1 -- lxc launch ubuntu-minimal:22.04 c1
lxc exec v1 -- lxc query /1.0/metrics | grep ^lxd_cpu_seconds
```
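To confirm that the reboot actually left the guest on cgroup1 (systemd's legacy or hybrid layout, where /sys/fs/cgroup is a tmpfs), one can check the filesystem type of /sys/fs/cgroup. This is what `stat -fc %T /sys/fs/cgroup` reports (`tmpfs` versus `cgroup2fs`). The same check is sketched in Go below. The magic numbers come from the kernel's `<linux/magic.h>`, `cgroupLayout` is a hypothetical helper, and the program is assumed to be built for Linux and run inside v1.

```go
package main

import (
	"fmt"
	"syscall"
)

// Filesystem magic numbers from the Linux uapi header <linux/magic.h>.
const (
	cgroup2SuperMagic = 0x63677270 // CGROUP2_SUPER_MAGIC
	tmpfsMagic        = 0x01021994 // TMPFS_MAGIC
)

// cgroupLayout maps the filesystem type of /sys/fs/cgroup to a layout name.
// Hypothetical helper for illustration.
func cgroupLayout(fsType int64) string {
	switch fsType {
	case cgroup2SuperMagic:
		return "unified (cgroup2)"
	case tmpfsMagic:
		// Legacy and hybrid layouts mount a tmpfs at /sys/fs/cgroup and
		// put the cgroup1 controller hierarchies underneath it.
		return "legacy/hybrid (cgroup1 controllers)"
	default:
		return "unknown"
	}
}

func main() {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/sys/fs/cgroup", &st); err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	// The width of Statfs_t.Type varies by architecture, so normalize it.
	fmt.Println(cgroupLayout(int64(st.Type)))
}
```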

The metrics query should only report cpu="0", but it also reports zero values for the other CPU cores, which are not online/plugged:

```
$ lxc exec v1 -- lxc query /1.0/metrics | grep ^lxd_cpu_seconds
lxd_cpu_seconds_total{cpu="7",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="0",mode="system",name="c1",project="default",type="container"} 1.510519089
lxd_cpu_seconds_total{cpu="0",mode="user",name="c1",project="default",type="container"} 3.571449891
lxd_cpu_seconds_total{cpu="1",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="user",name="c1",project="default",type="container"} 0

$ nproc
12
$ lxc exec v1 -- nproc
1

$ lxc exec v1 -- snap list lxd
Name  Version        Rev    Tracking      Publisher   Notes
lxd   5.0.3-babaaf8  27948  5.0/stable/…  canonical✓  -
```
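One plausible explanation, not confirmed in this thread, is that on cgroup1 the per-CPU figures come from the cpuacct controller's `cpuacct.usage_all` file. The kernel writes a row to that file for every possible CPU, so unplugged cores appear as all-zero rows. Under that assumption, one way to trim the output is to drop rows for CPUs that are not online. The Go sketch below does this. `filterUsageAll` and `cpuUsage` are hypothetical names. The online set is passed in rather than read from /sys/devices/system/cpu/online. The sample input is illustrative, not captured from the VM.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpuUsage holds one CPU's user and system time, in seconds.
type cpuUsage struct {
	User, System float64
}

// filterUsageAll parses the contents of a cgroup1 cpuacct.usage_all file
// (a "cpu user system" header, then "<cpu> <user_ns> <system_ns>" rows)
// and keeps only the rows whose CPU is in the online set.
// Hypothetical helper for illustration; not LXD's implementation.
func filterUsageAll(content string, online map[int]bool) (map[int]cpuUsage, error) {
	result := make(map[int]cpuUsage)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 || fields[0] == "cpu" {
			continue // skip the header and malformed lines
		}
		cpu, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, err
		}
		if !online[cpu] {
			continue // offline/unplugged CPU: emit no metric for it
		}
		user, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		system, err := strconv.ParseUint(fields[2], 10, 64)
		if err != nil {
			return nil, err
		}
		// Values are in nanoseconds; the metrics are in seconds.
		result[cpu] = cpuUsage{User: float64(user) / 1e9, System: float64(system) / 1e9}
	}
	return result, scanner.Err()
}

func main() {
	// Illustrative input: four possible CPUs, only CPU 0 online.
	sample := "cpu user system\n0 3571449891 1510519089\n1 0 0\n2 0 0\n3 0 0\n"
	usage, err := filterUsageAll(sample, map[int]bool{0: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(usage), usage[0].User) // 1 3.571449891
}
```

The sketch filters against the online set rather than dropping every all-zero row. Dropping zero rows would also hide an online CPU that simply has not run any of the container's tasks yet.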

FYI, this is reproducible with 5.0/stable, 5.21/stable and latest/edge.

(simondeziel, Apr 12 '24 18:04)

@mihalicyn is this expected?

@simondeziel how is the behavior different in cgroupv2?

(tomponline, Apr 12 '24 18:04)

> @simondeziel how is the behavior different in cgroupv2?

With cgroup2 (the default on 22.04, and maybe on 20.04 too?), only cpu="0" is reported. That seems to be expected given https://github.com/canonical/lxd/blob/main/lxd/cgroup/abstraction.go#L340-L341, and cpu="0" is always online.
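For comparison, cgroup2's `cpu.stat` exposes only cgroup-wide totals (such as `user_usec` and `system_usec`, in microseconds) with no per-CPU breakdown. The linked code therefore appears to report the whole total under cpu="0". Below is a minimal Go sketch of that conversion, assuming this reading of the linked lines. `cpuStatToSeconds` is a hypothetical helper and the sample input is illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpuStatToSeconds extracts user_usec and system_usec from the contents of
// a cgroup2 cpu.stat file and converts them from microseconds to seconds.
// Hypothetical helper for illustration; not LXD's implementation.
func cpuStatToSeconds(content string) (user, system float64, err error) {
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		var target *float64
		switch fields[0] {
		case "user_usec":
			target = &user
		case "system_usec":
			target = &system
		default:
			continue // usage_usec, nr_periods, etc. are not needed here
		}
		v, perr := strconv.ParseUint(fields[1], 10, 64)
		if perr != nil {
			return 0, 0, perr
		}
		*target = float64(v) / 1e6
	}
	return user, system, scanner.Err()
}

func main() {
	// Illustrative cpu.stat contents, not captured from a real system.
	sample := "usage_usec 5081968\nuser_usec 3571449\nsystem_usec 1510519\nnr_periods 0\n"
	user, system, err := cpuStatToSeconds(sample)
	if err != nil {
		panic(err)
	}
	// With no per-CPU split available, the totals are attributed to CPU 0.
	fmt.Printf("cpu=\"0\" mode=\"user\" %g\n", user)
	fmt.Printf("cpu=\"0\" mode=\"system\" %g\n", system)
}
```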

(simondeziel, Apr 12 '24 18:04)

@simondeziel @mihalicyn please can you chat about this and figure out if we need to do anything here?

(tomponline, Oct 15 '24 12:10)

https://lore.kernel.org/all/[email protected]

(mihalicyn, Oct 17 '24 10:10)