CPU list must use available (selected) core numbers

Open sergey-dryabzhinsky opened this issue 3 years ago • 16 comments

Machine Info:

  • System: linux
  • OS: Debian 9+ / Ubuntu 14+
  • Virtualization: LXC
  • Server: 6 core, 12 thread, Proxmox 6.4
  • Container: LXC, 2 cores limited

How to reproduce:

  • use the latest code from master (as of 2022-05-20)
  • run an LXC container with Ubuntu (any version with htop 3+)
  • run htop

What I expected to see:

  • htop shows a CPU meter with only 2 bars, as htop-2.* does
  • htop shows the utilization and frequency of the corresponding cores

What really happens:

  • htop shows a CPU meter with glitchy, inaccurate information

Possible explanation:

As Proxmox 6.4 uses cgroups v1 and lxcfs does not cover all the sys files, there are several ways for things to get mixed up. Inside the container:

  1. /proc/cpuinfo shows the right count of available cores: 2. This is what htop-2 used, as I recall.
  2. /sys/devices/system/cpu/cpu*/online shows the count of cores available on the whole SYSTEM, not the cores SELECTED by the hypervisor.
  3. The Cpus_allowed_list line in /proc/1/status shows the cores SELECTED by the hypervisor: Cpus_allowed_list: 7,10. This should be taken into account.

On the hypervisor:

  1. The Cpus_allowed_list line in /proc/1/status shows all AVAILABLE cores: Cpus_allowed_list: 0-11.

Solution (Linux)? Parse /proc/1/status to determine which cores are available. The selection may be a precise list (7,10), a range (0-1), or both (0-1,7,10). Reading /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq for the selected cores would then give more accurate information.
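
To illustrate the parsing step proposed above, here is a minimal Python sketch (not htop code, which is written in C). The function names parse_cpu_list and allowed_cpus are hypothetical; the only assumption about the kernel interface is the Cpus_allowed_list line format quoted in this issue.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-1,7,10" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-1" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus(status_path="/proc/1/status"):
    """Return the CPUs from Cpus_allowed_list, or None if it cannot be read."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    return parse_cpu_list(line.split(":", 1)[1])
    except OSError:
        pass
    return None
```

With the values from this report, parse_cpu_list("7,10") yields [7, 10] and parse_cpu_list("0-1,7,10") yields [0, 1, 7, 10].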

sergey-dryabzhinsky avatar May 20 '22 06:05 sergey-dryabzhinsky

And more: cgroups in kernels up to 3.19.3 may be affected: https://github.com/lxc/lxc/issues/427

sergey-dryabzhinsky avatar May 20 '22 06:05 sergey-dryabzhinsky

Duplicate of #993, solved in #995. Correct?

fasterit avatar May 20 '22 10:05 fasterit

Wait, I'll check.

sergey-dryabzhinsky avatar May 20 '22 10:05 sergey-dryabzhinsky

Yes! Looks good. Closing.

sergey-dryabzhinsky avatar May 20 '22 10:05 sergey-dryabzhinsky

Though there is one question.

Will htop show the right CPU meters, frequencies, etc. if it is not pointed at the right available core numbers?

sergey-dryabzhinsky avatar May 20 '22 16:05 sergey-dryabzhinsky

I think that scanning /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq will produce erroneous output.
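
A rough Python sketch of the concern: the frequency files are indexed by host CPU number, so reading them only makes sense for the CPU numbers the container is actually pinned to. The function name read_cur_freqs and the sysfs_root parameter are hypothetical, added so the lookup can be tried against a test directory; this is not how htop itself does it.

```python
import os


def read_cur_freqs(cpus, sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU number to its scaling_cur_freq value (kHz), or None."""
    freqs = {}
    for cpu in cpus:
        path = os.path.join(sysfs_root, "cpu%d" % cpu,
                            "cpufreq", "scaling_cur_freq")
        try:
            with open(path) as f:
                freqs[cpu] = int(f.read().strip())
        except (OSError, ValueError):
            # File missing or unreadable (common inside containers).
            freqs[cpu] = None
    return freqs
```

Calling read_cur_freqs([7, 10]) would query cpu7 and cpu10 instead of cpu0 and cpu1; a None entry indicates the frequency is simply not available for that core.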

sergey-dryabzhinsky avatar May 20 '22 16:05 sergey-dryabzhinsky

And Linux/LXC still messes with the core info:

# grep 'core id' /proc/cpuinfo 
core id         : 2
core id         : 3
# grep Cpus_allowed_list /proc/1/status
Cpus_allowed_list:      2,9

sergey-dryabzhinsky avatar May 20 '22 17:05 sergey-dryabzhinsky

I suggest using /proc/cpuinfo only if /proc/1/status has no core information.

sergey-dryabzhinsky avatar May 20 '22 17:05 sergey-dryabzhinsky

I think that goes beyond the level of hackery I'd personally support to work around bad design decisions from lxc. I lean more towards disabling CPU temp and frequency support when inside such a container. /DLange

fasterit avatar May 20 '22 19:05 fasterit

Okay. Maybe I will make a PR one day. Please mark the issue: future, help wanted.

sergey-dryabzhinsky avatar May 21 '22 02:05 sergey-dryabzhinsky

It would be good if somebody using lxc would test if the CPU meters reflect the workload running inside such a container.

fasterit avatar May 21 '22 15:05 fasterit

Well, it is not as simple as I thought.

# grep 'core id' /proc/cpuinfo 
core id		: 0
core id		: 1
core id		: 2
core id		: 3
# grep 'Cpus_all' /proc/1/status
Cpus_allowed:	fff
Cpus_allowed_list:	0-11

OpenVZ enabled only 4 cores, but allows all 12 to be selected. And /sys/devices/system/cpu/cpu* is empty. All these virtualization systems are such a mess.
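
For reference, the Cpus_allowed value is a hexadecimal bitmask of the same set that Cpus_allowed_list spells out: fff has bits 0 through 11 set, matching 0-11. A small Python sketch of the decoding follows; the function name cpus_from_mask is hypothetical, and the comma handling assumes the kernel's convention of splitting long masks into comma-separated 32-bit groups.

```python
def cpus_from_mask(mask_hex):
    """Decode a Cpus_allowed hex mask such as "fff" into CPU numbers."""
    # Long masks are printed as comma-separated groups, highest group first.
    value = int(mask_hex.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
```

Here cpus_from_mask("fff") gives CPUs 0 to 11, which is why the mask alone cannot reveal that only 4 cores are enabled in this OpenVZ container.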

sergey-dryabzhinsky avatar May 21 '22 18:05 sergey-dryabzhinsky

@fasterit Yes, it can't be helped. Once we use a container, it becomes "detached" from the hardware. Even if we pin the container to selected cores, everything inside is messed up: /proc/cpuinfo, /sys/devices/system/cpu/cpu*. We can't be sure which real cores the container uses.

So disabling the features (temp, freq) is the simplest way.

sergey-dryabzhinsky avatar Jun 04 '22 13:06 sergey-dryabzhinsky

"CPU meters reflect the workload"

AFAIK the CPU cores read from /proc/cpuinfo have corresponding indexes in /proc/stat, so the meters should reflect the in-container workload.
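
One way to check this inside a container is to look at the per-CPU lines of /proc/stat directly. The Python sketch below is an illustration only; the function name per_cpu_ticks is hypothetical, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard /proc/stat layout.

```python
def per_cpu_ticks(stat_text):
    """Return {cpu_index: (busy_ticks, total_ticks)} from /proc/stat text."""
    result = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only "cpuN" lines; skip the aggregate "cpu" line and others.
        if not fields or not fields[0].startswith("cpu"):
            continue
        index = fields[0][3:]
        if not index.isdigit():
            continue
        values = [int(v) for v in fields[1:]]
        # guest and guest_nice are already counted in user and nice,
        # so only the first eight fields are summed.
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        result[int(index)] = (total - idle, total)
    return result
```

Sampling this twice and comparing the differences per index shows whether the cpuN entries visible in the container move with the workload running in it.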

sergey-dryabzhinsky avatar Jun 04 '22 13:06 sergey-dryabzhinsky