htop icon indicating copy to clipboard operation
htop copied to clipboard

linux: assign CPU temperatures by package/core or CCD

Open leahneukirchen opened this issue 1 year ago • 23 comments

Closes https://github.com/htop-dev/htop/issues/806. Closes https://github.com/htop-dev/htop/issues/1048. Closes https://github.com/htop-dev/htop/pull/1176. Closes https://github.com/htop-dev/htop/issues/1335. Addresses https://github.com/htop-dev/htop/issues/879 (on Linux).

leahneukirchen avatar Dec 25 '23 13:12 leahneukirchen

Fixed build without libsensors.

Fixed assignment on RPi.

leahneukirchen avatar Dec 25 '23 15:12 leahneukirchen

Note that the cpu are only parsed once, this takes <0.1ms on my machine.

leahneukirchen avatar Jan 01 '24 18:01 leahneukirchen

I think this is ready to go, should I squash it together sensibly?

leahneukirchen avatar Jan 16 '24 16:01 leahneukirchen

I think this is ready to go, should I squash it together sensibly?

As you like. Just took a quick peek at the list of commits and some suggest that they mostly rectify things changed in previous commits of the PR: Those are the prime subjects to squash in the PR right now. So there's not really much to squash together right now AFAICS.

BenBE avatar Jan 17 '24 00:01 BenBE

@leahneukirchen Anything left to do from your side? If not please mark the PR ready for review. TIA.

BenBE avatar Feb 02 '24 14:02 BenBE

@leahneukirchen Do you want to implement the three review comments from @cgzones ?

fasterit avatar Apr 17 '24 08:04 fasterit

I somehow dislike that the sensors are iterated twice: once in LibSensors_countCCDs() and once in LibSensors_getCPUTemperatures(). Do we need to compute the number of ccds beforehand, what about adjusting the CPU data after iterating all sensors (and counting all CCD ones) in LibSensors_getCPUTemperatures() ?

cgzones avatar Apr 18 '24 16:04 cgzones

I somehow dislike that the sensors are iterated twice: once in LibSensors_countCCDs() and once in LibSensors_getCPUTemperatures(). Do we need to compute the number of ccds beforehand, what about adjusting the CPU data after iterating all sensors (and counting all CCD ones) in LibSensors_getCPUTemperatures() ?

Seconded, libsensors is slow

fasterit avatar Apr 22 '24 19:04 fasterit

Tested on my notebook with Alder Lake CPU... It shows temperatures for all cores at least.

eworm-de avatar May 23 '24 08:05 eworm-de

Tested on an AMD EPYC 7F52 (16 core, 32 threads, 16x1 CCX), seems to work. At least plausible looking temperatures are now shown for all cores.

ulope avatar May 29 '24 14:05 ulope

AFAICS this is waiting for some refactoring work, but that work is yet not fully specified/explained. @cgzones can you please elaborate the open discussion threads?.

BenBE avatar Jun 04 '24 17:06 BenBE

Note that sensors are iterated twice on first start only, later only to read the temperatures. (Patches welcome, I find it hard to rewrite this to use one pass.)

leahneukirchen avatar Jun 05 '24 16:06 leahneukirchen

If this is a one-time event it sounds like a reasonable tradeoff, compared to (probably) more complex code. I have not looked at it myself, though.

eworm-de avatar Jun 06 '24 06:06 eworm-de

Oh, this crashes with a segmentation fault if libsensors.so is not available...

eworm-de avatar Jun 26 '24 13:06 eworm-de

Sigh. With Kernel 6.9 there seems to be a bug in the logic now, my i7-1355U isn't mapped correctly anymore...

leahneukirchen avatar Jun 26 '24 14:06 leahneukirchen

With kernel 6.9.5 I get:

% pcat /sys/class/hwmon/hwmon2/*label
/sys/class/hwmon/hwmon2/temp1_label	Package id 0
/sys/class/hwmon/hwmon2/temp2_label	Core 0
/sys/class/hwmon/hwmon2/temp6_label	Core 4
/sys/class/hwmon/hwmon2/temp10_label	Core 8
/sys/class/hwmon/hwmon2/temp11_label	Core 9
/sys/class/hwmon/hwmon2/temp12_label	Core 10
/sys/class/hwmon/hwmon2/temp13_label	Core 11
/sys/class/hwmon/hwmon2/temp14_label	Core 12
/sys/class/hwmon/hwmon2/temp15_label	Core 13
/sys/class/hwmon/hwmon2/temp16_label	Core 14
/sys/class/hwmon/hwmon2/temp17_label	Core 15

Note that the tempID are not sequential.

leahneukirchen avatar Jun 26 '24 15:06 leahneukirchen

Thanks a lot for the quick fix!

eworm-de avatar Jun 26 '24 16:06 eworm-de

On my laptop with a Ryzen 5500U, there's only a single Tctl measurement for the CPU:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +47.4°C

According to the kernel doc's quote of the AMD manual:

It does not represent an actual physical temperature like die or case temperature.

But it appears somewhat representative, at least here, so it might make sense to include it.

Vogtinator avatar Jun 28 '24 14:06 Vogtinator

I just checked on a server with AMD EPYC 7502, which supposedly (no idea whether it's correct!) has 2 sockets x 4 CCD x 2 CCX x 4 cores = 64 cores. The sockets are exposed as separate k10temp instances:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +38.4°C  
Tccd1:        +35.0°C  
Tccd3:        +36.8°C  
Tccd5:        +35.8°C  
Tccd7:        +35.8°C

k10temp-pci-00cb
Adapter: PCI adapter
Tctl:         +39.8°C  
Tccd1:        +38.0°C  
Tccd3:        +39.0°C  
Tccd5:        +37.5°C  
Tccd7:        +40.0°C

Vogtinator avatar Jun 28 '24 15:06 Vogtinator

Just having one sensor works already, no?

leahneukirchen avatar Jun 28 '24 15:06 leahneukirchen

Just having one sensor works already, no?

Depends on what you mean by "works". It shows a temperature value for each core, but I have no idea whether it matches. FWICT both sensors and its CCDs are read in whatever order sensors provides, which might or might not be related to the CPU core numbers...

Vogtinator avatar Jun 28 '24 15:06 Vogtinator

I tested on a AMD EPYC 7443 that when i lock processes to some cores, only these heat up.

leahneukirchen avatar Jun 28 '24 15:06 leahneukirchen