
Add new mechanism to get thermal notification

Open oleua opened this issue 1 year ago • 3 comments

Currently, serverinfo reads temperatures from /sys/class/thermal/thermal_zone*/temp. However, some AMD motherboards and their chipsets do not expose the readings there, only through hwmon. For example, I have an HP MicroServer, and it exposes its temperature data as follows:

k10temp:
temp1 /sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon3/temp1_input

w83795adg-i2c-1-2f:
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp1_input
temp2 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp2_input
temp5 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp5_input

jc42-i2c-0-18:
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0018/hwmon/hwmon0/temp1_input

jc42-i2c-0-19:
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0019/hwmon/hwmon0/temp1_input

And the output of find:

# find /sys -name "temp*_input"
/sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon3/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp5_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp2_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0019/hwmon/hwmon1/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0018/hwmon/hwmon0/temp1_input
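A minimal PHP sketch of collecting the same temp*_input files through the hwmon class instead of running find. In the find output above, the w83795adg files sit directly in the I2C device directory (1-002f/) rather than in a hwmon/hwmonN/ subdirectory. The sketch assumes such drivers are still reachable through /sys/class/hwmon/hwmonN/device/, which is a common layout for older drivers. Treat that as an assumption to verify on the actual machine. The function name listHwmonTempInputs() and its $root parameter are hypothetical and exist only so the logic can be tested against a fake directory tree.

```php
<?php

/**
 * Collect the temp*_input files of every device in the hwmon class.
 *
 * Assumption: some (older) drivers put their attributes on the parent
 * device, reachable via hwmonN/device/, instead of in hwmonN/ itself.
 */
function listHwmonTempInputs(string $root = '/sys/class/hwmon'): array
{
    $inputs = [];
    // glob() may return false on error, so normalise to an empty array.
    foreach (glob($root . '/hwmon*') ?: [] as $hwmonDir) {
        // Prefer the hwmon directory; fall back to device/ if it has no inputs.
        foreach ([$hwmonDir, $hwmonDir . '/device'] as $dir) {
            $files = glob($dir . '/temp*_input') ?: [];
            if ($files !== []) {
                array_push($inputs, ...$files);
                break;
            }
        }
    }
    return $inputs;
}
```

Called without arguments, listHwmonTempInputs() would scan /sys/class/hwmon on the running system.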

lm-sensors also reports this data correctly.

Is it possible to read the data in a more universal way, e.g. from the hwmon class instead of the thermal_zone class?

More details: https://github.com/Mellanox/mlxsw/wiki/Temperature-and-Fan-Control

oleua, Jan 10 '24 10:01

UPD: Here is my quick and dirty hack to getThermalZones() in lib/OperatingSystems/DefaultOs.php. It checks whether thermal zones are present and, if they are not, reads the data from hwmon instead:

public function getThermalZones(): array {
    $result = [];

    // is_dir() does not expand wildcards, so use glob() to detect thermal zones.
    $thermalZones = glob('/sys/class/thermal/thermal_zone*') ?: [];
    if ($thermalZones !== []) {
        foreach ($thermalZones as $thermalZone) {
            $tzone = [];
            try {
                $tzone['hash'] = md5($thermalZone);
                $tzone['type'] = $this->readContent($thermalZone . '/type');
                // Values are reported in millidegrees Celsius.
                $tzone['temp'] = (float)((int)($this->readContent($thermalZone . '/temp')) / 1000);
                if ($tzone['temp'] > 0) {
                    $tzone['temp'] = '+' . $tzone['temp'];
                }
            } catch (\RuntimeException $e) {
                continue;
            }
            $result[] = $tzone;
        }
    } else {
        // No thermal zones: fall back to hwmon (first temperature channel only).
        $hwmonDevices = glob('/sys/class/hwmon/hwmon*') ?: [];
        foreach ($hwmonDevices as $hwmonDevice) {
            $tzone = [];
            try {
                $tzone['hash'] = md5($hwmonDevice);
                $tzone['type'] = $this->readContent($hwmonDevice . '/name');
                $tzone['temp'] = (float)((int)($this->readContent($hwmonDevice . '/temp1_input')) / 1000);
            } catch (\RuntimeException $e) {
                continue;
            }
            $result[] = $tzone;
        }
    }
    return $result;
}
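Detecting thermal zones with is_dir() and a wildcard path cannot work. is_dir() checks the literal path "thermal_zone*" and does not expand the wildcard, so such a check returns false even on systems that do have thermal zones. glob() performs the expansion. The two hypothetical functions below exist only to demonstrate the difference against a test directory.

```php
<?php

/** True if at least one thermal_zone* entry exists (wildcard expanded by glob). */
function hasThermalZones(string $root = '/sys/class/thermal'): bool
{
    $zones = glob($root . '/thermal_zone*') ?: [];
    return $zones !== [];
}

/** Shows the pitfall: false unless a directory literally named "thermal_zone*" exists. */
function isDirWithWildcard(string $root = '/sys/class/thermal'): bool
{
    return is_dir($root . '/thermal_zone*');
}
```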

The resulting data are not easy to interpret:

[screenshot]

The sensors command gives the following output:

jc42-i2c-0-18
Adapter: SMBus PIIX4 adapter port 0 at 0b00
RAM1 Temp:    +13.75°C  (low  =  +0.0°C)
                       (high = +60.0°C, hyst = +54.0°C)
                       (crit = +70.0°C, hyst = +64.0°C)

jc42-i2c-0-19
Adapter: SMBus PIIX4 adapter port 0 at 0b00
RAM2 Temp:    +13.5°C  (low  =  +0.0°C)
                       (high = +60.0°C, hyst = +54.0°C)
                       (crit = +70.0°C, hyst = +64.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
CPU Core Temp:  +24.75°C  (high = +70.0°C)
                         (crit = +100.0°C, hyst = +95.0°C)

And this data is missing entirely:

w83795adg-i2c-1-2f
CPU Temp:     +26.0°C  (high = +109.0°C, hyst = +109.0°C)
                       (crit = +109.0°C, hyst = +109.0°C)  sensor = thermal diode
NB Temp:      +29.0°C  (high = +105.0°C, hyst = +105.0°C)
                       (crit = +105.0°C, hyst = +105.0°C)  sensor = thermal diode
MB Temp:       +4.5°C  (high = +39.0°C, hyst = +39.0°C)
                       (crit = +44.0°C, hyst = +44.0°C)  sensor = thermistor
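The values in the temp*_input files are integers in millidegrees Celsius (for example 13750 for 13.75°C). The high and crit limits that sensors prints appear to come from separate files per channel. The hypothetical helpers below sketch how a sensors-like line could be built from those files. They assume the limits live in tempN_max and tempN_crit, as described in the kernel's hwmon sysfs interface documentation, and skip any limit file that is missing. The two-decimal formatting is a choice made for this sketch, not what sensors itself prints.

```php
<?php

/** Convert a raw hwmon value (millidegrees Celsius) into a signed string. */
function formatMilliCelsius(string $raw): string
{
    return sprintf('%+.2f°C', (int)trim($raw) / 1000);
}

/**
 * Build a sensors-like line for one channel, where $prefix is e.g. ".../temp1".
 * Assumption: limits are stored in <prefix>_max ("high") and <prefix>_crit.
 */
function describeTempChannel(string $prefix, string $label): string
{
    $line = $label . ': ' . formatMilliCelsius((string)file_get_contents($prefix . '_input'));

    $limits = [];
    foreach (['max' => 'high', 'crit' => 'crit'] as $suffix => $name) {
        $file = $prefix . '_' . $suffix;
        if (is_readable($file)) {
            $limits[] = $name . ' = ' . formatMilliCelsius((string)file_get_contents($file));
        }
    }
    if ($limits !== []) {
        $line .= ' (' . implode(', ', $limits) . ')';
    }
    return $line;
}
```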

oleua, Jan 10 '24 12:01

Hey,

Using /sys/class/hwmon/hwmon looks okay to me.

Regarding "The data are not so comfortable to interpret":

Each device in /sys/class/hwmon/ (for example hwmon1) corresponds to a driver and can expose many sensors.

I guess you want something like the following to read all sensors:

Index: lib/OperatingSystems/Linux.php
===================================================================
diff --git a/lib/OperatingSystems/Linux.php b/lib/OperatingSystems/Linux.php
--- a/lib/OperatingSystems/Linux.php	(revision 268a3601683d8d1d0605ba3d1c17b44afab007e2)
+++ b/lib/OperatingSystems/Linux.php	(date 1704898019944)
@@ -232,6 +232,18 @@
 	public function getThermalZones(): array {
 		$data = [];
 
+		$drivers = glob('/sys/class/hwmon/hwmon*');
+		foreach ($drivers as $driver) {
+			$name = $this->readContent($driver . '/name');
+
+			$zones = glob($driver . '/temp*_label');
+			foreach ($zones as $zone) {
+				$type = $name . ' ' . $this->readContent($zone);
+				$temp = (int)$this->readContent(str_replace('_label', '_input', $zone)) / 1000;
+				$data[] = new ThermalZone(md5($zone), $type, $temp);
+			}
+		}
+
 		$zones = glob('/sys/class/thermal/thermal_zone*');
 		if ($zones === false) {
 			return $data;
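The patch above only walks temp*_label files, so a channel that has an input but no label file would be skipped. The sketch below shows one way to iterate over temp*_input instead: it falls back to a generic tempN label and skips files that cannot be read. So that it runs on its own, it uses plain file_get_contents() instead of serverinfo's readContent() and returns arrays instead of ThermalZone objects. readHwmonTemperatures() is a hypothetical name, and the thread does not say whether the w83795adg channels on this machine actually lack labels.

```php
<?php

/**
 * Read every temperature channel of one hwmon device directory.
 * Channels without tempN_label fall back to "tempN"; unreadable inputs are skipped.
 *
 * @return array<int, array{type: string, temp: float}>
 */
function readHwmonTemperatures(string $hwmonDir): array
{
    $nameFile = $hwmonDir . '/name';
    $name = is_readable($nameFile) ? trim((string)file_get_contents($nameFile)) : basename($hwmonDir);

    $inputs = glob($hwmonDir . '/temp*_input') ?: [];
    natsort($inputs); // natural order: temp2 before temp10

    $result = [];
    foreach ($inputs as $input) {
        if (!preg_match('/temp(\d+)_input$/', $input, $m)) {
            continue;
        }
        $raw = @file_get_contents($input);
        if ($raw === false) {
            continue; // sensor could not be read
        }
        $labelFile = $hwmonDir . '/temp' . $m[1] . '_label';
        $label = is_readable($labelFile) ? trim((string)file_get_contents($labelFile)) : 'temp' . $m[1];
        // Millidegrees Celsius to degrees, always as float.
        $result[] = ['type' => $name . ' ' . $label, 'temp' => (int)trim($raw) / 1000.0];
    }
    return $result;
}
```

natsort() is used because glob() sorts alphabetically, which would place temp10_input before temp1_input and temp2_input.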


[screenshot]

kesselb, Jan 10 '24 14:01


Oh, that is nice!

@kesselb Daniel, could you post the full text of the patched getThermalZones() function for NC v27?

oleua, Jan 10 '24 17:01