telegraf
sensors plugin reports a higher temperature than lm-sensors
Relevant telegraf.conf
# Monitor sensors, requires lm-sensors package
[[inputs.sensors]]
## Remove numbers from field names.
## If true, a field name like 'temp1_input' will be changed to 'temp_input'.
# remove_numbers = true
## Timeout is the maximum amount of time that the sensors command can run.
# timeout = "5s"
Logs from Telegraf
$ telegraf --config /etc/telegraf/telegraf.conf --input-filter sensors --test --debug
2022-08-29T15:28:53Z I! Starting Telegraf 1.21.4+ds1-0ubuntu2
2022-08-29T15:28:53Z I! Loaded inputs: sensors
2022-08-29T15:28:53Z I! Loaded aggregators:
2022-08-29T15:28:53Z I! Loaded processors:
2022-08-29T15:28:53Z W! Outputs are not used in testing mode!
2022-08-29T15:28:53Z I! Tags enabled: host=roger-nuc
2022-08-29T15:28:53Z D! [agent] Initializing plugins
2022-08-29T15:28:53Z D! [agent] Starting service inputs
2022-08-29T15:28:53Z D! [agent] Stopping service inputs
2022-08-29T15:28:53Z D! [agent] Input channel closed
> sensors,chip=coretemp-isa-0000,feature=package_id_0,host=roger-nuc temp_crit=100,temp_crit_alarm=0,temp_input=49,temp_max=100 1661786934000000000
2022-08-29T15:28:53Z D! [agent] Stopped Successfully
> sensors,chip=coretemp-isa-0000,feature=core_0,host=roger-nuc temp_crit=100,temp_crit_alarm=0,temp_input=41,temp_max=100 1661786934000000000
> sensors,chip=coretemp-isa-0000,feature=core_1,host=roger-nuc temp_crit=100,temp_crit_alarm=0,temp_input=49,temp_max=100 1661786934000000000
> sensors,chip=pch_skylake-virtual-0,feature=temp1,host=roger-nuc temp_input=38.5 1661786934000000000
> sensors,chip=acpitz-acpi-0,feature=temp1,host=roger-nuc temp_input=-263.2 1661786934000000000
> sensors,chip=nvme-pci-3c00,feature=composite,host=roger-nuc temp_alarm=0,temp_crit=84.85,temp_input=44.85,temp_max=79.85,temp_min=-5.15 1661786934000000000
System info
Telegraf 1.21.4+ds1-0ubuntu2, Ubuntu 22.04
Docker
No response
Steps to reproduce
- Default installation with apt install telegraf from the default repositories.
- Enable the sensors plugin in /etc/telegraf/telegraf.conf.
- Run telegraf --config /etc/telegraf/telegraf.conf --input-filter sensors --test --debug and note that sensors,chip=coretemp-isa-0000,feature=package_id_0 has temp_input=49.
- Immediately run sensors. Note that the output shows 38C, which is 11 degrees cooler. This only affects the CPU temperatures; the other values agree.
Expected behavior
lm-sensors and the plugin should agree on the CPU temperature.
Actual behavior
telegraf reports a CPU temperature approximately 10 degrees warmer than lm-sensors.
Additional info
No response
Note that this apparently only happens with --test; if I enable file output and then run tail -f /tmp/metrics.out, the values are reported correctly.
Hi @rlipscombe. Telegraf's sensors input plugin is very simple. It only runs the sensors program and scrapes its output. The plugin does no math on the values it scrapes, so it's very unlikely that telegraf modified the temperature that it got from sensors.
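To make "runs sensors and scrapes its output" concrete, here is a minimal Go sketch of a parser for sensors -A -u output. It is not the plugin's actual code: parseSensorsOutput, Measurement and the sample text are hypothetical. It assumes the usual layout, in which chips are separated by a blank line, feature labels such as "Package id 0:" are unindented, and readings are indented "name: value" lines. Feature names are lowercased with spaces turned into underscores, and with removeNumbers set, digits are stripped from field names, which yields names like package_id_0 and temp_input as in the test output above. Every value goes through strconv.ParseFloat unchanged.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// Measurement holds the readings for one chip/feature pair.
type Measurement struct {
	Chip    string
	Feature string
	Fields  map[string]float64
}

var digits = regexp.MustCompile(`[0-9]+`)

// parseSensorsOutput is a hypothetical, simplified scraper for `sensors -A -u`.
// It only splits and converts text; it never adjusts the numbers it reads.
func parseSensorsOutput(out string, removeNumbers bool) ([]Measurement, error) {
	var result []Measurement
	var cur *Measurement
	chip := ""
	flush := func() {
		if cur != nil && len(cur.Fields) > 0 {
			result = append(result, *cur)
		}
		cur = nil
	}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.TrimSpace(line) == "": // a blank line ends the current chip
			flush()
			chip = ""
		case chip == "": // the first line of a block names the chip
			chip = line
		case !strings.HasPrefix(line, " "): // an unindented "Label:" starts a feature
			flush()
			feature := strings.ToLower(strings.TrimSuffix(line, ":"))
			cur = &Measurement{Chip: chip, Feature: strings.ReplaceAll(feature, " ", "_"), Fields: map[string]float64{}}
		case cur != nil: // an indented "name: value" reading
			name, value, ok := strings.Cut(strings.TrimSpace(line), ":")
			if !ok {
				continue
			}
			v, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
			if err != nil {
				return nil, err
			}
			if removeNumbers {
				name = digits.ReplaceAllString(name, "") // temp1_input -> temp_input
			}
			cur.Fields[name] = v
		}
	}
	flush()
	return result, sc.Err()
}

const sample = `coretemp-isa-0000
Package id 0:
  temp1_input: 35.000
  temp1_max: 100.000
Core 0:
  temp2_input: 34.000

nvme-pci-3c00
Composite:
  temp1_input: 44.850
`

func main() {
	ms, err := parseSensorsOutput(sample, true)
	if err != nil {
		panic(err)
	}
	for _, m := range ms {
		names := make([]string, 0, len(m.Fields))
		for name := range m.Fields {
			names = append(names, name)
		}
		sort.Strings(names)
		parts := make([]string, len(names))
		for i, name := range names {
			parts[i] = fmt.Sprintf("%s=%g", name, m.Fields[name])
		}
		fmt.Printf("sensors,chip=%s,feature=%s %s\n", m.Chip, m.Feature, strings.Join(parts, ","))
	}
}
```

Because the numbers are only parsed, never transformed, a 35.000 reading in the text can only come out as 35.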
It may be that you are running into a bug or quirk of the sensors program itself, or of the hardware it reads the temperature from. I just tried to reproduce what you saw, but I ran sensors from the CLI first. For one temperature sensor it returned 50C; then, when I ran telegraf, it was 46C. Then I ran sensors again and got 46C. From then on, every order of running the two programs returned 46C. I wonder if sensors itself returns inaccurate values in some cases the first time it's called.
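One way to test the first-call hypothesis is to call sensors several times back to back and watch the package reading. The Go sketch below does that. packageTemp is a hypothetical helper, and it assumes a coretemp chip that labels its package sensor "Package id 0:" (as on the NUC in this report) with the usual indented reading lines. The run count and the 200 ms pause are arbitrary choices.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// packageTemp returns the first *_input value listed under the
// "Package id 0:" feature in `sensors -A -u` output.
func packageTemp(out string) (float64, bool) {
	inPackage := false
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		raw := sc.Text()
		line := strings.TrimSpace(raw)
		if !strings.HasPrefix(raw, " ") {
			// Unindented lines are chip names or feature labels.
			inPackage = line == "Package id 0:"
			continue
		}
		name, value, ok := strings.Cut(line, ":")
		if inPackage && ok && strings.HasSuffix(name, "_input") {
			v, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
			return v, err == nil
		}
	}
	return 0, false
}

func main() {
	const runs = 5
	for i := 1; i <= runs; i++ {
		start := time.Now()
		out, err := exec.Command("sensors", "-A", "-u").Output()
		if err != nil && len(out) == 0 {
			fmt.Fprintln(os.Stderr, "running sensors:", err)
			return
		}
		if t, ok := packageTemp(string(out)); ok {
			fmt.Printf("run %d: package %.1fC (took %v)\n", i, t, time.Since(start))
		}
		time.Sleep(200 * time.Millisecond)
	}
}
```

If only the first run stands out, that points at sensors or the hardware rather than at the scraper.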
I also replaced sensors with a script that returns the same format as sensors but always returns the same values. Whether I ran it directly or telegraf ran it and scraped it, the values were always the same.
If you can share a reproducible case where the plugin is broken I'd be happy to check it out. Otherwise I don't see any change that telegraf needs to make and we should close this issue.
I'm as puzzled as you, because I had already assumed it simply scraped sensors, and yet the repro is exactly as detailed in the original report.
I did it again just now: I can run sensors multiple times and it reports ~35C, then I immediately run telegraf --test and it reports ~46C.
Somewhat weirdly: if I run them both in split-pane tmux using watch, then the numbers agree. But the moment I run them one after the other, the broken behaviour's back. I don't believe that the CPU temp can change by 10C in under a second, so 😕.
fwiw, here's the output of sensors -A -u (per https://github.com/influxdata/telegraf/blob/v1.24.0/plugins/inputs/sensors/sensors.go#L81):
$ sensors -A -u
coretemp-isa-0000
Package id 0:
  temp1_input: 35.000
  temp1_max: 100.000
  temp1_crit: 100.000
  temp1_crit_alarm: 0.000
Core 0:
  temp2_input: 34.000
  temp2_max: 100.000
  temp2_crit: 100.000
  temp2_crit_alarm: 0.000
Core 1:
  temp3_input: 34.000
  temp3_max: 100.000
  temp3_crit: 100.000
  temp3_crit_alarm: 0.000

pch_skylake-virtual-0
temp1:
  temp1_input: 35.500

acpitz-acpi-0
temp1:
  temp1_input: -263.200

iwlwifi_1-virtual-0
temp1:
ERROR: Can't get value of subfeature temp1_input: Can't read

nvme-pci-3c00
Composite:
  temp1_input: 44.850
  temp1_max: 79.850
  temp1_min: -5.150
  temp1_crit: 84.850
  temp1_alarm: 0.000
(that ERROR goes to stderr)
@rlipscombe If you're able to reproduce it every time, it must not be the same thing as the four degrees difference I saw one time. Maybe it's something unique to your platform. If so, I'm not going to be able to reproduce it.
I made a PR for you to help us understand what is going on. The PR changes telegraf so it saves the output of sensors to a file before scraping it.
The PR build is available here https://github.com/influxdata/telegraf/pull/11808#issuecomment-1247047007 (the build will be automatically deleted after about a month)
Would you download and uncompress this build on your NUC, then run ./telegraf-1.25.0/usr/bin/telegraf --test --config /etc/telegraf/telegraf.conf --input-filter sensors 2>&1 | tee telegraf.txt? That will save two files in the current directory, telegraf.txt and telegraf-sensors.txt. Then, right afterward, could you run sensors -A -u 2>&1 | tee sensors.txt? That will save one file, sensors.txt.
We should be able to use those files to see whether telegraf is scraping the output incorrectly or if the output is really different when telegraf runs sensors compared to when you run it from the shell. Please attach all three files in a comment on this issue so I can look at them.
telegraf-sensors.txt sensors.txt telegraf.txt
Looks like it's parsing the output entirely correctly. The problem seems to be that the temperature does apparently jump by 12C when running telegraf. That's really weird and suggests that (maybe) there's a bug in the chipset/firmware on this PC. Or maybe spinning up a Go app raises the CPU frequency, which screws up the temperature readings. (shrug)
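With telegraf-sensors.txt and sensors.txt in hand, a small Go sketch like the one below can flatten both captures into chip/feature/field readings and list only the values that differ by more than a tolerance. readings and changed are hypothetical helpers. The sketch assumes the usual sensors -A -u layout (blank line between chips, indented readings), skips ERROR: lines that 2>&1 may have mixed in, and uses an arbitrary 1 degree tolerance.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"os"
	"sort"
	"strconv"
	"strings"
)

// readings flattens `sensors -A -u` output into "chip/feature/field" -> value.
func readings(out string) map[string]float64 {
	res := map[string]float64{}
	chip, feature := "", ""
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		raw := sc.Text()
		line := strings.TrimSpace(raw)
		switch {
		case line == "":
			chip, feature = "", "" // a blank line ends the current chip
		case strings.HasPrefix(line, "ERROR:"):
			// stderr captured via 2>&1; not a reading
		case chip == "":
			chip = line
		case !strings.HasPrefix(raw, " "):
			feature = strings.TrimSuffix(line, ":")
		default:
			name, value, ok := strings.Cut(line, ":")
			if !ok {
				continue
			}
			if v, err := strconv.ParseFloat(strings.TrimSpace(value), 64); err == nil {
				res[chip+"/"+feature+"/"+name] = v
			}
		}
	}
	return res
}

// changed lists keys present in both maps whose values differ by more than tol.
func changed(a, b map[string]float64, tol float64) []string {
	var keys []string
	for k, va := range a {
		if vb, ok := b[k]; ok && math.Abs(va-vb) > tol {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ta, errA := os.ReadFile("telegraf-sensors.txt")
	sb, errB := os.ReadFile("sensors.txt")
	if errA != nil || errB != nil {
		fmt.Fprintln(os.Stderr, "need telegraf-sensors.txt and sensors.txt in the current directory")
		return
	}
	a, b := readings(string(ta)), readings(string(sb))
	for _, k := range changed(a, b, 1.0) {
		fmt.Printf("%s: telegraf run %.1f, shell run %.1f (delta %+.1f)\n", k, a[k], b[k], a[k]-b[k])
	}
}
```

If only the coretemp entries show up, the difference lies in what sensors reported on each run, not in how telegraf parsed it.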
I'll keep digging, but it looks like it's not a bug in telegraf. Thanks for taking the time to look at it. Closing.