
[Bug]: temperatures are not shown anymore

Open Cris70 opened this issue 1 year ago • 6 comments

Bug description

Netdata is not showing temperatures for any of my servers anymore. Previously (one or two months ago) there were multiple temperature readings for each server; now the widget is just empty.

Expected behavior

Temperatures should be shown in the "sensor temp" widget for each server.

Steps to reproduce

  1. n/a

Screenshots

Screenshot_20240715_233622

Error Logs

image

Desktop

OS: linux openSUSE Tumbleweed 20240714
Browser: Firefox
Browser Version: 127.0.2

Additional context

Temperatures are correctly shown if I open the detailed view of one of the servers. Also, if I ssh into a server and issue the "sensors" command, temperatures are correctly reported.

Cris70 avatar Jul 15 '24 21:07 Cris70

@Cris70 : The temperature charts on the node tab are not presented by default by Netdata. If you have added them, can you please verify that you have added the correct metric that you want to display there?

Can you also check if your IPMI plugin is on and configured? This does not seem to be a bug but a configuration issue of sorts.

sashwathn avatar Jul 16 '24 13:07 sashwathn

Thank you @sashwathn for your reply. Yes, I confirm that I have added them a while ago. I thought it was a bug because the charts did work for a few months, and then they vanished. After reading your message I tried to review the setup, and found out that the name of the metric has changed.

Here is the original one: Screenshot_20240716_233625
...and here is the current one (after looking up all the metrics starting with sensor): Screenshot_20240716_233819

Also, the previous chart showed a lot of temperature readings, while the current one only shows one (and it's also difficult to tell which one it is showing): Screenshot_20240717_000841

About the IPMI plugin: no, I do not have it installed, and I did not have it installed before (when the charts were working).

Cris70 avatar Jul 16 '24 22:07 Cris70

@ilyam8 @stelfrag : Can you please share the details on the change to this metric and how it should be visualised?

sashwathn avatar Jul 23 '24 08:07 sashwathn

These are the details: Sensors->Metrics. It is a chart per sensor now. The temperature chart in Nodes is aggregated; I don't know which grouping method is used (sum, avg, min or max). Having multiple dimensions on this chart is impossible because this Edit Metric doesn't have "group by".
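To illustrate why the grouping method matters for an aggregated chart, here is a minimal sketch. The sample readings and the `collapse` helper are hypothetical and only model the idea; they are not Netdata's implementation.

```python
from statistics import mean

# Hypothetical sample: a few per-sensor temperatures in °C.
readings = {
    "Package id 0": 62.0,
    "Core 0": 57.0,
    "Core 1": 58.0,
    "be2net temp1": 44.0,
}


def collapse(values, method):
    """Reduce many readings to the single value an aggregated chart would plot."""
    reducers = {"sum": sum, "avg": mean, "min": min, "max": max}
    return reducers[method](values)


# The same readings give very different single lines depending on the method.
for method in ("sum", "avg", "min", "max"):
    print(method, collapse(list(readings.values()), method))
```

Without knowing which method is in use, the single plotted value cannot be mapped back to any particular sensor.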

ilyam8 avatar Jul 23 '24 08:07 ilyam8

@ilyam8 thank you for your reply. Can this behavior be changed? I think the way it is now does not have a lot of value: I understand that the temperature readings are aggregated, but it may make no sense to aggregate all the temperatures of the CPU cores, the hard disks, the mainboard, some PCI cards, etc. into just one measure. It should at least be possible to aggregate by hardware type, if not to have the individual readings like before.

Cris70 avatar Jul 29 '24 22:07 Cris70

It is possible to aggregate however you want in Metrics. @sashwathn this is a feature request (not a bug) for frontend.

ilyam8 avatar Jul 30 '24 06:07 ilyam8

Thank you @ilyam8 , I know that I can aggregate, but the point is: what can I aggregate? I mean, if I look at the temperatures presently available in Netdata, I see this: Screenshot_20240806_093709

As you can see, I have three metrics related to SMART devices (i.e. disks), and only one metric related to...? I suppose this is what is reported by the "sensors" software package (judging by the name). This last one is the one that I have added, and that you can see in the screenshot in my second message above. It appears as a single measurement, and one cannot tell what it represents.

Previously, when adding the temperature metric, I could see a lot of lines in the graph and, when hovering, a legend would appear showing all the CPU cores associated with each line. Now there's only one line and I do not know what it stands for. See for example what the "sensors" command reports on one of the servers:

root@pve:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +62.0°C  (high = +75.0°C, crit = +93.0°C)
Core 0:        +57.0°C  (high = +75.0°C, crit = +93.0°C)
Core 1:        +58.0°C  (high = +75.0°C, crit = +93.0°C)
Core 2:        +56.0°C  (high = +75.0°C, crit = +93.0°C)
Core 3:        +59.0°C  (high = +75.0°C, crit = +93.0°C)
Core 4:        +58.0°C  (high = +75.0°C, crit = +93.0°C)
Core 5:        +57.0°C  (high = +75.0°C, crit = +93.0°C)
Core 6:        +57.0°C  (high = +75.0°C, crit = +93.0°C)
Core 8:        +55.0°C  (high = +75.0°C, crit = +93.0°C)
Core 9:        +57.0°C  (high = +75.0°C, crit = +93.0°C)
Core 10:       +55.0°C  (high = +75.0°C, crit = +93.0°C)
Core 11:       +57.0°C  (high = +75.0°C, crit = +93.0°C)
Core 12:       +55.0°C  (high = +75.0°C, crit = +93.0°C)
Core 13:       +58.0°C  (high = +75.0°C, crit = +93.0°C)
Core 14:       +55.0°C  (high = +75.0°C, crit = +93.0°C)

be2net-pci-0406
Adapter: PCI adapter
temp1:        +44.0°C  

be2net-pci-0404
Adapter: PCI adapter
temp1:        +44.0°C  

be2net-pci-0400
Adapter: PCI adapter
temp1:        +44.0°C  

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +61.0°C  (high = +75.0°C, crit = +93.0°C)
Core 0:        +54.0°C  (high = +75.0°C, crit = +93.0°C)
Core 1:        +56.0°C  (high = +75.0°C, crit = +93.0°C)
Core 2:        +52.0°C  (high = +75.0°C, crit = +93.0°C)
Core 3:        +54.0°C  (high = +75.0°C, crit = +93.0°C)
Core 4:        +55.0°C  (high = +75.0°C, crit = +93.0°C)
Core 5:        +55.0°C  (high = +75.0°C, crit = +93.0°C)
Core 6:        +53.0°C  (high = +75.0°C, crit = +93.0°C)
Core 8:        +53.0°C  (high = +75.0°C, crit = +93.0°C)
Core 9:        +54.0°C  (high = +75.0°C, crit = +93.0°C)
Core 10:       +54.0°C  (high = +75.0°C, crit = +93.0°C)
Core 11:       +53.0°C  (high = +75.0°C, crit = +93.0°C)
Core 12:       +53.0°C  (high = +75.0°C, crit = +93.0°C)
Core 13:       +53.0°C  (high = +75.0°C, crit = +93.0°C)
Core 14:       +51.0°C  (high = +75.0°C, crit = +93.0°C)

be2net-pci-0407
Adapter: PCI adapter
temp1:        +44.0°C  

be2net-pci-0405
Adapter: PCI adapter
temp1:        +44.0°C  

be2net-pci-0401
Adapter: PCI adapter
temp1:        +44.0°C

You can see there are a lot of readings, but the Netdata graph does not give any context. As I see it, this is a regression with respect to the previous implementation, and that's why I called it a bug.
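As a rough illustration of what aggregating by hardware type could look like for output like the above, here is a sketch that parses `sensors` text and keeps the hottest reading per driver. It is only an assumption that the chip-name prefix (`coretemp`, `be2net`) is a usable stand-in for hardware type; the function names are hypothetical and this is not how Netdata collects or groups these metrics.

```python
import re
from collections import defaultdict

# Matches reading lines such as "Core 0:        +57.0°C  (high = ...)".
READING = re.compile(r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Return {chip_name: {label: temperature}} from `sensors` text output."""
    chips = defaultdict(dict)
    chip = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            chip = None  # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line  # the first line of a block is the chip name
            continue
        match = READING.match(line)
        if match:  # "Adapter: ..." lines do not match and are skipped
            chips[chip][match["label"].strip()] = float(match["temp"])
    return dict(chips)


def max_by_driver(chips):
    """Group chips by driver prefix (text before the first '-'), keep the hottest reading."""
    hottest = {}
    for chip, temps in chips.items():
        driver = chip.split("-", 1)[0]  # assumption: prefix approximates hardware type
        for temp in temps.values():
            hottest[driver] = max(hottest.get(driver, temp), temp)
    return hottest


# Shortened excerpt of the output above.
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +62.0°C  (high = +75.0°C, crit = +93.0°C)
Core 0:        +57.0°C  (high = +75.0°C, crit = +93.0°C)

be2net-pci-0406
Adapter: PCI adapter
temp1:        +44.0°C
"""

print(max_by_driver(parse_sensors(SAMPLE)))
```

Grouping this way keeps CPU and network-card temperatures apart instead of folding them into one number.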

Also, if I add a custom metric, I do not expect it to completely disappear from one version of Netdata to the next just because you have renamed it. I would have expected to see some kind of automatic replacement from the old metric to the new one.

Cris70 avatar Aug 06 '24 08:08 Cris70

> I know that I can aggregate, but the point is: what can I aggregate?

You can change grouping using "Group by" (Metrics tab). The chart is sensors.sensor_temperature

https://github.com/user-attachments/assets/c757e4a8-5349-45c2-9bf0-3ca69ed3a1ae

ilyam8 avatar Aug 06 '24 14:08 ilyam8

Thank you once again @ilyam8 ! This is the closest match to the previous implementation. Unfortunately I have been unable to make it stick: as soon as I change tab it reverts to the default. I have been unable to find anything in the docs or in the community on how to save the setting. AND it does not affect the metric in the Nodes tab (but this may be a consequence of my inability to save the new setup).

Cris70 avatar Aug 07 '24 07:08 Cris70

I think you can save your selections using

Screenshot 2024-08-07 at 10 49 28

> it does not affect the metric in the Nodes tab

Yes, it has nothing to do with the Nodes tab.

ilyam8 avatar Aug 07 '24 07:08 ilyam8

That did not work unfortunately. It also had a curious side effect: as soon as I created the new setting, the Netdata browser tab closed itself. Maybe this should be reported as a separate bug. Well, all that said and done, I do not know what to do with this bug report... I feel like it is not solved for me: I'd still like to have the old behavior in the Nodes tab, but at least I now have a workable (though not really comfortable) workaround. I'll let you decide what to do with this report.

Cris70 avatar Aug 07 '24 08:08 Cris70

@Cris70 Yes, it is not solved. In this comment I said this was a feature request for Netdata UI developers.

ilyam8 avatar Aug 07 '24 08:08 ilyam8

> Thank you once again @ilyam8 ! This is the closest match to the previous implementation. Unfortunately I have been unable to make it stick: as soon as I change tab it reverts to the default. I have been unable to find anything in the docs or in the community on how to save the setting. AND it does not affect the metric in the Nodes tab (but this may be a consequence of my inability to save the new setup).

@Cris70 : The Nodes tab is a simplified view of your node, and we do not intend to offer "group by" or other options there yet. The Metrics tab is where you can make all the customisations, and you can also create a custom dashboard that you and your team can look at during monitoring / troubleshooting. I will close this issue for now. It would be great if you could create another issue for the browser tab closing when you save the setting.

sashwathn avatar Aug 12 '24 12:08 sashwathn