
Thermalzone not working

Open Ramshield opened this issue 3 years ago • 30 comments

Hi,

My exporter does not expose any metrics regarding the thermalzone. It is enabled however:

windows_exporter_collector_duration_seconds{collector="thermalzone"} 0.0060052
windows_exporter_collector_success{collector="thermalzone"} 1
windows_exporter_collector_timeout{collector="thermalzone"} 0

[System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      19042  0

Exporter version: Starting windows_exporter (version=0.16.0, branch=master, revision=f316d81d50738eb0410b0748c5dcdc6874afe95a)

I run windows exporter with the following arguments: "C:\Program Files\windows_exporter\windows_exporter.exe" --log.format logger:eventlog?name=windows_exporter --telemetry.addr :9182 --collectors.enabled cpu,cs,logical_disk,logon,memory,net,os,process,service,system,tcp,time,thermalzone,textfile

I'm a Linux engineer, so I have no clue how to troubleshoot something like this. Please advise, thank you!

Ramshield avatar Jul 05 '21 07:07 Ramshield

Are there any relevant logs in the Event Viewer? windows_exporter will log to Windows Logs -> Application.

breed808 avatar Jul 13 '21 11:07 breed808

Getting something similar with no thermal data. Looking at the Event Viewer and filtering for windows_exporter, everything is informational except two entries, which are warnings: "No filters specified for process collector. This will generate a very large number of metrics!"

It looks like all my entries are duplicated, which is why I have two warnings with the same message. Everything else seems to be working, though; it is just a thermalzone issue.

luigi311 avatar Jul 21 '21 03:07 luigi311

I am also having this issue. I suspect my hardware does not support the thermalzone collector, but I do not know how to validate this.

crockk avatar Jul 22 '21 05:07 crockk

Checking if the thermalzone perflib metrics are present would be a good first step:

# List Counter Sets (confirm if "Thermal Zone Information" CounterSet is present)
Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName
# List counters for set
Get-Counter -ListSet 'Thermal Zone Information' 
# Get a counter from the set
Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'

I get an error ("Get-Counter: Internal performance counter API call failed. Error: 800007d1.") when running the last command, but that may be due to my VM not having access to any hardware temperature sensors.

breed808 avatar Aug 08 '21 08:08 breed808

@breed808 thank you for your reply. I ran those commands in PowerShell as Administrator.

PS C:\WINDOWS\system32> Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName

CounterSetName
--------------
.NET CLR Data
.NET CLR Exceptions
.NET CLR Interop
.NET CLR Jit
.NET CLR Loading
.NET CLR LocksAndThreads
.NET CLR Memory
.NET CLR Networking
.NET CLR Networking 4.0.0.0
.NET CLR Remoting
.NET CLR Security
.NET Data Provider for Oracle
.NET Data Provider for SqlServer
.NET Memory Cache 4.0
{115b92b4-7191-491a-a9b5-93c8e9fb641b}
{7d937e49-cfd5-438f-af4f-b3047d90a5c3}
{f3e82f6e-9df4-425d-a5d5-3a9832005b16}
AppV Client Streamed Data Percentage
Authorization Manager Applications
BitLocker
BITS Net Utilization
Bluetooth Device
Bluetooth Radio
BranchCache
Browser
Cache
Client Side Caching
Database
Database ==> Databases
Database ==> Instances
Database ==> TableClasses
Distributed Routing Table
Distributed Transaction Coordinator
DNS64 Global
Energy Meter
Event Log
Event Tracing for Windows
Event Tracing for Windows Session
Fax Service
FileSystem Disk Activity
Generic IKEv1, AuthIP, and IKEv2
GPU Adapter Memory
GPU Engine
GPU Local Adapter Memory
GPU Non Local Adapter Memory
GPU Process Memory
HTTP Service
HTTP Service Request Queues
HTTP Service Url Groups
Hyper-V Dynamic Memory Integration Service
Hyper-V Hypervisor
Hyper-V Hypervisor Logical Processor
Hyper-V Hypervisor Root Partition
Hyper-V Hypervisor Root Virtual Processor
Hyper-V Virtual Machine Bus Pipes
Hyper-V VM Vid Partition
ICMP
ICMPv6
IPHTTPS Global
IPHTTPS Session
IPsec AuthIP IPv4
IPsec AuthIP IPv6
IPsec Connections
IPsec Driver
IPsec IKEv1 IPv4
IPsec IKEv1 IPv6
IPsec IKEv2 IPv4
IPsec IKEv2 IPv6
IPv4
IPv6
Job Object Details
LogicalDisk
Memory
Microsoft Winsock BSP
MSDTC Bridge 3.0.0.0
MSDTC Bridge 4.0.0.0
NBT Connection
Netlogon
Network Adapter
Network Interface
Network QoS Policy
NUMA Node Memory
Objects
Offline Files
Pacer Flow
Pacer Pipe
PacketDirect EC Utilization
PacketDirect Queue Depth
PacketDirect Receive Counters
PacketDirect Receive Filters
PacketDirect Transmit Counters
Paging File
Peer Name Resolution Protocol
Per Processor Network Activity Cycles
Per Processor Network Interface Card Activity
Physical Network Interface Card Activity
PhysicalDisk
Power Meter
PowerShell Workflow
Print Queue
Process
Processor
Processor Information
RAS
RAS Port
RAS Total
RDMA Activity
ReadyBoost Cache
Redirector
ReFS
RemoteFX Graphics
RemoteFX Network
Search Gatherer
Search Gatherer Projects
Search Indexer
Security Per-Process Statistics
Security System-Wide Statistics
Server
Server Work Queues
ServiceModelEndpoint 3.0.0.0
ServiceModelEndpoint 4.0.0.0
ServiceModelOperation 3.0.0.0
ServiceModelOperation 4.0.0.0
ServiceModelService 3.0.0.0
ServiceModelService 4.0.0.0
SMB Client Shares
SMB Direct Connection
SMB Server
SMB Server Sessions
SMB Server Shares
SMSvcHost 3.0.0.0
SMSvcHost 4.0.0.0
Storage Management WSP Spaces Runtime
Storage Spaces Drt
Storage Spaces Tier
Storage Spaces Virtual Disk
Storage Spaces Write Cache
Synchronization
SynchronizationNuma
System
TCPIP Performance Diagnostics
TCPIP Performance Diagnostics (Per-CPU)
TCPv4
TCPv6
Telephony
Teredo Client
Teredo Relay
Teredo Server
Terminal Services
Terminal Services Session
Thermal Zone Information
Thread
UDPv4
UDPv6
USB
User Input Delay per Process
User Input Delay per Session
WF (System.Workflow) 4.0.0.0
WFP
WFP Classify
WFP Reauthorization
WFPv4
WFPv6
Windows Media Player Metadata
Windows Time Service
Windows Workflow Foundation
WinNAT
WinNAT ICMP
WinNAT Instance
WinNAT TCP
WinNAT UDP
WMI Objects
WorkflowServiceHost 4.0.0.0
WSMan Quota Statistics
XHCI CommonBuffer
XHCI Interrupter
XHCI TransferRing


PS C:\WINDOWS\system32> Get-Counter -ListSet 'Thermal Zone Information'


CounterSetName     : Thermal Zone Information
MachineName        : .
CounterSetType     : SingleInstance
Description        : The Thermal Zone Information performance counter set consists of counters that measure aspects of each thermal zone in the system.
Paths              : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
PathsWithInstances : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
Counter            : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}



PS C:\WINDOWS\system32> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : The specified instance is not present.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand

This is on a physical PC.

Please let me know what other information I can provide.

Ramshield avatar Aug 08 '21 10:08 Ramshield

Strange, some searching indicates that this error is returned when not running the query as Administrator :confused:

Are you able to query any of the other counters, such as High Precision Temperature? It'd also be worth checking the Performance Monitor to see if any Thermal Zone Information metrics are exposed there.

breed808 avatar Aug 08 '21 10:08 breed808

Hi @breed808. Thanks for the quick reply, appreciate it!

I am unable to get any of the other counters in PowerShell, ran as Administrator.

I am unable to get them in Performance Monitor either. So it seems it's a Windows problem. So I checked Event Viewer and found 5 Warnings from the source PerfProc:

Unable to open the job object \BaseNamedObjects\WmiProviderSubSystemHostJob for query access. The calling process may not have permission to open this job. The first four bytes (DWORD) of the Data section contains the status code.

I ran Performance monitor again as administrator, hoping it would help, but it didn't. Any suggestions?

EDIT:

I found this article: https://www.tenforums.com/general-support/136109-error-event-1020-perflib-win-10-1903-a.html It says to run C:\WINDOWS\SysWOW64> Lodctr /R, which I did, twice as the first time resulted in an error. A new event was logged however:

The Open procedure for service ".NETFramework" in DLL "C:\WINDOWS\system32\mscoree.dll" failed with error code The system cannot find the file specified.. Performance data for this service will not be available.

I tried to install https://dotnet.microsoft.com/download/dotnet-framework/net48 as suggested by Google, but it already says that it's installed. So not sure what to install for that specific .dll file, but I think it's related...

Ramshield avatar Aug 08 '21 11:08 Ramshield

I've done some more searching and there's mention of repairing the .NET Framework installation to install the missing mscoree.dll file. Microsoft host a .NET Framework repair tool here: https://www.microsoft.com/en-gb/download/details.aspx?id=30135. I'm not sure how helpful it will be though.

breed808 avatar Aug 08 '21 11:08 breed808

I ran the tool, and tried to run the .NET Framework installer again as said in the tool. Unfortunately it didn't fix the Performance monitor, not even after a reboot.

Stupidly enough, I never checked whether mscoree.dll was there before, but it is there now. Unfortunately, no luck.

Ramshield avatar Aug 08 '21 11:08 Ramshield

Quoting @breed808:

Checking if the thermalzone perflib metrics are present would be a good first step:

# List Counter Sets (confirm if "Thermal Zone Information" CounterSet is present)
Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName
# List counters for set
Get-Counter -ListSet 'Thermal Zone Information' 
# Get a counter from the set
Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'

I get an error ("Get-Counter: Internal performance counter API call failed. Error: 800007d1.") when running the last command, but that may be due to my VM not having access to any hardware temperature sensors.

I also get the same error on the final command - but I am not running the commands from a VM:

PS C:\Windows\system32> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : Internal performance counter API call failed. Error: 800007d1.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand

The first two commands run without error.

crockk avatar Aug 08 '21 20:08 crockk

I'm seeing similar results, running PowerShell as Administrator:

PS C:\> Get-Counter -ListSet 'Thermal Zone Information'


CounterSetName     : Thermal Zone Information
MachineName        : .
CounterSetType     : SingleInstance
Description        : The Thermal Zone Information performance counter set consists of counters that measure aspects of
                     each thermal zone in the system.
Paths              : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
PathsWithInstances : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
Counter            : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}



PS C:\> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : The specified instance is not present.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand
PS C:\> [System.Environment]::OSVersion.Version

Platform ServicePack Version      VersionString
-------- ----------- -------      -------------
 Win32NT             10.0.19042.0 Microsoft Windows NT 10.0.19042.0

This is running on Windows Server 20H2, on bare metal with almost nothing else installed or configured. Using windows_exporter v0.16.0.

CPU: AMD Threadripper 3960X

rlabrecque avatar Aug 13 '21 00:08 rlabrecque

I think a separate yet related issue here is that windows_exporter_collector_success{collector="thermalzone"} 1 should be 0.
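The mismatch described here (success reported as 1 while no thermal samples are exposed) can be checked from a saved scrape. The following is only a rough sketch: the function name is hypothetical, and the `windows_thermalzone_` metric prefix is an assumption about how the collector names its metrics, not something confirmed in this thread.

```python
def thermalzone_status(scrape_text: str) -> dict:
    """Summarise what a scrape says about the thermalzone collector.

    Assumption: thermal samples, if present, start with "windows_thermalzone_".
    """
    success_prefix = 'windows_exporter_collector_success{collector="thermalzone"}'
    reported_success = None
    sample_count = 0
    for raw in scrape_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if line.startswith(success_prefix):
            value = line[len(success_prefix):].split()[0]
            reported_success = float(value) == 1.0
        elif line.startswith("windows_thermalzone_"):
            sample_count += 1
    return {
        "reported_success": reported_success,
        "sample_count": sample_count,
        # True when the collector claims success but exposed no samples
        "misleading": reported_success is True and sample_count == 0,
    }


if __name__ == "__main__":
    scrape = (
        'windows_exporter_collector_duration_seconds{collector="thermalzone"} 0.0060052\n'
        'windows_exporter_collector_success{collector="thermalzone"} 1\n'
        'windows_exporter_collector_timeout{collector="thermalzone"} 0\n'
    )
    print(thermalzone_status(scrape))
```

Run against the three lines from the original report, this flags the result as misleading, which is the behaviour being questioned above.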

rlabrecque avatar Aug 13 '21 00:08 rlabrecque

Apologies all, I've checked the thermalzone collector to fix the windows_exporter_collector_success metric, and noted the collector is actually using WMI as the metric source. So the Get-Counter commands may have been a waste of time :disappointed:

Could you run the following and see if any output is returned?

Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation

I've run this on my testing VM but have received no output or error.

breed808 avatar Aug 19 '21 10:08 breed808

Same here @breed808

PS C:\Windows\system32> Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation
>>
PS C:\Windows\system32> Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation
>>
PS C:\Windows\system32>

It looks almost like it expects something extra.

Ramshield avatar Aug 19 '21 11:08 Ramshield

@breed808 Any update/suggestions on this? I don't mind joining an IRC or something so we can troubleshoot this faster if you'd like.

Ramshield avatar Sep 03 '21 10:09 Ramshield

@Ramshield I don't mind supporting over IRC, but I'm not sure if I can be of much more help here. I think we need someone with more ThermalZone experience, as there seems to be some prerequisite missing here.

breed808 avatar Sep 25 '21 04:09 breed808

@breed808 Anyone we can mention who might be able to help? :)

Ramshield avatar Sep 25 '21 14:09 Ramshield

It's been a few years since I last looked at this, but from what I recall, the ThermalZone data was very finicky and requires some driver support, and we never managed to pin down exactly what was supposed to provide it... The root issue seemed to be that there's actually no unified API for this, so the conclusion at the time was that it'd be a lot of work to implement this in any other way. If there are suggestions for how to achieve this though, I think we'd be very happy to replace the current implementation!

carlpett avatar Sep 25 '21 15:09 carlpett

Could we at least take a look at, for example, Open Hardware Monitor for inspiration? Perhaps its maintainers are open to discussing this and giving advice!

Ramshield avatar Sep 25 '21 15:09 Ramshield

There was some work on reusing OHM in #727, but it stalled on a mix of licensing issues and whether it was a good integration pattern.

carlpett avatar Sep 25 '21 15:09 carlpett

I am running it on a German system and it seems it cannot collect data as I have to run the following command to get the relevant data Get-Counter -ListSet 'Thermozoneninformationen'. Any ideas on how to deal with non-English systems?

namxam avatar Oct 17 '21 18:10 namxam

Quoting @carlpett: There was some work on reusing OHM in #727, but it stalled on a mix of licensing issues and whether it was a good integration pattern.

Maybe Open Hardware Monitor is a solution. It exposes its readings to WMI and it's under the MPL 2.0 license.

http://openhardwaremonitor.org/wordpress/wp-content/uploads/2011/04/OpenHardwareMonitor-WMI.pdf

It seems that it can be interfaced with through its DLL.

https://stackoverflow.com/questions/3262603/accessing-cpu-temperature-in-python

ottobaer avatar Oct 24 '21 09:10 ottobaer

Quoting @namxam: I am running it on a German system and it seems it cannot collect data as I have to run the following command to get the relevant data Get-Counter -ListSet 'Thermozoneninformationen'. Any ideas on how to deal with non-English systems?

I'm facing the same issue. In my case the system is in Spanish, and it seems the collector can't get any temperature values to pass on. The counter path here is '\Información sobre la zona térmica(*)\Temperatura'.

samuelinho avatar Feb 10 '22 20:02 samuelinho

The translated ListSet names don't match the English name in the collector.

From the previous reports I've seen on this issue, not all ListSets have translation problems (or are not translated). It's something we should address at some stage, else we're excluding entire localizations from running the exporter.

breed808 avatar Feb 12 '22 04:02 breed808

I was likely on American English when I tried originally and it wasn't working for me.

rlabrecque avatar Feb 12 '22 04:02 rlabrecque

Yes, there are two issues with the collector that have been raised in this thread:

  1. Unknown dependency preventing thermalzone collector and Perflib commands from returning metrics
  2. Translated name of Thermalzone ListSet preventing collector from working correctly on non-English locales.

Users in this thread are largely experiencing 1), but 2) is also a problem.
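For issue 2, one possible direction is to resolve the English counter set name to the local one through the performance counter name tables. Windows keeps these as alternating index/name string lists (the English table under the Perflib\009 registry key, the local one under Perflib\CurrentLanguage). The sketch below only shows the lookup logic on in-memory lists; it is not how the collector currently works, the function names are hypothetical, and the index numbers in the sample data are made up for illustration.

```python
from typing import Dict, List, Optional


def parse_name_table(multi_sz: List[str]) -> Dict[int, str]:
    """Turn an alternating [index, name, index, name, ...] list into {index: name}."""
    table: Dict[int, str] = {}
    for i in range(0, len(multi_sz) - 1, 2):
        index, name = multi_sz[i], multi_sz[i + 1]
        if index.isdigit():
            table[int(index)] = name
    return table


def localized_name(
    english_name: str,
    english_table: Dict[int, str],
    local_table: Dict[int, str],
) -> Optional[str]:
    """Return the local name sharing an index with english_name, or None.

    Names are not guaranteed to be unique in the table; the first index
    that also exists in the local table wins.
    """
    for index, name in english_table.items():
        if name == english_name and index in local_table:
            return local_table[index]
    return None


if __name__ == "__main__":
    # Made-up index 9000: the real index differs per system.
    english = parse_name_table(["2", "System", "9000", "Thermal Zone Information"])
    german = parse_name_table(["2", "System", "9000", "Thermozoneninformationen"])
    print(localized_name("Thermal Zone Information", english, german))
```

The same index-based lookup would cover the Spanish case reported above, as long as both tables contain an entry for the counter set.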

breed808 avatar Feb 12 '22 05:02 breed808

Adding my "me too" here as well: German installation of MS Windows Server 2019.

stefangweichinger avatar Jul 05 '23 07:07 stefangweichinger

Same here (German, empty result set). I think we have a clear pattern.

DominikRoB avatar Nov 23 '23 21:11 DominikRoB

Thermalzone is not working here either. I checked windows_exporter_collector_success{collector="thermalzone"}, which is 0. It is possible that these are vendor-specific classes that aren't available on all systems, so we should enumerate the classes whose names look thermal- or temperature-related.

Doing this in PowerShell shows a few more classes:

Get-CimClass -Namespace root/cimv2 | Where-Object { $_.CimClassName -like "*Temp*" -or $_.CimClassName -like "*Thermal*" -or $_.CimClassName -like "*Cooling*" }

image

I also found that the values are all zero:

image

Even when I try to query this, it returns nothing:

image

So it is quite possible that this information sits behind vendor-specific classes.

I did some more research, and it depends on the hardware. Some hardware isn't supported, but the vendor provides monitoring tools that can be used to read CPU temperatures. My recommendation is therefore to build this as a custom metric. For example, on Dell you can use Dell Command | Monitor and schedule a task that writes the metrics to a textfile, as a workaround.
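As a rough illustration of the textfile workaround, a scheduled script could render the readings in the Prometheus text format and write them into the directory the textfile collector reads. This is a minimal sketch under stated assumptions: the metric name custom_thermal_temperature_celsius, the sensor label, and both function names are hypothetical, and obtaining the readings from a vendor tool is left out entirely.

```python
import os
import tempfile
from typing import Dict


def render_temperature_metrics(readings: Dict[str, float]) -> str:
    """Render {sensor name: degrees Celsius} as Prometheus text exposition."""
    lines = [
        "# HELP custom_thermal_temperature_celsius Temperature from a vendor tool (hypothetical metric).",
        "# TYPE custom_thermal_temperature_celsius gauge",
    ]
    for sensor, celsius in sorted(readings.items()):
        # Escape backslash, double quote and newline as the text format requires.
        label = sensor.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'custom_thermal_temperature_celsius{{sensor="{label}"}} {celsius}')
    return "\n".join(lines) + "\n"


def write_prom_file(path: str, content: str) -> None:
    """Write to a temporary file first, then rename, so a scrape never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Placeholder reading; a real script would query the vendor tool here.
    print(render_temperature_metrics({"cpu_package": 45.5}), end="")
```

The write-then-rename step is a design choice to avoid the exporter reading a half-written file; the target path passed to write_prom_file would be a .prom file in the configured textfile directory.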

Nilas1994 avatar Apr 22 '24 11:04 Nilas1994

We also plan a collector that allows scraping any perfdata-based counters.

jkroepke avatar Apr 24 '24 21:04 jkroepke