
Update broke sensor detection.

Open iDumle opened this issue 1 year ago • 86 comments

Describe the bug

Update V187 broke sensor detection for GPU Core on my AMD Radeon 6650 XT graphics card, and it's now showing an incorrect/frozen value. [Screenshot: fancontrol setup2]

On version 186 the setup was working fine and the software had no problem detecting the GPU Core sensor. I'm now using GPU Hot Spot instead, as that seems to be updating its values.

Right after the update the value was stuck at 468 °C. I then restarted twice and the error persisted after each reboot, though GPU Core showed a value of 0 °C and then subsequently 8 °C.

I have tried Refresh sensors detection, but it didn't change a thing; it's still stuck on the same value.

Is there a log.txt file next to FanControl.exe with recent date entries?

The error log shows no recent entry, though there is one from back in January. I suspect it's unrelated, but just in case I'll post it: log.txt

Relevant hardware specs and setup

The graphics card is an ASUS Radeon RX 6650 XT DUAL OC - 8GB, and I am running Windows 10 with the newest graphics drivers, 24.3.1.

And here is a picture of the whole Fan Control setup. [Screenshot: fancontrol setup3]

iDumle avatar Apr 19 '24 04:04 iDumle

With my 7900XT, after update (187) for gpu sensors it only shows the VRSoC temp, where before it showed core and hot spot. Reverted (to 186) and it was fixed.

Windows 11, Powercolor Hellhound 7900XT, 24.3.1 drivers.

BMKoot avatar Apr 19 '24 06:04 BMKoot

I lost all my custom fan names and paired sensors with update 187. I restored 186 from a backup and all was fine again. I think update 187 needs to be looked at.

seanbirkhead avatar Apr 19 '24 08:04 seanbirkhead

After digging around a bit, I saw that your repo for LibreHardwareMonitor had a recent commit 2 days ago and that you added that update to v187, so I suspect something went wrong in the d22cd1d commit. With that in mind I tried swapping LibreHardwareMonitorLib.dll from v186 into my v187 install, and that restored the sensor functionality.
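For anyone who wants to repeat the DLL swap described above, here is a minimal sketch in Python. The function name `swap_dll` and the example folder paths are made up for illustration; only the DLL file name comes from this thread. FanControl should be closed first, since a running process typically keeps the DLL locked.

```python
import shutil
from pathlib import Path

DLL_NAME = "LibreHardwareMonitorLib.dll"  # file name as mentioned in this thread


def swap_dll(old_install: Path, new_install: Path) -> Path:
    """Copy DLL_NAME from old_install into new_install, keeping a backup.

    Both folder arguments are hypothetical; point them at wherever the
    v186 and v187 copies of FanControl were unpacked.
    Returns the path of the backup file.
    """
    source = old_install / DLL_NAME
    target = new_install / DLL_NAME
    if not source.is_file():
        raise FileNotFoundError(f"no {DLL_NAME} in {old_install}")

    backup = target.with_name(target.name + ".bak")
    # Keep the newer DLL once, so the swap can be undone later.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(source, target)
    return backup
```

Example call, with assumed paths: `swap_dll(Path(r"C:\FanControl_v186"), Path(r"C:\FanControl_v187"))`. To undo the swap, copy the `.bak` file back over the DLL.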

iDumle avatar Apr 19 '24 10:04 iDumle

Similar issue, RX 6800, gpu core and liquid showing temps >3000°, gpu hot spot showing a sensible value. Happy to do more to diagnose/whatever, but for now I've just changed the sensor used to the hot spot.
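The workaround above (switching to a sensor that still reports sensible values) can be sketched as a simple plausibility check. This is only an illustration, not how FanControl or LibreHardwareMonitor pick sensors: the function name and the 0 to 150 °C window are assumptions, and the check would not catch a value that is frozen at a believable number.

```python
from typing import Optional

# Assumed plausibility window in °C; not taken from FanControl or LibreHardwareMonitor.
MIN_PLAUSIBLE_C = 0.0
MAX_PLAUSIBLE_C = 150.0


def first_plausible(readings: dict[str, float], preferred: list[str]) -> Optional[str]:
    """Return the first sensor name in `preferred` whose reading looks sane.

    `readings` maps sensor names to temperatures in °C. Sensors that are
    missing or outside the assumed window are skipped. Returns None if no
    sensor qualifies.
    """
    for name in preferred:
        value = readings.get(name)
        if value is not None and MIN_PLAUSIBLE_C < value <= MAX_PLAUSIBLE_C:
            return name
    return None
```

With readings like those reported here, for example `{"GPU Core": 3000.0, "GPU Hot Spot": 62.0}` and a preference order of core first, the function skips the core sensor and returns `"GPU Hot Spot"`.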

Llamalland avatar Apr 19 '24 11:04 Llamalland

Yeah there is a new method implemented to get sensor values, but there might be a bug in there. pmlog.zip

I compiled this: https://github.com/GPUOpen-LibrariesAndSDKs/display-library/blob/master/Sample/PMLog/PMLog.cpp

If anyone here is willing to run it and show the output, might be of some help.

Rem0o avatar Apr 19 '24 15:04 Rem0o

I've compiled a test build of LibreHardwareMonitor with a possible fix for the issue. If users with RX 6000 cards can test it, that would be useful (LibreHardwareMonitorLib.dll is included in the zip). The temps should be shown correctly with this build. A test copying the new LibreHardwareMonitorLib to FanControl would be useful too.

LibreHardwareMonitor-pmlogtest-b-20240419.zip

epinter avatar Apr 19 '24 17:04 epinter

> Yeah there is a new method implemented to get sensor values, but there might be a bug in there. pmlog.zip
>
> I compiled this: https://github.com/GPUOpen-LibrariesAndSDKs/display-library/blob/master/Sample/PMLog/PMLog.cpp
>
> If anyone here is willing to run it and show the output, might be of some help.

@Rem0o I ran the pmlog file and got no output. A terminal window existed for maybe 0.2 seconds, but I was unable to catch whether anything was written in it, so I'm not sure how I'm supposed to get an output for you.

> I've compiled a test build of LibreHardwareMonitor with a possible fix for the issue. If users with RX 6000 cards can test it, that would be useful (LibreHardwareMonitorLib.dll is included in the zip). The temps should be shown correctly with this build. A test copying the new LibreHardwareMonitorLib to FanControl would be useful too.
>
> LibreHardwareMonitor-pmlogtest-b-20240419.zip

@epinter I ran LibreHardwareMonitor and all the sensors seem to be working correctly. I opened a game and the temp in LHM corresponded to the temp shown by the in-game monitor. [Screenshot: LibHwMonOutput]

I swapped the LibreHardwareMonitorLib.dll file from LHM into the Fan Control v187 folder and ran the software again, and it seems to be working fine now. [Screenshot: FanControlSensorMenu]

iDumle avatar Apr 19 '24 21:04 iDumle

@iDumle The pmlog sample @Rem0o sent is an official sample from AMD that uses the pmlog feature to collect sensor data. To run it, you can use these parameters: pmlog.exe s 0 1000 1000.

Anyway, if the test build shows the values correctly, then some boards of the RX 6000 series don't work with the pmlog feature. I will send a PR to LibreHardwareMonitor that excludes these boards so they use the old method. It's weird that pmlog worked for me with an RX 6750 XT.
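If the console window still closes too quickly to read, one option is to launch the program from a script and write its output to a file. A minimal Python sketch follows; the helper name `run_and_log`, the log file name, and the 30-second timeout are assumptions, and only the pmlog.exe arguments come from this thread.

```python
import subprocess
from pathlib import Path


def run_and_log(command: list[str], log_path: Path, timeout_s: float = 30.0) -> str:
    """Run `command`, write stdout and stderr to `log_path`, return the text.

    If the program is still running after `timeout_s` seconds it is killed;
    whatever it printed up to that point remains in the log file.
    """
    with log_path.open("w", encoding="utf-8") as log:
        try:
            subprocess.run(
                command,
                stdout=log,
                stderr=subprocess.STDOUT,  # merge error output into the same file
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            pass  # partial output is already in the log file
    return log_path.read_text(encoding="utf-8", errors="replace")
```

Example call, run from the folder that contains pmlog.exe: `print(run_and_log(["pmlog.exe", "s", "0", "1000", "1000"], Path("pmlog_output.txt")))`. Starting pmlog.exe from an already open command prompt achieves the same thing without a script.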

Thanks for the test and feedback!

epinter avatar Apr 19 '24 21:04 epinter

I ran pmlog again with the parameters you gave me and was able to see the output. [Screenshot: PMlogOutput]

I hope it all helps, and you're welcome. I am glad I can be helpful. It's fantastic software you guys are building, so I'm just happy to be of aid.

iDumle avatar Apr 19 '24 21:04 iDumle

I noticed that pmlog is detecting an edge temp sensor, so I ran it in the background while running Overwatch. pmlog is updating the temp correctly and it matches the temp sensor in Overwatch. [Screenshot: PmlogEdgeTempWorking]

This should be the same sensor that the v187 LibreHardwareMonitorLib.dll was trying to monitor with the new method, right? Not sure what's going on, but I thought I'd mention it.

iDumle avatar Apr 19 '24 22:04 iDumle

> I noticed that pmlog is detecting an edge temp sensor, so I ran it in the background while running Overwatch. pmlog is updating the temp correctly and it matches the temp sensor in Overwatch. [Screenshot: PmlogEdgeTempWorking]
>
> This should be the same sensor that the v187 LibreHardwareMonitorLib.dll was trying to monitor with the new method, right? Not sure what's going on, but I thought I'd mention it.

That's weird. Something is breaking pmlog inside LibreHardwareMonitor but not in this sample. I think it's better to make RX 6000 cards use the old method to collect sensors.

epinter avatar Apr 19 '24 22:04 epinter

> That's weird. Something is breaking pmlog inside LibreHardwareMonitor but not in this sample. I think it's better to make RX 6000 cards use the old method to collect sensors.

The saying "If it ain't broken, don't fix it" comes to mind :) That said, if you at some point decide to give the new method a once-over again, I'd be happy to test it out.

iDumle avatar Apr 19 '24 23:04 iDumle

I'm having the same kind of issue with my 7800 XT: sensor values are corrupted, so FanControl loses its mind and ramps the fans all the way up. Went back to 186 and no more issue.

axel-lebourhis avatar Apr 21 '24 12:04 axel-lebourhis

> I'm having the same kind of issue with my 7800 XT: sensor values are corrupted, so FanControl loses its mind and ramps the fans all the way up. Went back to 186 and no more issue.

The 7800 XT doesn't have problems with the LibreHardwareMonitor build that FanControl 187 is based on; actually, the 7800 sensors were fixed by that build. Try the build I posted above.

epinter avatar Apr 21 '24 16:04 epinter

@iDumle Can you test this LHM build? It's unchanged, using the new method to get the sensors. Just to be sure, check whether all the sensors are stable and match AMD Adrenalin, or whether you get something weird.

LibreHardwareMonitor-net472-nightly-4d6a755c.zip

epinter avatar Apr 21 '24 17:04 epinter

@epinter So I just ran LHM in the background while running Overwatch, and as far as I can see the sensors are working fine and line up with the ones in Overwatch and AMD Adrenalin. [Screenshot: LibHwMonOutput187base]

So something seems to get misaligned between LHM and Fan Control.

iDumle avatar Apr 21 '24 17:04 iDumle

> @epinter So I just ran LHM in the background while running Overwatch, and as far as I can see the sensors are working fine and line up with the ones in Overwatch and AMD Adrenalin. [Screenshot: LibHwMonOutput187base]
>
> So something seems to get misaligned between LHM and Fan Control.

Thanks for the test! So LHM is working.

epinter avatar Apr 21 '24 17:04 epinter

@Rem0o LHM is working with the 6650 too, like in my test with the 6750. Any idea what can cause the problem? Maybe an issue with this?

epinter avatar Apr 21 '24 17:04 epinter

Okay, I probably should have done this from the start (feeling a little dumb that I didn't). Given the new test, I tried a complete redownload of v187 and imported my old config, and it seems to be working as it should. So now I'm thinking something went wrong with auto-updating from v186 to v187, rather than v187 itself not working. [Screenshot: CleanInstallOf187]

iDumle avatar Apr 21 '24 18:04 iDumle

The downloadable LHM is not the most up to date version. I'm using the latest commit since I'm compiling it myself. @iDumle

Rem0o avatar Apr 21 '24 20:04 Rem0o

@Rem0o The build I told @iDumle to download (LibreHardwareMonitor-net472-nightly-4d6a755c.zip) is using the latest commit. For reference, it can be downloaded here.

epinter avatar Apr 21 '24 21:04 epinter

I'll try to make a detailed timeline of my use of the software; maybe it can shed some light on something.

  • I first downloaded FanControl after seeing JayzTwoCents feature it. This was on 14/02-2023 03:58, and it was version v146.
  • I unpacked it to its own folder inside my download folder, and it has not been moved since.
  • I have not interacted with the GitHub repo until now, and all subsequent updates have happened inside the software itself when prompted.
  • All updates went without a problem until the update from v186 to v187. Immediately after that update I noticed the odd behaviour, did the restarts, documented the behaviour and created this issue.
  • Everything else is laid out in this post.

What the actual root cause of the odd behaviour is, I can only speculate. It could be a Windows thing, or it may be related to the fact that my first version was v146 and all updates were done in the software and never redownloaded from the repo, or it might be that I was just unlucky.

What I do know is that I have done a new install from a newly downloaded zip of v187 from the GitHub repo, unpacked it to its own folder on my desktop and loaded my old config file, and it's working fine.

I don't know if this is helpful at all but now you have it.

iDumle avatar Apr 21 '24 23:04 iDumle

Something is still wonky on the v187 release for me. I just had a cold boot with the fresh install that I thought was working, and now the temps are frozen and incorrect. [Screenshot: SomethingIsStillWonkyOn187]

Edit: @epinter After spotting this behaviour in FanControl I ran the LHM build you gave me (LibreHardwareMonitor-net472-nightly-4d6a755c.zip), and it's broken. [Screenshot: SomethingIsStillWonkyOn187LHM]

iDumle avatar Apr 22 '24 10:04 iDumle

> > I'm having the same kind of issue with my 7800 XT: sensor values are corrupted, so FanControl loses its mind and ramps the fans all the way up. Went back to 186 and no more issue.
>
> The 7800 XT doesn't have problems with the LibreHardwareMonitor build that FanControl 187 is based on; actually, the 7800 sensors were fixed by that build. Try the build I posted above.

I tested the build you posted. Here is the result: [image]

In FanControl: [image]

axel-lebourhis avatar Apr 22 '24 19:04 axel-lebourhis

I don't understand... I've been running a 7800 XT for 2 weeks with no problems. I will look at the code again, but I don't know why there's invalid data with some cards. What the screenshots from @iDumle and @axel-lebourhis have in common is the added "GPU Memory" temperature. The only temperature sensors on these cards are "GPU Core" and "GPU Hot Spot", as far as I know.

[Screenshot: rx6750xt-4d6a755c]

[Screenshot: rx7800xt-4d6a755c]

epinter avatar Apr 22 '24 23:04 epinter

@axel-lebourhis When you have time, can you try running this build and tell me what happens? I'm trying to isolate a code path.

LibreHardwareMonitor-net472-test1-nooverdrive-20240422.zip

epinter avatar Apr 23 '24 00:04 epinter

@iDumle

Can you try this ?

LibreHardwareMonitor-test2-getsensor_factor-20240423.zip

epinter avatar Apr 23 '24 15:04 epinter

@epinter I tested the LHM build and I have two results for you.

The first I did right after downloading your LHM build, and it's clearly broken. The computer had been running for about 2 hours. [Screenshot: LHM-test2-getsensor_factor-20240423]

I then rebooted with LHM set to start with Windows and got a different result: I ran Overwatch and it responded fine. [Screenshot: LHM-test2-getsensor_factor-20240423-TestAfterBoot-StartWithWindows]

I then did a third reboot with start with Windows disabled; it mirrored the second test and was working fine too (I didn't screenshot it).

I hope this is informative enough. Please tell me if you want additional data.

iDumle avatar Apr 23 '24 23:04 iDumle

@iDumle Do you have the Valorant anticheat or any similar anticheat running? If so, try to close or disable it the next time LHM presents weird data, and then restart LHM.

epinter avatar Apr 24 '24 00:04 epinter

I do have Vanguard from Valorant installed, and Game Guard (I think it's called) from Helldivers 2, but I think Vanguard is the only one that starts with Windows. It's hard to avoid root-level anti-cheat in today's gaming. I'll keep an eye out for when it acts up again and try out the tests, though it seems a little sporadic.

iDumle avatar Apr 24 '24 03:04 iDumle