Intermittent failure to stop certain fans with curves that involve powering fan on/off
I have some fans set to spin down completely once certain temps go below a threshold, and then spin up once going above it. For quite some time over multiple versions, I think starting with 109, which is when I first tried the program, I've seen it occur intermittently that some of these fans would still be spinning when they shouldn't be, usually at a speed closer to their minimum activation value on the curve. I.e. if it starts at 35% it might be running between 37 and 44%. It appears that sometimes if temps change so that they should re-activate, and then go back to powering off, they might sometimes end up shutting off correctly. I cannot figure out how to reproduce this problem reliably 100%. Sometimes it happens, sometimes it doesn't. It's also possible, although I cannot confirm this, that the problem even corrects itself over a long enough period of time.
This video will give much more detail about the most recent example. Lately it's just been that fan, however, I have seen it happen on other fans with on/off regimes before. It's just that I set this fan to trigger a lot more often, so probably this glitch happens more often on this fan because of that.
https://www.youtube.com/watch?v=ikFCo1u_DAc
Overall it's been a great way to control the fans, especially when it comes to dealing with fans that have a hard time starting but can spin very slow, excellent in those regards, and the ability to apply multiple temp source controls and combine them. It's just this one nagging issue that I can't figure out. Maybe I can fix the configuration.
I've observed that behaviour as well, and I'm still trying to understand what triggers it. When a fan that is supposed to stop doesn't (as you described), did you check whether the fan is physically still spinning? I ask because I suspect it's a display bug only (the activity percentage in FanControl not reflecting reality).
In my case, none of my curves make the fans stop, but occasionally, when they are supposed to reach the minimum of 25%, they stay stuck at 26-28%, even though the temp is clearly below the threshold. Sometimes the bug is pretty obvious: you have 2 different fans using the exact same graph curve, one behaving normally and the other stuck with an extra activity percentage. Toggling the control button off and then on fixes the problem.
Recently, I discovered that it can also happen with fans that have their control button turned off. In that case, even though they are supposed to be controlled by the BIOS, FanControl obviously reports an invalid activity percentage.
I will check if the fans are actually spinning or not, but in the past I've noticed that what the RPM is reporting is accurate. I have checked in the past and never noticed a discrepancy. But next time this happens I will verify. Also, great idea for testing purposes to lock two fans to the same curve just to see if it's consistent across the fans. If indeed it is inconsistent, that is very interesting.
I'm starting to suspect it's related to an interaction conflict. In my case, I also have hwinfo running in the background, which cannot control fans, but it can read their RPM. I'll stop using it for a while and see if the bug still occurs. Anyway, even if it's a conflict issue, having a fan say X% while its curve says Y% looks pretty weird.
Don't look at your curves, when "that" happens. Your curves are fine.
Set the specific fan to manual mode, and try to stop it with the slider all the way to the left. If it doesn't and the % doesn't follow the order, then something outside the software is preventing it.
FanControl only "asks politely" the SuperIO chip to set a value from the curve. The % shown is the actual value being applied, even if the BIOS took over and said "NOP" to FanControl, which I think is what you are experiencing in your video.
I should probably add a UX element to show that a control card isn't working properly or is being controlled by a 3rd party.
TLDR: Play with manual mode for your tests.
Yea, it looks like the BIOS did take over there, because the % being shown corresponded to the same voltage as the BIOS minimum speed setting at that temperature. 35% of 12v is 4.2v, which is what it would be running at on the BIOS curve. So it seems that, for some reason, the point where the BIOS intermittently takes over is when it's time to turn off the fan with FanControl.
If there is no direct solution, maybe an option to overcome this issue could be added in which, every time a fan is to be shut off, it flips the software control switch for that fan off and on, because that always makes them shut off. Granted, it might be a rather idiosyncratic solution that only applies to a small % of users.
In this video I saw one success and one failure when spinning the fans down manually.
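For anyone who wants to compare a % reading against the voltage shown in the BIOS, the arithmetic above can be sketched as follows. This assumes a voltage-controlled (DC mode) header with a simple linear mapping from duty % to voltage on a 12 V supply; `duty_to_voltage` is a made-up helper name for illustration, and real headers may not be perfectly linear.

```python
def duty_to_voltage(duty_percent: float, supply_volts: float = 12.0) -> float:
    """Approximate header voltage for a given duty %, assuming a linear DC mapping."""
    return supply_volts * duty_percent / 100.0


# 35% of 12 V, the BIOS minimum mentioned above
print(round(duty_to_voltage(35), 2))  # 4.2
```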
https://youtu.be/2YBrtad22cg
Perhaps a certain sequence of manual fan control settings could reproduce the exact point at which the BIOS takes over. But so far I'm noticing with manual control, sometimes it will shut off properly, other times BIOS takes over. Almost like flipping a coin. I don't yet notice any pattern in terms of what will cause it to fail vs succeed. At one point I managed to get it to fail to SPIN UP one of the fans. Dragged it to a high setting, nothing happened. In order to make it start up the fan I had to drag it back down to zero and then try to turn it on again.
I've been running hwinfo to monitor the fans in the tray. I'm going to experiment with not using it, which will drive me crazy having to check but it's worth seeing if that's the issue.
It's a bit early to tell, but according to my preliminary tests, the behaviour stopped. If you want to try to confirm that as well, don't forget that hwinfo isn't the only software that reads from or writes to the fans. Close everything (I would suggest closing everything that reads temperatures as well, just in case).
If I had to expand my theory, it would be that if FC sends an activity percentage signal while hwinfo is asking for an RPM reading, the command fails (probably because of a hardware limitation that forbids 2 simultaneous accesses to the fan). That would explain why the behaviour isn't permanent: it's basically a matter of luck, since it only happens when both programs try to interact with the fan at the same time.
But still, if that's true, that also means when FC reaches the lowest breakpoint of a curve, it assumes the activity percentage was successfully applied and doesn't try to send it afterwards anymore if the temp remains below the threshold (which actually makes sense, since the lowest breakpoint is always a flat curve).
If it's not possible to know whether a % signal was denied (because of a busy state), an easy fix would be to add a periodic check that compares the fan % with its associated curve % and, if they differ, resends the signal. I guess having the frequency of such a loop follow the "Response time" parameter would make sense.
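To make the idea concrete, here is a minimal sketch of such a periodic check. Everything in it is hypothetical: `Control`, `read_percent`, `write_percent` and `resync` are illustration names, not FanControl's actual code or API, and the "flaky" header is simulated so that its first write is silently lost.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Control:
    """Hypothetical stand-in for one fan control; not FanControl's real API."""
    name: str
    read_percent: Callable[[], float]        # value the hardware reports
    write_percent: Callable[[float], None]   # request a new value


def resync(control: Control, target: float, tolerance: float = 0.5) -> bool:
    """Resend the curve's target if the reported value differs from it.

    Returns True when a resend was issued.
    """
    if abs(control.read_percent() - target) > tolerance:
        control.write_percent(target)
        return True
    return False


# Simulated header whose first write is ignored, as if something else
# (BIOS or another monitoring tool) had won that round.
state = {"percent": 37.0, "dropped": 0}


def flaky_write(value: float) -> None:
    if state["dropped"] < 1:
        state["dropped"] += 1   # request silently lost
        return
    state["percent"] = value


fan = Control("SYSFAN1", lambda: state["percent"], flaky_write)

# One pass per "Response time" tick; the curve wants 0% throughout.
resends = [resync(fan, target=0.0) for _ in range(3)]
print(resends, state["percent"])  # [True, True, False] 0.0
```

The first tick's write is lost, the second one lands, and the third finds nothing to do. The tolerance is there so that small rounding differences between the reported % and the curve % don't cause constant resends.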
So I checked and just verified a failure to shut off one of the fans occurred even with HWinfo not running. I am running MSI Afterburner but I don't think it has any type of option to access these fans. So even if HWinfo was interfering sometimes, that would not be the only problem.
Your proposed solution sounds very good xhul-dev.
If MSI Afterburner can read the GPU fans' RPM, it's a potential conflict as well. It could also be that it's related to a conflict in temp monitoring, rather than fan interaction.
@Max0r847 Some PWM headers simply don't like being turned off. I've got a header on my mobo which won't go below 17%. Your video kinda confirms that behavior, but in your case it's in the 30s range.
Edit: Just saw your BIOS config. Turn off smart mode, and set a safe default value. I've got an MSI board as well, and I had to turn off smart mode on all headers.
I'll investigate on my side as well, because I happen to have the MSI+SmartModeOn combo too.
K, and I just confirmed a fan failed to turn off with both MSI Afterburner & HWmonitor not running, so I've eliminated that possibility. Will mess with BIOS
@Max0r847 Just untick smart mode on all your headers.
I put all fans in non-smart mode for testing purposes. But in the long term, I'll probably put the CPU fan back to smart at some point, because that silly MSI BIOS puts that fan at max speed by default, and I definitely don't want to hear that plane sound each time I turn on my machine.
@xhul-dev In my case, it goes full blast for like 3 seconds and then goes back down. I never really restart my PC, so it doesn't bother me.
He he, I wish I could do the same, but since I have an Intel CPU, and the management engine driver on my system, I don't like the idea of being potentially monitored.
After a total disabling of smart mode in BIOS, followed by a proper restart, the bug can still occur, at least on my side.
Haha, yea the 120x38 Delta fan is on the pump header, which, if smart mode is disabled, will not let you adjust the permanent fan speed, so it will be running at max by default. Fortunately it's a low speed Delta, which means it only pushes 120 CFM, so I won't suffer permanent hearing loss.
So anyway, I did the same, disabled all the smart modes on the headers. On the CPU/PUMP headers if you disable smart mode there is no speed selection, it will just default to maximum. On the SYSFAN headers you can adjust a permanent speed if smart mode is disabled.
It seems to me like the failure-to-turn-off problem happens less with smart mode off than with it on, but it can still happen. Also, not running hwmonitor/afterburner and other such programs seems to help make the problem happen less often, but it definitely still happened.
Now as far as PWM headers not wanting to go below a certain %, I don't think it's that, because if it was, this problem would have consistently stuck with the same headers. Instead, there were times, depending on which curves I was using, that the problems happened more with certain fans than others. I've seen this problem happen A LOT on every single header that had an on/off fan curve I was using in FanControl. So that's 5 different headers, which INCLUDES the CPU and PUMP headers, as well as three SYSFAN headers. Or are you saying it's not something where it's just one header being weird, but could easily be all the mobo headers acting that way?
@Max0r847: If the theory of some code interfering with FC is valid, I believe reducing your "Response time" parameter on the affected curves might also reduce the probability of the bug occurring, so you should give it a shot. I personally use a 0°C Hysteresis and 1 sec RT, and I have to say the bug is pretty shy.
@Rem0o:
Are you considering implementing a curve activity check? Something like, in human language:
if fan_activity different than associated_curve_activity, send associated_curve_activity signal to fan
Of course you would need to define where and how often to run such code.
We know for sure that the desynchronisation can happen when curve_activity is below the very first breakpoint of a graph curve, but beware, that doesn't mean it can't happen elsewhere.
@xhul-dev
Old topic, but there is now a "force apply" option on controls, which will deal with desynchronisation.