Unable to set up my fans, fan2go.yaml settings seem to be ignored
Describe the bug
I'm having trouble setting up my fan curves. I have set minPwm, startPwm and maxPwm for each fan, but fans are still getting initialized on every run, and the curve values don't appear to follow what I wanted.
To Reproduce
Steps to reproduce the behavior:
1. Run sudo /nix/store/84601dyyp25hqbq9a49kplnj76ljh1ji-fan2go-0.9.0/bin/fan2go -c fan2go.yaml -v (see the config below).
2. Observe the fans being initialized, with a long stream of "Setting Fan PWM of ...", then "Saving pwm map to fan...", and finally "Measuring RPM of ...".
3. Observe a stream of "Evaluating curve" with "Desired PWM: 0", even though the sensor temperature readings are noticeably above their min values.
4. Re-run step 1 and observe the same behavior each time. (The Measuring part does seem to be skipped on subsequent runs, but the Setting Fan PWM part is not. It still takes a while and causes an undesirable "slow spin down" at the start.)
Expected behavior
- Fans should not be initialized again on each run.
- Fans' PWM should be set according to linear curves (or an average of them) between the min and max temp readings, and the corresponding minPwm and maxPwm values.
Screenshots
My config:
dbPath: /var/lib/fan2go/fan2go.db
sensors:
  - id: cpu_package
    hwmon:
      platform: coretemp
      index: 1
  - id: disk_0
    hwmon:
      platform: drivetemp-scsi-0-0
      index: 1
  - id: disk_4
    hwmon:
      platform: drivetemp-scsi-4-0
      index: 1
  - id: nvme_0
    hwmon:
      platform: nvme-pci-01.*
      index: 1
  - id: nvme_1
    hwmon:
      platform: nvme-pci-06.*
      index: 1
curves:
  - id: cpu
    linear:
      sensor: cpu_package
      min: 20000
      max: 80000
  - id: disk_0
    linear:
      sensor: disk_0
      min: 20000
      max: 60000
  - id: disk_4
    linear:
      sensor: disk_4
      min: 20000
      max: 60000
  - id: bay_0
    function:
      type: average
      curves:
        - disk_0
  - id: bay_1
    function:
      type: average
      curves:
        - disk_4
  - id: nvme_0
    linear:
      sensor: nvme_0
      min: 20000
      max: 70000
  - id: nvme_1
    linear:
      sensor: nvme_1
      min: 20000
      max: 70000
  - id: mobo
    function:
      type: average
      curves:
        - nvme_0
        - nvme_1
fans:
  - id: cpu
    hwmon:
      platform: quadro
      rpmChannel: 3
      pwmChannel: 3
    controlAlgorithm: direct
    neverStop: true
    curve: cpu
    minPwm: 33
    startPwm: 33
    maxPwm: 255
  - id: bay_0
    curve: bay_0
    hwmon:
      platform: quadro
      rpmChannel: 2
      pwmChannel: 2
    controlAlgorithm: direct
    neverStop: true
    minPwm: 90
    startPwm: 90
    maxPwm: 200
  - id: bay_1
    curve: bay_1
    hwmon:
      platform: quadro
      rpmChannel: 4
      pwmChannel: 4
    controlAlgorithm: direct
    neverStop: true
    minPwm: 90
    startPwm: 90
    maxPwdm: 200
  - id: mobo
    curve: mobo
    hwmon:
      platform: quadro
      rpmChannel: 1
      pwmChannel: 1
    controlAlgorithm: direct
    neverStop: true
    minPwm: 25
    startPwm: 25
    maxPwm: 255
Here's what the log said after the (very long) initialization (which happens on each run):
DEBUG Evaluating curve 'cpu'. Sensor 'cpu_package' temp '32°'. Desired PWM: 0
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '35°'. Desired PWM: 0
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '35°'. Desired PWM: 0
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 0
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 0
And here's what fan2go detect showed around that same time:
> jc42
Sensors Index Label Value
1 hwmon6/temp1 (temp1_input) 30500
> drivetemp-scsi-4-0
Sensors Index Label Value
1 hwmon4/temp1 (temp1_input) 22000
> coretemp-isa-0000
Sensors Index Label Value
1 Package id 0 (temp1_input) 31000
2 Core 0 (temp2_input) 25000
3 Core 4 (temp6_input) 28000
4 Core 8 (temp10_input) 26000
5 Core 12 (temp14_input) 30000
6 Core 16 (temp18_input) 28000
7 Core 20 (temp22_input) 28000
8 Core 24 (temp26_input) 29000
9 Core 25 (temp27_input) 29000
10 Core 26 (temp28_input) 28000
11 Core 27 (temp29_input) 28000
12 Core 28 (temp30_input) 27000
13 Core 29 (temp31_input) 27000
14 Core 30 (temp32_input) 27000
15 Core 31 (temp33_input) 27000
> nvme-pci-0100
Sensors Index Label Value
1 Composite (temp1_input) 35850
> quadro-hid-3-6
Fans Index Channel Label RPM PWM Auto
1 1 Fan 1 speed 327 25 false
2 2 Fan 2 speed 577 90 false
3 3 Fan 3 speed 400 33 false
4 4 Fan 4 speed 588 90 false
5 5 Flow speed [dL/h] 0 N/A false
Sensors Index Label Value
1 Sensor 1 (temp1_input) N/A
2 Sensor 2 (temp2_input) N/A
3 Sensor 3 (temp3_input) N/A
4 Sensor 4 (temp4_input) N/A
5 Virtual sensor 1 (temp5_input) N/A
6 Virtual sensor 2 (temp6_input) N/A
7 Virtual sensor 3 (temp7_input) N/A
8 Virtual sensor 4 (temp8_input) N/A
9 Virtual sensor 5 (temp9_input) N/A
10 Virtual sensor 6 (temp10_input) N/A
11 Virtual sensor 7 (temp11_input) N/A
12 Virtual sensor 8 (temp12_input) N/A
13 Virtual sensor 9 (temp13_input) N/A
14 Virtual sensor 10 (temp14_input) N/A
15 Virtual sensor 11 (temp15_input) N/A
16 Virtual sensor 12 (temp16_input) N/A
17 Virtual sensor 13 (temp17_input) N/A
18 Virtual sensor 14 (temp18_input) N/A
19 Virtual sensor 15 (temp19_input) N/A
20 Virtual sensor 16 (temp20_input) N/A
> jc42
Sensors Index Label Value
1 hwmon5/temp1 (temp1_input) 31500
> drivetemp-scsi-0-0
Sensors Index Label Value
1 hwmon3/temp1 (temp1_input) 23000
> nvme-pci-06f00
Sensors Index Label Value
1 Composite (temp1_input) 34850
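Why the first config produced "Desired PWM: 0" can be reproduced with a small sketch. It assumes a linear curve interpolates between its min and max and clamps outside that window, and that the bounds are compared in the same unit as the temperatures shown in the log (°C). fan2go detect reports raw hwmon values in millidegrees (e.g. 31000 for the CPU package), which makes millidegree bounds tempting, but then every reading sits far below min=20000. The function name and the truncation to an integer are illustrative choices, not fan2go's actual code.

```go
package main

import "fmt"

// linearCurve sketches a linear curve: temperatures at or below minTemp give 0,
// at or above maxTemp give 255, and values in between are interpolated
// linearly and truncated to an integer. All three arguments share one unit.
func linearCurve(temp, minTemp, maxTemp float64) int {
	if temp <= minTemp {
		return 0
	}
	if temp >= maxTemp {
		return 255
	}
	return int((temp - minTemp) / (maxTemp - minTemp) * 255)
}

func main() {
	// disk_0 reads 23 °C. Bounds written in millidegrees, as in the first config:
	fmt.Println(linearCurve(23, 20000, 60000)) // 0
	// The same bounds written in degrees:
	fmt.Println(linearCurve(23, 20, 60)) // 19
}
```

The 19 in the second case matches the "Desired PWM: 19" that disk_0 reports once the bounds are switched to degrees later in this thread.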
And here are also the fan curves:
cpu
Min PWM 33, Start PWM 33, Max PWM 255
[RPM / PWM plot; y-axis 0 to 2550]
bay_0
Min PWM 90, Start PWM 90, Max PWM 200
[RPM / PWM plot; y-axis 0 to 1404]
bay_1
Min PWM 90, Start PWM 90, Max PWM 232
[RPM / PWM plot; y-axis 0 to 1412]
mobo
Min PWM 25, Start PWM 25, Max PWM 255
[RPM / PWM plot; y-axis 0 to 5145]
Desktop (please complete the following information):
- Distro: NixOS 24.05
- uname -a: Linux homelab 6.11.3 #1-NixOS SMP PREEMPT_DYNAMIC Thu Oct 10 10:04:18 UTC 2024 x86_64 GNU/Linux
- sensors -v: sensors version 3.6.0 with libsensors version 3.6.0
- fan2go version: dev (NixOS "unstable" version tag 0.9.0)
Additional context
I've tried various settings, such as setting steps: or specifying sensor readings in degrees rather than millidegrees, but nothing seems to matter: either the fans get set to 255, or to 0. And initialization takes a while each time.
OK, it looks like after changing the min and max values from millidegrees back to degrees, it's working this time, although the PWM values it sets according to the log messages don't seem to correspond to the "Desired PWM" 🤔
DEBUG Setting Fan PWM of 'mobo' to 85 ...
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '33°'. Desired PWM: 65
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '34°'. Desired PWM: 70
DEBUG Setting Fan PWM of 'cpu' to 59 ...
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
DEBUG Evaluating curve 'cpu'. Sensor 'cpu_package' temp '28°'. Desired PWM: 32
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '33°'. Desired PWM: 65
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '34°'. Desired PWM: 70
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
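The "Setting Fan PWM" values can be reconciled with the "Desired PWM" values under one assumption: that the curve output (0..255) is scaled linearly onto the fan's own [minPwm, maxPwm] range before being written. The sketch below (hypothetical helper names, integer arithmetic) averages the two nvme values logged around the same time (65 and 70) for the mobo curve and scales the result onto mobo's 25..255 range, which gives 85, the value in the log. Treat this as a plausible reading of the log, not a description of fan2go's implementation.

```go
package main

import "fmt"

// average mimics an "average" function curve over integer curve values
// (integer division; the real implementation may round differently).
func average(values ...int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum / len(values)
}

// scaleToFanRange is hypothetical: it maps a curve value in 0..255 linearly
// onto the fan's [minPwm, maxPwm] range using integer arithmetic.
func scaleToFanRange(curveValue, minPwm, maxPwm int) int {
	return minPwm + curveValue*(maxPwm-minPwm)/255
}

func main() {
	mobo := average(65, 70) // nvme_0 and nvme_1 desired values from the log
	fmt.Println(mobo, scaleToFanRange(mobo, 25, 255)) // 67 85
}
```

If this reading is right, "Desired PWM" is the curve's output, while "Setting Fan PWM" is that output after it has been fitted into the fan's configured limits.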
I did check the actual RPM values currently set, and they do seem to roughly correspond to where on the graph they would be, at that temperature.
So I think the remaining issue is initialization.
OK, and now I was able to skip initialization as well, by adding this to every fan:
pwmMap:
  0: 0
  255: 255
I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.
Actually, I spoke too soon. With this pwmMap, the desired values reported in the log seem to be the same as before, but it now sets PWM to 0 (which is also reflected in the all-0 PWM values reported by detect):
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 56
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 67
DEBUG Setting Fan PWM of 'bay_0' to 0 ...
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
DEBUG Setting Fan PWM of 'mobo' to 0 ...
DEBUG Evaluating curve 'cpu'. Sensor 'cpu_package' temp '30°'. Desired PWM: 44
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
DEBUG Setting Fan PWM of 'bay_1' to 0 ...
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 55
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 66
DEBUG Setting Fan PWM of 'cpu' to 0 ...
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
DEBUG Evaluating curve 'cpu'. Sensor 'cpu_package' temp '30°'. Desired PWM: 40
DEBUG Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
DEBUG Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 57
DEBUG Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 66
DEBUG Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
> I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.
Without a doubt, it's hard for me to explain all of this as I know too much about it (if you know what I mean).
There are definitely some misunderstandings here, and a lot to unpack, I will get back to you after work 🤞
Hi @dinvlad, thx for your interest! Looks like you have a real spicy system on your hands to test fan2go on :smile:
> I'm having trouble setting up my fan curves. I have set minPwm, startPwm and maxPwm for each fan, but fans are still getting initialized on every run, and the curve values don't appear to follow what I wanted.
The fan curve output does not change based on these parameters. The fan curve simply prints out a (rudimentary) graph of the PWM -> RPM measurement that fan2go took during the initialization of a fan. The Min/Start/Max are simply an indication of what the algorithm thinks these values should be based on the graph, unless you override them yourself in the config.
> Fans should not be initialized again on each run.
They are not, at least not entirely. If you set overrides for the PWM "limits", the long initialization should be skipped entirely.
What is not skipped, however, and is executed at each startup of fan2go, is the calculation of the pwmMap (again, unless you set it yourself in the config). The pwmMap is used to make fan2go work with fans that do not operate in the expected 0..255 range, but f.ex. in a 0..100 range, or even with an extremely limited set like [0, 125, 255]. The only way for fan2go to determine this by itself is through trial and error: setting every possible value in 0..255 and checking whether it succeeded. Since this can change due to external factors like driver updates (for a fan controller), this check is currently done on each startup and should only take about a second. It's certainly not ideal, but it's what we've got right now.
By specifying this:
pwmMap:
  0: 0
  255: 255
you are essentially telling fan2go, that this particular fan definition can only use the PWM values 0 and 255 and nothing in between, which is probably not what you intended to do.
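One way to picture why every fan ended up at 0 is a lookup that can only return values present in the pwmMap. The sketch assumes a hypothetical snapToPwmMap that picks the highest configured key not exceeding the target. fan2go's real lookup may round differently, but any strategy restricted to {0, 255} can only produce 0 or 255 for the desired values in the log above (12 to 67).

```go
package main

import (
	"fmt"
	"sort"
)

// snapToPwmMap is hypothetical: it returns the mapped value of the highest
// pwmMap key that does not exceed target (or the lowest key if none does).
// pwmMap must not be empty.
func snapToPwmMap(pwmMap map[int]int, target int) int {
	keys := make([]int, 0, len(pwmMap))
	for k := range pwmMap {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	best := keys[0]
	for _, k := range keys {
		if k <= target {
			best = k
		}
	}
	return pwmMap[best]
}

func main() {
	sparse := map[int]int{0: 0, 255: 255} // the two-entry map from the config
	for _, target := range []int{59, 85, 200, 255} {
		fmt.Println(target, "->", snapToPwmMap(sparse, target))
	}

	full := make(map[int]int, 256) // identity map over 0..255
	for i := 0; i <= 255; i++ {
		full[i] = i
	}
	fmt.Println(85, "->", snapToPwmMap(full, 85))
}
```

With the sparse map, 59, 85 and 200 all collapse to 0; with the full identity map, 85 stays 85.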
The "expected" pwmMap, would look like this (abbreviated for readability):
pwmMap:
  0: 0
  1: 1
  2: 2
  ...
  254: 254
  255: 255
There is nothing wrong with specifying this in the config to skip the pwmMap initialization, except it's a bit ugly.
> Fans' PWM should be set according to linear curves (or an average of them) between min and max temp readings, and the corresponding minPwm and maxPwm values.
I agree, that is precisely what should happen and that is what is (hopefully) implemented. If you see anything different, feel free to investigate and report, or even fix it :+1:
> I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.
Again about this: If you have suggestions on how to change the README to better reflect this, please open a PR and let me know!
Thanks for the clarification @markusressel, that makes sense.
> this check is currently done on each startup and should only take about a second
Sadly, in my case it takes much, much longer than a second - I just measured and it was almost 11 minutes (!)
3.02user 5.76system 10:46.38elapsed
I've now specified pwmMap with all values in [0..255] at the top level of fan2go.yaml, and used YAML anchors to refer to it in each fan. This seems to have addressed the slow startup for now. However, I wonder if we should add a simple boolean flag to skip this feature entirely, since it's not intuitive for newcomers that it can be turned off via pwmMap, or that it can take this long even though it's meant to be a quick sweep.
The reason this is done is that fan2go tries to automatically detect the best mode of operation. If we disable this feature by default, we might as well remove the entire logic, document the config and call it a day. This feature was added at the request of fan2go users; I have never needed it myself. Some big-brand "gaming" fan controllers just seem to work in weird and unexpected ways because... reasons :smile:
I am not too keen on going that route yet, can we possibly figure out why it takes so long on your system? All fan2go does is this:
// check every pwm value
pwmMap := map[int]int{}
for i := fans.MaxPwmValue; i >= fans.MinPwmValue; i-- {
	_ = fan.SetPwm(i)
	time.Sleep(pwmSetGetDelay)
	pwm, err := fan.GetPwm()
	if err != nil {
		ui.Warning("Error reading PWM value of fan %s: %v", fan.GetId(), err)
	}
	pwmMap[i] = pwm
}
f.pwmMap = pwmMap
SetPwm and GetPwm simply write/read an integer to/from a file.
If this is slow on your system, there has to be a reason for it. Maybe we can account for that reason somehow?
PS: Nice trick using the anchors, I didn't even know viper supports this :smile:
Thanks @markusressel - I did notice a delay even when running fan2go detect. I think it's probably inherent in the fan controller I'm using - Aquacomputer Quadro, connected to the PC over USB.
Looking at their code and associated issue (which mentions fan2go btw), it sounds like the controller is "slow" enough that they had to introduce a ~200ms delay between reads and writes:
https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/blob/f20c53c7edaee2a57b7aee7a64358864d207e75f/aquacomputer_d5next.c#L852-L864
https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/blob/f20c53c7edaee2a57b7aee7a64358864d207e75f/aquacomputer_d5next.c#L75
https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82#issuecomment-1637173240
256 * 200ms * 2 (read + write) * 4 fans is ~410s, or ~7 min, which is close to the 11 min I'm seeing (there is likely also some USB communication overhead, which would account for the difference).
So given that, would it be reasonable to introduce a config param to opt-out of PWM mapping (i.e. effectively assuming 0, 1, ... 255 -> 0, 1, ... 255 map)? Obviously, it would come with a disclaimer that fan control could be less accurate then.
I am wondering if it makes sense to detect these devices and use specific defaults for them. Do the fans have a specific platform name that's unique to this controller? Maybe we can come up with a system (f.ex. additional config files) to specify overrides for specific platforms so other people can benefit from the findings that were made in issues like this one 🤔 I would have to look into it, but I would guess that there are even more specific IDs for the controller exposed somewhere, if the platform isn't specific enough.
These are just generic PWM fans (along with Noctua for the CPU), so I don't think we can detect the fans per se. But we can probably detect the controller (Quadro, in this case). I think potentially relying just on controller name (as reported by fan2go detect, for example) would be sufficient in this case.
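The two ideas in this exchange, an explicit opt-out flag and built-in defaults per controller platform, can be combined into a single order of precedence. The sketch below is hypothetical throughout: SkipPwmMapDetection and the identityPwmMapPlatforms table are not existing fan2go options, and listing "quadro" there reflects only this report's finding that an identity map works for the Quadro. The platform is matched by prefix so that both the configured name (quadro) and the detected one (quadro-hid-3-6) qualify.

```go
package main

import (
	"fmt"
	"strings"
)

// FanConfig holds only what this sketch needs. SkipPwmMapDetection is a
// hypothetical opt-out flag, not an existing fan2go config key.
type FanConfig struct {
	Platform            string
	PwmMap              map[int]int
	SkipPwmMapDetection bool
}

// identityPwmMapPlatforms is a hypothetical built-in table of platform name
// prefixes assumed to accept every PWM value 0..255 unchanged.
var identityPwmMapPlatforms = []string{"quadro"}

func identityPwmMap() map[int]int {
	m := make(map[int]int, 256)
	for i := 0; i <= 255; i++ {
		m[i] = i
	}
	return m
}

// resolvePwmMap picks a pwmMap in order of precedence: explicit config,
// opt-out flag, known-platform default, and finally the slow detection sweep.
func resolvePwmMap(cfg FanConfig, detect func() map[int]int) (map[int]int, string) {
	if cfg.PwmMap != nil {
		return cfg.PwmMap, "config"
	}
	if cfg.SkipPwmMapDetection {
		return identityPwmMap(), "opt-out"
	}
	for _, prefix := range identityPwmMapPlatforms {
		if strings.HasPrefix(cfg.Platform, prefix) {
			return identityPwmMap(), "platform default"
		}
	}
	return detect(), "detected"
}

func main() {
	slowDetect := func() map[int]int {
		fmt.Println("running the full 0..255 sweep")
		return identityPwmMap()
	}
	_, source := resolvePwmMap(FanConfig{Platform: "quadro-hid-3-6"}, slowDetect)
	fmt.Println(source) // platform default
	_, source = resolvePwmMap(FanConfig{Platform: "example-platform"}, slowDetect)
	fmt.Println(source) // detected
}
```

Because a pwmMap from the config always wins, the current workaround would keep working under a scheme like this.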
> So given that, would it be reasonable to introduce a config param to opt-out of PWM mapping (i.e. effectively assuming 0, 1, ... 255 -> 0, 1, ... 255 map)? Obviously, it would come with a disclaimer that fan control could be less accurate then.
It's not only less accurate, but it might not work at all and cause lots of error messages. Providing an easier shortcut for the "sane" case (a 0..255 pwmMap) does sound reasonable; however, I would like to make fan2go work for everybody out of the box. To achieve that, a built-in system to specify default overrides for specific platforms still sounds like a good approach IMHO.
Since it is technically already possible to disable the pwmMap right now, I would like to keep this as a workaround until we have a better system. We can use this issue to track progress, but unfortunately time is a very limited resource these days.