thinkfan icon indicating copy to clipboard operation
thinkfan copied to clipboard

Autostart fails, and restart is needed because hwmon is not found

Open lakotamm opened this issue 4 years ago • 13 comments

Hello, For the last few days, I have been experiencing issues with thinkfan failing to start, because it cannot find hwmon.

OS: Manjaro Linux x86_64 - testing branch Host: ThinkPad T490 Kernel: 5.9.12-1-MANJARO CPU: Intel i7-8565U

 thinkfan.service - simple and lightweight fan control program
     Loaded: loaded (/usr/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/thinkfan.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Tue 2020-12-15 21:58:42 CET; 1h 25min ago
    Process: 520 ExecStart=/usr/bin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE)

Dec 15 21:58:42 miroslav-20n2cto1ww systemd[1]: Starting simple and lightweight fan control program...
Dec 15 21:58:42 miroslav-20n2cto1ww thinkfan[520]: ERROR: Error scanning /sys/devices/platform/coretemp.0/hwmon: No such file or directory
Dec 15 21:58:42 miroslav-20n2cto1ww systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE
Dec 15 21:58:42 miroslav-20n2cto1ww systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Dec 15 21:58:42 miroslav-20n2cto1ww systemd[1]: Failed to start simple and lightweight fan control program.

thinkfan.yaml:

sensors:
  - hwmon: /sys/devices/platform/coretemp.0/hwmon
    indices: [2, 3, 4, 5]

levels:
  - [0, 0 , 55]
  - [1, 50, 60]
  - [2, 55, 65]
  - [3, 60, 70]
  - [4, 65, 75]
  - [5, 70, 80]
  - [6, 75, 85]
  - [7, 80, 32767]

Since restarting the service fixes the issue, I assume that hwmon is initialized after thinkfan. Is this a more widespread issue or just me having messed up configuration? Thanks for the help!

lakotamm avatar Dec 15 '20 23:12 lakotamm

I've made a small change, can you uninstall thinkfan and re-try with the latest master built from github?

vmatare avatar Dec 15 '20 23:12 vmatare

Unfortunately no improvement. I double-checked that got rid of the old version from the AUR + that I pulled the latest master. I tried it compiled for both release and debug.

 thinkfan.service - Thinkfan, the minimalist fan control program
     Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/thinkfan.service.d
             └─default.conf, override.conf
     Active: failed (Result: exit-code) since Wed 2020-12-16 01:02:42 CET; 43s ago
    Process: 490 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE)

Dec 16 01:02:42 miroslav-20n2cto1ww systemd[1]: Starting Thinkfan, the minimalist fan control program...
Dec 16 01:02:42 miroslav-20n2cto1ww thinkfan[490]: ERROR: Error scanning /sys/devices/platform/coretemp.0/hwmon: No s>
Dec 16 01:02:42 miroslav-20n2cto1ww systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILU>
Dec 16 01:02:42 miroslav-20n2cto1ww systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Dec 16 01:02:42 miroslav-20n2cto1ww systemd[1]: Failed to start Thinkfan, the minimalist fan control program.

Restarting the service still (temporarily, until the next restart) fixes the issue.

lakotamm avatar Dec 16 '20 00:12 lakotamm

Can you give me the output of systemctl list-dependencies thinkfan please?

vmatare avatar Dec 16 '20 01:12 vmatare

Here it is.

 >>> systemctl list-dependencies thinkfan                                                       [130]
thinkfan.service
 ├─system.slice
 └─sysinit.target
   ├─dev-hugepages.mount
   ├─dev-mqueue.mount
   ├─kmod-static-nodes.service
●   ├─ldconfig.service
   ├─lvm2-lvmetad.socket
   ├─lvm2-lvmpolld.socket
   ├─lvm2-monitor.service
   ├─proc-sys-fs-binfmt_misc.automount
   ├─sys-fs-fuse-connections.mount
   ├─sys-kernel-config.mount
   ├─sys-kernel-debug.mount
   ├─sys-kernel-tracing.mount
   ├─systemd-ask-password-console.path
   ├─systemd-binfmt.service
●   ├─systemd-boot-system-token.service
●   ├─systemd-firstboot.service
●   ├─systemd-hwdb-update.service
●   ├─systemd-journal-catalog-update.service
   ├─systemd-journal-flush.service
   ├─systemd-journald.service
●   ├─systemd-machine-id-commit.service
   ├─systemd-modules-load.service
   ├─systemd-random-seed.service
●   ├─systemd-repart.service
   ├─systemd-sysctl.service
●   ├─systemd-sysusers.service
   ├─systemd-timesyncd.service
   ├─systemd-tmpfiles-setup-dev.service
   ├─systemd-tmpfiles-setup.service
   ├─systemd-udev-trigger.service
   ├─systemd-udevd.service
●   ├─systemd-update-done.service
   ├─systemd-update-utmp.service
   ├─cryptsetup.target
   ├─local-fs.target
   │ ├─-.mount
   │ ├─boot-efi.mount
●   │ ├─systemd-fsck-root.service
   │ ├─systemd-remount-fs.service
   │ └─tmp.mount
   └─swap.target

lakotamm avatar Dec 16 '20 14:12 lakotamm

OK, so I was looking for a dependency on systemd-modules-load.service. I added it to the thinkfan service file, but it also seems to be part of sysinit.target on which thinkfan has always depended. So thinkfan should start only after all modules have been initialized. Assuming systemd dependency handling works correctly, the only explanation for this issue would be that a module takes very long to initialize its sysfs files after being loaded.

I was always under the impression that an application like thinkfan shouldn't be responsible for this problem. This is dependency management and therefore it's the job of the system's service manager. So technically, this should be taken up with either systemd or the kernel itself.

However for the sake of practicality, I might implement a workaround thinkfan. For the time being, affected users should find some other way to delay thinkfan startup. For examle:

Do systemctl edit thinkfan and add the following:

[Unit]
After=graphical.target

This should delay thinkfan startup until after the graphical login manager has started, which should hopefully be enough.

vmatare avatar Dec 16 '20 16:12 vmatare

Thanks for your help! If you want, I can try to make a bug report, however, my knowledge of systemd is very limited, so I might not be able to answer their questions.

I tried your suggested workaround, and it resulted in a service being dead (inactive), not starting up. [Unit] After=graphical.target

so I tried instead basic.target and that just meant that the issue continued. Using gdm.service fixed the issue.

Printed sudo systemd-analyze critical-chain

graphical.target @2.120s
└─multi-user.target @2.120s
  └─thinkfan.service @2.060s +60ms
    └─gdm.service @2.052s +6ms
      └─optimus-manager.service @1.051s +999ms
        └─basic.target @1.043s
          └─sockets.target @1.043s
            └─snapd.socket @1.043s +534us
              └─sysinit.target @1.041s
                └─systemd-backlight@backlight:intel_backlight.service @1.646s +2.195s
                  └─system-systemd\x2dbacklight.slice @1.445s
                    └─system.slice @231ms
                      └─-.slice @231ms

lakotamm avatar Dec 16 '20 20:12 lakotamm

Anyways, thanks for this helpful report. Since the error says Error scanning /sys/devices/platform/coretemp.0/hwmon, I would think that it's related to the coretemp module. It could also be an issue with the module, e.g. if it completes its initialization procedure before having set up all sysfs symlinks.

Oh, and there's also the possibility that something else is actually un-loading and re-loading modules during bootup. If that happens during thinkfan startup, there's bound to be trouble.

So I think the precise cause of the issue should be investigated more closely before reporting bugs anywhere else.

vmatare avatar Dec 16 '20 22:12 vmatare

I have had what I think is the same issue on OpenSUSE for a few days. I guess it appeared after a recent kernel update to 5.9.14, but it might have crept in earlier. I have coretemp, acpitz and pch-skylake hwmons in my /etc/thinkfan.conf. It systematically fails on bootup with error messages like this:

> sudo systemctl status thinkfan
● thinkfan.service - Thinkfan, the minimalist fan control program
     Loaded: loaded (/usr/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/thinkfan.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Sun 2020-12-20 13:21:04 CET; 43s ago
    Process: 890 ExecStart=/usr/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE)

Dez 20 13:21:04 calypso systemd[1]: Starting Thinkfan, the minimalist fan control program...
Dez 20 13:21:04 calypso thinkfan[890]: ERROR: /etc/thinkfan.conf:52:
                                           name: coretemp
                                                 ^
                                       Could not find a hwmon with this name.
Dez 20 13:21:04 calypso systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE
Dez 20 13:21:04 calypso systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Dez 20 13:21:04 calypso systemd[1]: Failed to start Thinkfan, the minimalist fan control program.

Note that this is not limited to the coretemp module; if I disable that in /etc/thinkfan.conf it will show the same message with one of the other hwmons.

On my OpenSUSE system, to fix the problem it seems to be enough to delay loading a bit by doing sudo systemctl edit thinkfan and add

[Unit]
After=basic.target

On my laptop basic.target is reached after about 2 seconds, which seems to be enough to bring up the sysfs symlinks. If I try graphical.target, I get the same error as @lakotamm with an inactive (dead) service.

phrxmd avatar Dec 20 '20 12:12 phrxmd

Just thought of another thing: The coretemp module is probably loaded on-demand and I'm not sure exactly how/when that is triggered. So you could try loading it explicitly at bootup via systemd-modules-load.service:

echo coretemp > /etc/modules-load.d/coretemp.conf

vmatare avatar Dec 20 '20 15:12 vmatare

I had the same issue, solved it by making the service file restartable on failure:

[Service]
Restart=on-failure
RestartSec=5s

broccoli5 avatar May 04 '21 23:05 broccoli5

I just delay it as tmp fix:

[Service]
ExecStartPre=/bin/sleep 10

ceigh avatar Aug 06 '21 11:08 ceigh

I just encountered the same issue on Fedora Silverblue 35. Adding 10s startup delay is still an easy fix.

lakotamm avatar Oct 09 '21 22:10 lakotamm

Fix is currently in development at https://github.com/vmatare/thinkfan/tree/init_retries. For now, the branch simply makes the optional keyword work at startup, too, providing a workaround for this problem. Soon I'll add another option max_errors which will allow a certain number of consecutive init or read errors to be ignored on a given sensor. That'll become the designated way to handle the startup issues in the 2.0 release.

vmatare avatar Mar 29 '22 22:03 vmatare

@lakotamm @ceigh @broccoli5 @phrxmd, this should be fixed in master now with a new config setting: https://github.com/vmatare/thinkfan/issues/189#issuecomment-1196011925

vmatare avatar Aug 12 '22 00:08 vmatare