watchdog hangs on shutdown/restart
The RaspiBlitz sometimes hangs during shutdown with the message that the watchdog cannot be stopped.
This bug is under investigation, and we need your help finding a way to reproduce it so it can be fixed. It's not a showstopper for the release, but it would be nice to get rid of it.
So if you experience it, please report:
- which SD card image you used (version, release candidate, min or fatpack)
- in what state and on which action the reboot/shutdown happened (during setup, after setup, etc.)
- which bonus apps you have installed
For deeper research, here are the services running on the min and fatpack images before setup, for comparison:
v1.11.0rc6-min: systemctl list-units --type=service --state=running
UNIT LOAD ACTIVE SUB DESCRIPTION
avahi-daemon.service loaded active running Avahi mDNS/DNS-SD Stack
cron.service loaded active running Regular background program processing daemon
dbus.service loaded active running D-Bus System Message Bus
fail2ban.service loaded active running Fail2Ban Service
getty@tty1.service loaded active running Getty on tty1
i2pd.service loaded active running I2P Router written in C++
ModemManager.service loaded active running Modem Manager
NetworkManager.service loaded active running Network Manager
nginx.service loaded active running A high performance web server and a reverse proxy server
polkit.service loaded active running Authorization Manager
redis-server.service loaded active running Advanced key-value store
rsyslog.service loaded active running System Logging Service
rtkit-daemon.service loaded active running RealtimeKit Scheduling Policy Service
smartmontools.service loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
ssh.service loaded active running OpenBSD Secure Shell server
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running User Login Management
systemd-timesyncd.service loaded active running Network Time Synchronization
systemd-udevd.service loaded active running Rule-based Manager for Device Events and Files
tor@default.service loaded active running Anonymizing overlay network for TCP
triggerhappy.service loaded active running triggerhappy global hotkey daemon
user@1000.service loaded active running User Manager for UID 1000
user@1001.service loaded active running User Manager for UID 1001
vnstat.service loaded active running vnStat network traffic monitor
wpa_supplicant.service loaded active running WPA supplicant
v1.11.0rc6-fat: systemctl list-units --type=service --state=running
UNIT LOAD ACTIVE SUB DESCRIPTION
avahi-daemon.service loaded active running Avahi mDNS/DNS-SD Stack
blitzapi.service loaded active running BlitzBackendAPI
cron.service loaded active running Regular background program processing daemon
dbus.service loaded active running D-Bus System Message Bus
fail2ban.service loaded active running Fail2Ban Service
getty@tty1.service loaded active running Getty on tty1
i2pd.service loaded active running I2P Router written in C++
ModemManager.service loaded active running Modem Manager
NetworkManager.service loaded active running Network Manager
nginx.service loaded active running A high performance web server and a reverse proxy server
polkit.service loaded active running Authorization Manager
redis-server.service loaded active running Advanced key-value store
rsyslog.service loaded active running System Logging Service
rtkit-daemon.service loaded active running RealtimeKit Scheduling Policy Service
serial-getty@ttyAMA10.service loaded active running Serial Getty on ttyAMA10
smartmontools.service loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
ssh.service loaded active running OpenBSD Secure Shell server
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running User Login Management
systemd-timesyncd.service loaded active running Network Time Synchronization
systemd-udevd.service loaded active running Rule-based Manager for Device Events and Files
tor@default.service loaded active running Anonymizing overlay network for TCP
triggerhappy.service loaded active running triggerhappy global hotkey daemon
user@1000.service loaded active running User Manager for UID 1000
user@1001.service loaded active running User Manager for UID 1001
vnstat.service loaded active running vnStat network traffic monitor
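Comparing the two lists by eye, the fatpack image additionally runs blitzapi.service and a serial getty on ttyAMA10, while the min image additionally runs wpa_supplicant.service. To repeat that comparison on other images, something like the following sketch may help. The function name `services_only_in` and the file names `min.txt` and `fat.txt` are made up for illustration; only the `systemctl list-units` call comes from this thread.

```shell
# Hypothetical helper: print the .service units that appear in the first
# saved list but not in the second one.
services_only_in() {
  # $1 = list to report from, $2 = list to compare against
  awk '
    $1 ~ /\.service$/ {
      if (FILENAME == ARGV[1]) seen[$1] = 1      # remember units of the reference list
      else if (!($1 in seen)) print $1           # unit is missing from the reference list
    }
  ' "$2" "$1" | sort -u
}

# Usage (file names are assumptions):
#   systemctl list-units --type=service --state=running > min.txt   # on the min image
#   systemctl list-units --type=service --state=running > fat.txt   # on the fatpack image
#   services_only_in fat.txt min.txt
#   services_only_in min.txt fat.txt
```

Header and footer lines of the `systemctl` output are skipped because their first column does not end in `.service`.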
Something to try out: open /etc/systemd/system.conf (sudo nano /etc/systemd/system.conf) and activate the option RebootWatchdogSec=3min.
Here are some details on this option:
Description: This setting specifies the timeout for the reboot watchdog. If a reboot takes longer than the specified time, the system will be hard-rebooted. This is useful for ensuring that the system recovers from a state where it has begun the reboot process but gets stuck before completion.
Usage: Set to a time value, such as 10min. If a reboot process exceeds this duration, the watchdog triggers a system reboot to recover from potential hang-ups during shutdown or reboot sequences.
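As an alternative to editing /etc/systemd/system.conf by hand, the same option can probably be set from a drop-in file, since systemd also reads `*.conf` files from /etc/systemd/system.conf.d/. The sketch below is not what RaspiBlitz does; the function name and the drop-in file name `10-reboot-watchdog.conf` are assumptions chosen for illustration.

```shell
# Hypothetical helper: write a drop-in that sets RebootWatchdogSec.
write_reboot_watchdog_dropin() {
  dropin_dir=$1   # e.g. /etc/systemd/system.conf.d
  timeout=$2      # e.g. 3min
  mkdir -p "$dropin_dir" || return 1
  # RebootWatchdogSec belongs to the [Manager] section of system.conf
  printf '[Manager]\nRebootWatchdogSec=%s\n' "$timeout" \
    > "$dropin_dir/10-reboot-watchdog.conf"
}

# Usage, from a root shell (a shell function cannot be called through sudo):
#   write_reboot_watchdog_dropin /etc/systemd/system.conf.d 3min
#   systemctl daemon-reexec                    # or simply reboot
#   systemctl show -p RebootWatchdogUSec       # check the value systemd picked up
```

The setting only has an effect when a hardware watchdog is available, which is the case discussed in this thread.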
The question is: can you fight a watchdog with a watchdog?
OK, activating the reboot watchdog with RebootWatchdogSec on v1.11rc7 now. Please report if you still see a shutdown/reboot that hangs for longer than 3min after this.
So far rc7 is stable ... closing for the final release.
Reopening, as this is still happening occasionally, e.g. https://t.me/raspiblitz/142982, and it was also reported by @fusion44. Some ideas:
For the watchdog problem, we can try reducing the TimeoutStopSec in:
# An extended timeout period is needed to allow for database compaction
# and other time intensive operations during startup. We also extend the
# stop timeout to ensure graceful shutdowns of lnd.
TimeoutStartSec=1200
TimeoutStopSec=3600
with the command:
sudo systemctl edit --full lnd
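Instead of editing the full unit, the stop timeout could also be overridden with a drop-in, which leaves the shipped lnd.service untouched and is easy to remove again. This is only a sketch: the function name, the file name `override.conf`, and the value 600 in the usage comment are illustrative assumptions, not a tested recommendation. A shorter timeout also means systemd may kill lnd before it has shut down gracefully, which is exactly what the long default is meant to avoid.

```shell
# Hypothetical helper: write a drop-in that overrides TimeoutStopSec for a unit.
write_stop_timeout_dropin() {
  dropin_dir=$1   # e.g. /etc/systemd/system/lnd.service.d
  seconds=$2      # new stop timeout in seconds
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "$dropin_dir/override.conf"
}

# Usage, from a root shell (the value 600 is only an example):
#   write_stop_timeout_dropin /etc/systemd/system/lnd.service.d 600
#   systemctl daemon-reload
#   systemctl show lnd -p TimeoutStopUSec      # check the effective value
```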
The watchdog is set to RuntimeWatchdogSec=600s in /etc/systemd/system.conf.
I feel that should be closer to 3600 if we want to stay patient with LND.
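To see the timeout and watchdog values from this discussion side by side on a given node, a small filter like the one below might be handy. The function name `show_timeouts` is made up; it only prints active (uncommented) settings and changes nothing.

```shell
# Hypothetical helper: print active timeout/watchdog settings from the given
# files, or from standard input when no file is given.
show_timeouts() {
  grep -E '^(TimeoutStartSec|TimeoutStopSec|RuntimeWatchdogSec|RebootWatchdogSec)=' "$@"
}

# Usage:
#   systemctl cat lnd | show_timeouts
#   show_timeouts /etc/systemd/system.conf
```

Commented-out defaults in system.conf start with `#` and are therefore not shown.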
Raspberry Pi 5 Model B Rev 1.0 is rebooting as expected on 1.11.0. Raspberry Pi 4 Model B Rev 1.5 had this issue when rebooting until the update to 1.11.0.
The watchdog hang just happened to me on the first boot after upgrading from v1.11.0 to v1.11.2 with the fatpack image. I have an RPi 4. I power-cycled it and it came up OK on the second boot.
@bitsam with RaspiBlitz v1.12.0 we will change to a new RaspiOS base image. Let's keep our fingers crossed that this issue goes away with that update for the RPi 4s.