watchdog hangs on shutdown/restart

Open rootzoll opened this issue 2 years ago • 15 comments

This is an intermittent error: the RaspiBlitz sometimes hangs during shutdown with the message that watchdog cannot stop.

[screenshot: signal-2024-04-07-162116_002]

This bug is under investigation, and we need your help working out how to reproduce it so we can fix it. It's not a showstopper for the release, but it would be nice to get rid of it.

So if you experience it, please report:

  • which SD card image you used (version, release candidate, min or fatpack)
  • in what state and on which action the reboot/shutdown happened (during setup, after setup, etc.)
  • which bonus apps you have installed

rootzoll avatar Apr 07 '24 12:04 rootzoll

For deeper research, here are the running services on min and fatpack before setup, for comparison:

v1.11.0rc6-min: systemctl list-units --type=service --state=running

  UNIT                      LOAD   ACTIVE SUB     DESCRIPTION
  avahi-daemon.service      loaded active running Avahi mDNS/DNS-SD Stack
  cron.service              loaded active running Regular background program processing daemon
  dbus.service              loaded active running D-Bus System Message Bus
  fail2ban.service          loaded active running Fail2Ban Service
  getty@tty1.service        loaded active running Getty on tty1
  i2pd.service              loaded active running I2P Router written in C++
  ModemManager.service      loaded active running Modem Manager
  NetworkManager.service    loaded active running Network Manager
  nginx.service             loaded active running A high performance web server and a reverse proxy server
  polkit.service            loaded active running Authorization Manager
  redis-server.service      loaded active running Advanced key-value store
  rsyslog.service           loaded active running System Logging Service
  rtkit-daemon.service      loaded active running RealtimeKit Scheduling Policy Service
  smartmontools.service     loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
  ssh.service               loaded active running OpenBSD Secure Shell server
  systemd-journald.service  loaded active running Journal Service
  systemd-logind.service    loaded active running User Login Management
  systemd-timesyncd.service loaded active running Network Time Synchronization
  systemd-udevd.service     loaded active running Rule-based Manager for Device Events and Files
  tor@default.service       loaded active running Anonymizing overlay network for TCP
  triggerhappy.service      loaded active running triggerhappy global hotkey daemon
  user@1000.service         loaded active running User Manager for UID 1000
  user@1001.service         loaded active running User Manager for UID 1001
  vnstat.service            loaded active running vnStat network traffic monitor
  wpa_supplicant.service    loaded active running WPA supplicant

v1.11.0rc6-fat: systemctl list-units --type=service --state=running

  UNIT                          LOAD   ACTIVE SUB     DESCRIPTION
  avahi-daemon.service          loaded active running Avahi mDNS/DNS-SD Stack
  blitzapi.service              loaded active running BlitzBackendAPI
  cron.service                  loaded active running Regular background program processing daemon
  dbus.service                  loaded active running D-Bus System Message Bus
  fail2ban.service              loaded active running Fail2Ban Service
  getty@tty1.service            loaded active running Getty on tty1
  i2pd.service                  loaded active running I2P Router written in C++
  ModemManager.service          loaded active running Modem Manager
  NetworkManager.service        loaded active running Network Manager
  nginx.service                 loaded active running A high performance web server and a reverse proxy server
  polkit.service                loaded active running Authorization Manager
  redis-server.service          loaded active running Advanced key-value store
  rsyslog.service               loaded active running System Logging Service
  rtkit-daemon.service          loaded active running RealtimeKit Scheduling Policy Service
  serial-getty@ttyAMA10.service loaded active running Serial Getty on ttyAMA10
  smartmontools.service         loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
  ssh.service                   loaded active running OpenBSD Secure Shell server
  systemd-journald.service      loaded active running Journal Service
  systemd-logind.service        loaded active running User Login Management
  systemd-timesyncd.service     loaded active running Network Time Synchronization
  systemd-udevd.service         loaded active running Rule-based Manager for Device Events and Files
  tor@default.service           loaded active running Anonymizing overlay network for TCP
  triggerhappy.service          loaded active running triggerhappy global hotkey daemon
  user@1000.service             loaded active running User Manager for UID 1000
  user@1001.service             loaded active running User Manager for UID 1001
  vnstat.service                loaded active running vnStat network traffic monitor
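
To make the comparison less error-prone than reading the two listings by eye, here is a minimal sketch that prints only the units that differ. It assumes each `systemctl list-units` output was saved to a text file; the function names and file names are made up for illustration.

```shell
# Print the *.service unit names found in a saved
# `systemctl list-units` output, sorted and de-duplicated.
unit_names() {
  awk '$1 ~ /\.service$/ { print $1 }' "$1" | LC_ALL=C sort -u
}

# diff_units FIRST SECOND
# Units only in FIRST are printed flush left, units only in SECOND
# are printed with a leading tab; units in both are suppressed.
diff_units() {
  a=$(mktemp) b=$(mktemp)
  unit_names "$1" > "$a"
  unit_names "$2" > "$b"
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

Usage would be something like `diff_units min.txt fat.txt`. Going by the two listings above, the differences to expect are blitzapi.service and the serial getty on ttyAMA10 (fatpack only) and wpa_supplicant.service (min only).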

rootzoll avatar Apr 07 '24 12:04 rootzoll

Something to try out: run sudo nano /etc/systemd/system.conf and activate the option RebootWatchdogSec=3min.

Here are some details on this option:

Description: This setting specifies the timeout for the reboot watchdog. If a reboot takes longer than the specified time, the system will be hard-rebooted. This is useful for ensuring that the system recovers from a state where it has begun the reboot process but gets stuck before completion.

Usage: Set to a time value, such as 10min. If a reboot process exceeds this duration, the watchdog triggers a system reboot to recover from potential hang-ups during shutdown or reboot sequences.
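
For anyone who prefers not to edit the file by hand, here is a rough sketch of the same change as a shell function. It assumes the option is either already present or present but commented out (as it typically is in the shipped system.conf), relies on GNU sed, and the function name is made up for illustration.

```shell
# set_manager_option FILE KEY VALUE
# Uncomment or replace "KEY=..." in a systemd manager config file.
# If the key is not present at all, append it to the end of the file
# (fine for system.conf, which only has a [Manager] section).
set_manager_option() {
  file=$1 key=$2 value=$3
  if grep -Eq "^#?${key}=" "$file"; then
    sed -i -E "s|^#?${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

In a root shell this would be called as `set_manager_option /etc/systemd/system.conf RebootWatchdogSec 3min`. Rebooting afterwards is the simplest way to be sure the new value is active.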

The question is .. can you fight watchdog with watchdog?

rootzoll avatar Apr 07 '24 15:04 rootzoll

OK, now activating the watchdog with RebootWatchdogSec on v1.11rc7. Please report if you still see a shutdown/reboot that hangs for longer than 3min after this.

rootzoll avatar Apr 08 '24 12:04 rootzoll

So far rc7 is stable ... closing for the final release.

rootzoll avatar Apr 16 '24 12:04 rootzoll

Reopening as this is still happening occasionally, e.g. https://t.me/raspiblitz/142982, and it was also reported by @fusion44.

Some ideas:

For the watchdog problem, we can try reducing the TimeoutStopSec in:

# An extended timeout period is needed to allow for database compaction
# and other time intensive operations during startup. We also extend the
# stop timeout to ensure graceful shutdowns of lnd.
TimeoutStartSec=1200
TimeoutStopSec=3600

with the command:

sudo systemctl edit --full lnd

The watchdog is set to RuntimeWatchdogSec=600s in /etc/systemd/system.conf.

I feel that should be closer to 3600 if we want to stay patient with LND.
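
As an alternative to editing the full unit, the stop timeout can also be overridden with a drop-in file. This is only a sketch: the function name is made up, and the value 600 used below is a placeholder to experiment with, not a recommendation.

```shell
# write_stop_timeout_dropin DIR SECONDS
# Write a systemd drop-in that overrides only TimeoutStopSec and
# leaves the rest of the unit file untouched.
write_stop_timeout_dropin() {
  dir=$1 seconds=$2
  mkdir -p "$dir"
  printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "$dir/override.conf"
}
```

In a root shell: `write_stop_timeout_dropin /etc/systemd/system/lnd.service.d 600`, followed by `systemctl daemon-reload`. The effective values can then be checked with `systemctl show lnd -p TimeoutStopUSec` for the service and `systemctl show -p RuntimeWatchdogUSec` for the manager.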

openoms avatar Apr 24 '24 07:04 openoms

Raspberry Pi 5 Model B Rev 1.0 is rebooting as expected on 1.11.0.

Raspberry Pi 4 Model B Rev 1.5 had this issue when rebooting until the update to 1.11.0.

LOCHER-21 avatar May 03 '24 21:05 LOCHER-21

The watchdog hang just happened to me on the first boot after upgrading from v1.11.0 to v1.11.2 with the fatpack image. I have an RPi4. I power cycled it and it came up OK on the second boot.

bitsam avatar Aug 26 '24 04:08 bitsam

@bitsam with RaspiBlitz v1.12.0 we will change to a new RaspiOS base image. Let's keep our fingers crossed that this issue goes away with that update for the RPi4s.

rootzoll avatar Aug 30 '24 17:08 rootzoll