
RaspberryPi 0W: fwupdate fails from time to time

Open sonicpp opened this issue 2 years ago • 2 comments

Sometimes fwupdate fails on the RPi 0W. Below is the output from a session where it failed on the first attempt, so I tried again and it succeeded:

[root@thing-115d74d5 ~]# uname -a
Linux thing-115d74d5 5.15.32 #1 Tue Sep 27 11:06:53 UTC 2022 armv6l GNU/Linux
[root@thing-115d74d5 ~]# fwupdate upgrade /tmp/thingos-raspberrypi-unknown.img.xz
upgrading to /tmp/thingos-raspberrypi-unknown.img.xz
downloading...
downloaded [custom]
extracting...
running pre-upgrade script postgresql.sh
extracted [custom]
flashing boot...
boot flashed [custom]
preparing for reboot...
losetup: /data/.fwupdate/firmware.img: Resource temporarily unavailable
[root@thing-115d74d5 ~]# fwupdate upgrade /tmp/thingos-raspberrypi-unknown.img.xz
upgrading to /tmp/thingos-raspberrypi-unknown.img.xz
downloading...
downloaded [custom]
extracting...
running pre-upgrade script postgresql.sh
extracted [custom]
flashing boot...
boot flashed [custom]
preparing for reboot...
rebooting...

---- shutting down thingOS unknown ----
 * Stopping sshd: done
 * Stopping crond: done
 * Stopping ntpd: done
 * Stopping netwatch: done
 * Stopping network: done
 * Stopping wpa_supplicant on wlan0: done
 * Stopping rngd: done
 * Stopping eudev: done
 * Stopping throttle watcher: done
 * Stopping syslogd: done
umount: overlay busy - remounted read-only
umount: can't remount tmpfs read-only
umount: devtmpfs busy - remounted read-only
The system is going down NOW!
Sent SIGTERM to all processes
rm: can't remove '/sbin/reboot': Read-only file system
Sent SIGKILL to all processes 
Re[  648.263369] reboot: Restarting system

My guess is that this is related to my comment here for Pinecube, i.e. a kernel/busybox bug: https://github.com/ccrisan/thingos/pull/79#issuecomment-1258558979

sonicpp avatar Sep 27 '22 11:09 sonicpp

I remember seeing this issue myself a few times. So if I understood correctly, this is a kernel bug that has been fixed in 5.19, but with no prospects for backports, right?

If that's indeed the case, what do we do about it in thingOS? Waiting for kernel 5.19+ on all supported boards is probably not a solution, is it?

ccrisan avatar Sep 27 '22 12:09 ccrisan

Well, maybe it is not a bug, just "unwanted behavior". I am not 100% sure about this, but it looks to me like the kernel sometimes returns EAGAIN for some loop operations. Kernel 5.19 changed/improved the loop driver so that it should (always?) succeed in this situation. It looks like these patches were skipped for stable branches such as 5.15.y.

In userspace, the busybox implementation always fails when the driver returns EAGAIN (it uses two separate ioctls, LOOP_SET_FD and LOOP_SET_STATUS64). util-linux uses a more modern ioctl (LOOP_CONFIGURE) instead of these two if it detects a newer kernel. But even with the older ioctls, util-linux should work since version v2.37.1: when the kernel returns EAGAIN, userspace should try again.

So for now there are IMO two options:

  • upgrade to a newer kernel (backports are unlikely)
  • switch to the util-linux implementation of losetup

losetup never worked for me on the Pinecube with kernel < 5.19 AND busybox. Updating the kernel to 5.19 OR switching to the util-linux implementation always worked. I have not tested the fix on the RPi 0W yet though, since the failure is more random there than on the Pinecube. But I can give it a try.
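
For illustration only, the "try again" idea could also be approximated at the script level by re-running a failing command a few times. This is a minimal sketch, not what fwupdate or the eventual fix does: the `retry` helper is a hypothetical name, and because losetup only reports an exit status, the wrapper cannot tell EAGAIN apart from other errors and simply retries on any failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of thingOS or fwupdate).
# Usage: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while :; do
        retry_status=0
        # "|| retry_status=$?" captures the failure without tripping "set -e".
        "$@" || retry_status=$?
        if [ "$retry_status" -eq 0 ]; then
            return 0
        fi
        if [ "$retry_attempt" -ge "$retry_max" ]; then
            return "$retry_status"
        fi
        retry_attempt=$((retry_attempt + 1))
        sleep "$retry_delay"
    done
}

# Illustrative call only; the exact losetup invocation used by fwupdate
# is not shown in this thread:
#   retry 5 1 losetup -f /data/.fwupdate/firmware.img
```

A wrapper like this only hides the symptom. Switching to util-linux losetup or to a newer kernel addresses the cause described above.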

sonicpp avatar Sep 27 '22 15:09 sonicpp

Fixed via https://github.com/ccrisan/thingos/commit/2059321a6cb271f31a4a4c2d0128455fb259ef17. Thanks for investigating and reporting this one!

ccrisan avatar Mar 27 '23 06:03 ccrisan