firmware-upgrade fails (with brick) on 32MB RAM devices
Bug report
What is the problem? I recently upgraded an NSM2 (32MB RAM) from v1.0.2 to a newer v1.0.x, and the device no longer booted. On the console I could see that the written squashfs was invalid. Reflashing v1.0.2 via tftp recovered the device, and rerunning the sysupgrade afterwards was successful.
What is the expected behaviour? A simple sysupgrade should not brick the device, especially when it is mounted on a pole. Since I have seen something similar on a WD1043-v1 (also 32MB), I suspect the OOM killer. I assume that most services get stopped when the sysupgrade starts. This includes both olsrd versions, which frees a lot of RAM, and cron. That leaves room for a race condition:
- olsrd stops
- olsrd-watchdog is fired from cron before cron is stopped
- olsrd-watchdog restarts olsrd (which takes up RAM again)
- sysupgrade gets unpacked to be flashed
In such a sequence the sysupgrade file is in RAM twice (the file in /tmp, and the unpacked version while it is piped to the mtd tool), and olsrd is running again. This has a high chance of causing an OOM during flashing. I have also seen some devices reboot right after the sysupgrade file was loaded to /tmp (RAM disk), which also points to an OOM.
Firmware Version: flashing from v1.0.2, but I assume it affects all v1.0.0+ releases
Site Configuration: Not sure about the size of the backup file, but isn't it also kept in RAM during flashing, adding to the RAM shortage?
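To make the suspected memory pressure concrete, here is a rough back-of-the-envelope sketch in Python. It follows the assumption from this report that both the image in /tmp and its unpacked copy count against RAM at the same time. All sizes are made-up placeholders, not measurements from an NSM2, and the names `peak_ram_kib` and `fits` are hypothetical helpers.

```python
MIB = 1024  # KiB per MiB


def peak_ram_kib(image_kib, unpacked_kib, services_kib, backup_kib=0):
    """Rough upper bound on RAM in use while the image is piped to mtd.

    Assumes (as the report does) that the image in /tmp and its unpacked
    copy are both resident at the same time.
    """
    return image_kib + unpacked_kib + services_kib + backup_kib


def fits(total_kib, baseline_kib, **usage):
    """True if the estimated peak fits into what the baseline system leaves free."""
    return peak_ram_kib(**usage) <= total_kib - baseline_kib


# Placeholder numbers for a 32MB device; NOT measured values.
TOTAL = 32 * MIB
BASELINE = 12 * MIB  # kernel plus whatever keeps running during sysupgrade

# Case 1: olsrd stays stopped.
ok_without_olsrd = fits(TOTAL, BASELINE,
                        image_kib=7 * MIB, unpacked_kib=7 * MIB, services_kib=0)

# Case 2: the watchdog restarted olsrd and a config backup is also held in RAM.
ok_with_olsrd = fits(TOTAL, BASELINE,
                     image_kib=7 * MIB, unpacked_kib=7 * MIB,
                     services_kib=6 * MIB, backup_kib=1 * MIB)

print(ok_without_olsrd, ok_with_olsrd)
```

With these placeholder values the upgrade fits only while olsrd stays stopped. Real numbers would have to be taken from the device, for example from /proc/meminfo right before flashing.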
Order of service shutdown during sysupgrade:
Sending TERM to remaining processes ... logd rpcd netifd odhcpd dnsmasq uhttpd collectd ntpd olsrd crond vnstatd olsrd ubusd askfirst
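A small Python sketch to read the ordering out of that log line. It only parses the text quoted above; the helper names `term_order` and `positions` are made up for illustration. Note that the two olsrd entries may simply be the two olsrd versions mentioned in the report, so the listing alone does not prove that a restart happened.

```python
TERM_LINE = (
    "Sending TERM to remaining processes ... logd rpcd netifd odhcpd "
    "dnsmasq uhttpd collectd ntpd olsrd crond vnstatd olsrd ubusd askfirst"
)


def term_order(line):
    """Return the process names listed after the '...' separator, in order."""
    return line.split("...", 1)[1].split()


def positions(names, target):
    """Return every index at which `target` appears in `names`."""
    return [i for i, name in enumerate(names) if name == target]


order = term_order(TERM_LINE)
olsrd_at = positions(order, "olsrd")
crond_at = positions(order, "crond")

print("olsrd at", olsrd_at, "crond at", crond_at)
```

What the line does show is that olsrd (twice) and crond are all still among the "remaining processes" at this point, which is at least compatible with cron having been able to fire the watchdog shortly before.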
I have experienced the same behavior on an XW-NSM2.
But the XW version has 64MB, right? So it should not be a RAM issue on these boards.