
Upgrade HAOS from 12.0 to 12.2 on RPi4 results in unbootable system

Open asjp opened this issue 1 year ago • 2 comments

Describe the issue you are experiencing

Performed an upgrade of HAOS via the HA GUI from 12.0 (installed and working) to 12.2 (available) on a Raspberry Pi 4.

After the upgrade, the system restarted and failed to come back up.

After inspecting the bootloader output, I could see failed boot attempts with the errors:

Bad cluster number 0
Firmware not found

After multiple attempts, it eventually stops with:

Unexpected error @ 0x000a95ba
FATAL error-code 45

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

12.2

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

  1. Start with working HA running HAOS 12.0 on RPi4
  2. Install OS upgrade version 12.2
  3. Observe failures upon reboot

Anything in the Supervisor logs that might be useful for us?

Unable to access logs

Anything in the Host logs that might be useful for us?

Unable to access logs

System information

No response

Additional information

No response

asjp, Apr 26 '24 14:04

I managed to restore my system. Here's what I did

  1. Image a new SD card with a clean HA install (OS 12.1).
  2. Start up HA using the new card in the RPi, let it finish its configuration process until it reaches the Create User screen in the UI, then shut it down.
  3. Insert both SD cards into an Ubuntu PC and identify the hassos-data partition on each card.
  4. Copy the data partition from the old SD card to the new one using dd if=/dev/sdb8 of=/dev/sdc8 (where sdb8 is the old hassos-data partition and sdc8 is the new one). This took around 3 hours.
  5. Insert the new SD card into the RPi; the system is back up and running with all previous data intact.
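Step 3 is where mistakes are easiest to make, because both cards carry the same partition layout. A minimal sketch, assuming the data partitions carry the filesystem label `hassos-data` and that util-linux `lsblk` (which supports `--json -o NAME,LABEL`) is installed, of listing every candidate partition so you can match each one to the old or new card. The helper names are hypothetical and nothing here writes to a disk:

```python
import json
import subprocess


def find_labelled_partitions(lsblk_json: str, label: str = "hassos-data") -> list[str]:
    """Return /dev paths of all block devices whose filesystem label equals `label`.

    `lsblk_json` is the output of `lsblk --json -o NAME,LABEL`; partitions
    appear nested as "children" of their parent disk.
    """
    found = []

    def walk(devices):
        for dev in devices:
            if dev.get("label") == label:
                found.append("/dev/" + dev["name"])
            walk(dev.get("children", []))  # descend into partitions

    walk(json.loads(lsblk_json)["blockdevices"])
    return found


def list_hassos_data() -> list[str]:
    """Hypothetical convenience wrapper: query lsblk (read-only) and filter."""
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,LABEL"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_labelled_partitions(out)
```

With both cards inserted you would expect two results, for example one on `sdb` and one on `sdc`; which is old and which is new still has to be confirmed separately.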

Obviously I was careful to ensure I was copying the correct partitions, in the right direction. One easy way to tell was that the old partition had a rauc.db file since it had been previously upgraded, whereas the new one did not have that file.
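The rauc.db check can be turned into a guard that refuses to continue when the direction is unclear. A minimal sketch, assuming both hassos-data partitions are temporarily mounted read-only at two directories of your choosing (the mount points are hypothetical) and that rauc.db sits at the root of the old partition, as observed above:

```python
from pathlib import Path


def pick_source(mount_a: str, mount_b: str, marker: str = "rauc.db") -> tuple[str, str]:
    """Return (source, destination) mount points for the copy.

    The partition containing `marker` is assumed to be the old, previously
    upgraded card; a fresh install should not have it yet.
    Raises ValueError when the check is ambiguous, so nothing gets copied.
    """
    a_has = (Path(mount_a) / marker).exists()
    b_has = (Path(mount_b) / marker).exists()
    if a_has == b_has:
        raise ValueError("cannot tell old from new: marker on both or neither")
    return (mount_a, mount_b) if a_has else (mount_b, mount_a)
```

Unmount both partitions again before running dd: copying a mounted filesystem block by block can produce an inconsistent image.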

~~Now the final step is to re-add Lovelace add-ons that are missing from the new install, as these don't appear to be part of the data that was copied across.~~ These appeared after a short while.

I would still be interested to know what caused the failure in the first place, and if there is an easier way to restore the system.

asjp, Apr 27 '24 00:04

The same thing happened to me: I upgraded my RPi4 to 12.3.rc1 and hit this.

yousaf465, Apr 27 '24 14:04

@asjp Great, glad you figured it out! SD card corruption can happen at random during the lifetime of an SD card (or any NAND flash). It is most common when the device suffers a power loss or is worn out by excessive writes, but it may happen even during reading (see "read disturb"). This happens more often than you might think, but in most cases the controller embedded in the SD card copes with it using its error correction mechanism. In some cases, though, even if you give the SD card the best care by limiting writes and avoiding sudden power loss, errors occur that can't be corrected, and you end up with corrupted data.

Your method is generally correct; you could only run into issues if the data partition on the old card were larger than the one on the new card. Otherwise, only two partitions contain user data: hassos-data and hassos-overlay (partition 7). The latter mostly contains operating system configuration, and it's usually fine not to retain it when migrating (a full HA backup doesn't include it either).
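The size caveat can be checked before copying. On Linux, `/sys/class/block/<name>/size` reports a block device's size in 512-byte units regardless of its physical sector size. A minimal sketch comparing the two partitions; the names `sdb8` and `sdc8` are only the examples used in the steps above, and the `sysfs_root` parameter exists just so the check can be exercised against a fake directory:

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs always reports block device size in 512-byte units


def partition_bytes(name: str, sysfs_root: str = "/sys/class/block") -> int:
    """Size of a block device such as 'sdb8', in bytes, read from sysfs."""
    sectors = int((Path(sysfs_root) / name / "size").read_text().strip())
    return sectors * SECTOR_BYTES


def fits(source: str, destination: str, sysfs_root: str = "/sys/class/block") -> bool:
    """True if a raw, block-for-block copy of `source` fits into `destination`."""
    return partition_bytes(source, sysfs_root) <= partition_bytes(destination, sysfs_root)
```

For the restore above, `fits("sdb8", "sdc8")` returning False would mean a plain dd copy cannot work as-is.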

In your case, the system might have been restored simply by copying partition 1 (e.g. /dev/sdb1) from a fresh installation to the old SD card, or even by copying just the contents of the boot partition's filesystem. After that, I would make a backup and restore it onto a new, more trustworthy SD card. Also, when copying to an SD card using dd, it's worth increasing the block size by adding bs=4M, as it makes the writes much more efficient and faster.
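The reason bs=4M helps is that GNU dd otherwise reads and writes in 512-byte blocks, so a 4 MiB block size means 8192 times fewer read/write calls and much larger writes reaching the card. A minimal Python sketch of the same block-wise copy, for illustration only and not a replacement for dd; the function name and paths are hypothetical:

```python
import os

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, matching dd bs=4M


def block_copy(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src_path to dst_path in fixed-size blocks; return bytes copied.

    Mirrors `dd if=SRC of=DST bs=4M`: read one block, write it, repeat.
    dst is opened with "r+b" so it is written in place without truncation,
    the way a partition target behaves (it must already exist).
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # end of source reached
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # like dd conv=fsync: push data to the medium
    return copied
```

The equivalent dd invocation for the restore above would be the original command with bs=4M (and optionally conv=fsync) appended.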

Unfortunately, this is not something we can do much about. What you can do is pick a high-quality SD card and make sure you don't abuse it. You can also use an external drive as a data disk (note this is not the same as using an external drive for the whole HAOS install). That way, if the SD card breaks, you can simply replace the system SD card and keep using the same data disk without any complicated recovery steps.

sairon, Apr 29 '24 07:04