operating-system
external data disk gets disabled on HAOS update in UEFI (arm64) VM - machine_id found empty
Describe the issue you are experiencing
After the recent update to 12.1, the system got stuck during boot. This time I took the time to capture the journalctl and dmesg outputs.
My external data disk was labelled "hassos-data-dis". I was able to change it to "hassos-data" and reboot successfully.
Looking at existing issues, I think the root cause is that the boot immediately after the operating system update got an empty machine_id value:
homeassistant kernel: Kernel command line: BOOT_IMAGE=(hd0,gpt2)/Image root=PARTUUID=8d3d53e3-6d49-4c38-8349-aff6859e82fd rootwait zram.enabled=1 zram.num_devices=3 systemd.machine_id= fsck.repair=yes systemd.condition-first-boot=true console=tty1 console=ttyS0 rauc.slot=A
This caused the haos-data-disk-detach script to run:
homeassistant systemd[1]: Starting HAOS data disk detach...
and that got the data disk disabled.
This is the second time it has happened since I moved the data to a second disk, and it might be related to the update procedure for UEFI systems. I can see the new machine_id in /mnt/boot/efi/boot/grubenv.
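To check for this condition on a running system, a minimal sketch along these lines could classify the systemd.machine_id parameter of a kernel command line. The function name machine_id_state is made up for illustration; only the parameter name comes from the log line above.

```shell
#!/bin/sh
# Classify the systemd.machine_id= parameter of a kernel command line.
# Prints "set", "empty" or "absent". (machine_id_state is a hypothetical name.)
machine_id_state() {
    # Pad with spaces so the parameter can be matched as a whole word.
    case " $1 " in
        *" systemd.machine_id= "*) echo empty ;;
        *" systemd.machine_id="*)  echo set ;;
        *)                         echo absent ;;
    esac
}
```

On the affected boot, `machine_id_state "$(cat /proc/cmdline)"` would be expected to print "empty".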
What operating system image do you use?
generic-aarch64 (Generic UEFI capable aarch64 systems)
What version of Home Assistant Operating System is installed?
Home Assistant OS 12.1
Did you upgrade the Operating System.
Yes
Steps to reproduce the issue
Started 12.0 -> 12.1 update.
Anything in the Supervisor logs that might be useful for us?
no
Anything in the Host logs that might be useful for us?
no
System information
System Information
| version | core-2024.3.0 |
|---|---|
| installation_type | Home Assistant OS |
| dev | false |
| hassio | true |
| docker | true |
| user | root |
| virtualenv | false |
| python_version | 3.12.2 |
| os_name | Linux |
| os_version | 6.6.20-haos |
| arch | aarch64 |
| timezone | America/New_York |
| config_dir | /config |
Home Assistant Community Store
| GitHub API | ok |
|---|---|
| GitHub Content | ok |
| GitHub Web | ok |
| GitHub API Calls Remaining | 4937 |
| Installed Version | 1.34.0 |
| Stage | running |
| Available Repositories | 1411 |
| Downloaded Repositories | 10 |
Home Assistant Cloud
| logged_in | true |
|---|---|
| subscription_expiration | November 27, 2024 at 7:00 PM |
| relayer_connected | true |
| relayer_region | us-east-1 |
| remote_enabled | true |
| remote_connected | true |
| alexa_enabled | false |
| google_enabled | true |
| remote_server | us-east-1-4.ui.nabu.casa |
| certificate_status | ready |
| instance_id | d135c865502f446fa7746e274daf1f76 |
| can_reach_cert_server | ok |
| can_reach_cloud_auth | ok |
| can_reach_cloud | ok |
Home Assistant Supervisor
| host_os | Home Assistant OS 12.1 |
|---|---|
| update_channel | stable |
| supervisor_version | supervisor-2024.03.0 |
| agent_version | 1.6.0 |
| docker_version | 24.0.7 |
| disk_total | 125.9 GB |
| disk_used | 23.7 GB |
| healthy | true |
| supported | true |
| board | generic-aarch64 |
| supervisor_api | ok |
| version_api | ok |
| installed_addons | ZeroTier One (0.18.0), Terminal & SSH (9.10.0), File editor (5.8.0), ESPHome (2024.2.2), Z-Wave JS (0.4.5), Matter Server (5.4.1) |
Dashboards
| dashboards | 1 |
|---|---|
| resources | 3 |
| views | 8 |
| mode | storage |
Recorder
| oldest_recorder_run | March 6, 2024 at 11:33 PM |
|---|---|
| current_recorder_run | March 13, 2024 at 7:01 PM |
| estimated_db_size | 268.23 MiB |
| database_engine | sqlite |
| database_version | 3.44.2 |
Additional information
No response
I suspect that I have the same bug on the RPi5 with SSD, but I can no longer access any logs. In my case, only a reinstallation followed by restoring the backup helped.
So this is not specific to "UEFI (arm64) VM" if others see this on Raspberry Pis too, right?
@claplace I agree with your analysis, it seems that first boot got detected again.
The usual cause is that an internal SD card as well as an external SD card/disk each have a full HAOS installation (with the boot partition etc.) on them. The system might then write the boot information to a different disk than the one the boot loader reads from.
Can you run lsblk from the terminal to check what disks and partitions there are?
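As a rough way to spot two HAOS installations side by side, the sketch below lists filesystem labels that occur on more than one partition. It assumes the input comes from `lsblk -rno NAME,LABEL` (raw output, no header); the function name duplicate_labels is made up for illustration.

```shell
#!/bin/sh
# Reads "NAME LABEL" lines on stdin and prints every label that is used by
# more than one partition, one per line, sorted.
# (duplicate_labels is a hypothetical helper name.)
duplicate_labels() {
    awk 'NF >= 2 { count[$2]++ }
         END { for (label in count) if (count[label] > 1) print label }' | sort
}
```

Usage: `lsblk -rno NAME,LABEL | duplicate_labels`. Any output, for example a boot partition label appearing twice, would point at a second installation.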
Sure :)
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 32G 0 disk
|-sda1 8:1 0 32M 0 part /mnt/boot
|-sda2 8:2 0 24M 0 part
|-sda3 8:3 0 256M 0 part /
|-sda4 8:4 0 24M 0 part
|-sda5 8:5 0 256M 0 part
|-sda6 8:6 0 8M 0 part
|-sda7 8:7 0 96M 0 part /var/lib/systemd
| /var/lib/bluetooth
| /var/lib/NetworkManager
| /root/.ssh
| /root/.docker
| /etc/udev/rules.d
| /etc/systemd/timesyncd.conf
| /etc/modules-load.d
| /etc/modprobe.d
| /etc/hosts
| /etc/hostname
| /etc/dropbear
| /etc/NetworkManager/system-connections
| /mnt/overlay
`-sda8 8:8 0 31.3G 0 part
sdb 8:16 0 128G 0 disk
`-sdb1 8:17 0 128G 0 part /var/log/journal
/var/lib/docker
/mnt/data
zram0 253:0 0 0B 0 disk
zram1 253:1 0 32M 0 disk
zram2 253:2 0 16M 0 disk /tmp
also, fdisk -l /dev/sda:
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: BB18D444-28F2-4177-B30F-AB734183BA40
Device Start End Sectors Size Type
/dev/sda1 2048 67583 65536 32M EFI System
/dev/sda2 67584 116735 49152 24M Linux filesystem
/dev/sda3 116736 641023 524288 256M Linux filesystem
/dev/sda4 641024 690175 49152 24M Linux filesystem
/dev/sda5 690176 1214463 524288 256M Linux filesystem
/dev/sda6 1214464 1230847 16384 8M Linux filesystem
/dev/sda7 1230848 1427455 196608 96M Linux filesystem
/dev/sda8 1427456 67108830 65681375 31.3G Linux filesystem
and for /dev/sdb:
Disk /dev/sdb: 128 GiB, 137438953472 bytes, 268435456 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 97609BB7-B00A-4DBD-911B-DE58B5BC9100
Device Start End Sectors Size Type
/dev/sdb1 2048 268433407 268431360 128G Linux filesystem
Hi, just a note to say I've hit the issue again with the 12.2 update.
fixed with:
e2label /dev/sdb1 hassos-data
reboot
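To find which partition needs this relabel, a small sketch such as the following could help. It only prints the e2label command and does not run it. It assumes input from `lsblk -rno NAME,LABEL` and the "hassos-data-dis" label reported at the top of this issue; suggest_relabel is a made-up name.

```shell
#!/bin/sh
# Reads "NAME LABEL" lines on stdin and prints the e2label command that would
# re-enable each partition carrying the disabled data disk label.
# (suggest_relabel is a hypothetical helper name.)
suggest_relabel() {
    awk '$2 == "hassos-data-dis" { print "e2label /dev/" $1 " hassos-data" }'
}
```

Usage: `lsblk -rno NAME,LABEL | suggest_relabel`, then review the printed command before running it.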
The update to 12.2 worked on the Pi 5 with PCIe SSD, so perhaps it was not the same error. In my defense, I never got to the logs: I had to reinstall in all 3 of my tests.
I've tried to reproduce this on generic-aarch64, but wasn't able to: for me the machine ID got saved, and on subsequent boots it was present in the cmdline.
@claplace can you check the logs of hassos-persists?
journalctl -u hassos-persists
And check if the GRUB environment is ok?
grub-editenv /mnt/boot/EFI/BOOT/grubenv list
There is a new HAOS update! Before updating, I ran the commands above. The hassos-persists journal is empty, and since I don't know how to properly SSH into HAOS, here's a screenshot of the current GRUB environment:
I started the update, and went back to print the GRUB environment again and again... and suddenly it went empty:
and sure enough, after the reboot, my data disk was disabled.
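The manual "print it again and again" check could be automated with a loop like the sketch below. It assumes grub-editenv is available and that the environment file is the /mnt/boot/EFI/BOOT/grubenv path used earlier in this thread; both function names are made up for illustration.

```shell
#!/bin/sh
# Classify "grub-editenv <file> list" output read from stdin: prints "present"
# if a non-empty MACHINE_ID entry exists, "missing" otherwise.
machine_id_entry() {
    if grep -q '^MACHINE_ID=.'; then echo present; else echo missing; fi
}

# Poll the environment file once per second and report when the entry is gone.
watch_grubenv() {
    env_file=${1:-/mnt/boot/EFI/BOOT/grubenv}
    while [ "$(grub-editenv "$env_file" list | machine_id_entry)" = present ]; do
        sleep 1
    done
    echo "$(date): MACHINE_ID is no longer in $env_file"
}
```

Running `watch_grubenv` in a terminal before starting the update would give a timestamp that can be compared with the update logs.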
It happened again. So I set out to understand how the HAOS update works, and here's what I think I have understood.
HAOS uses RAUC, and the update is installed from a RAUC bundle, e.g. https://github.com/home-assistant/operating-system/releases/download/12.4/haos_generic-aarch64-12.4.raucb for the latest release.
Here's the bundle content:
$ ls -l
total 227656
-rw-r--r-- 1 cyprien cyprien 33554432 Jun 18 09:57 boot.vfat
-rwxr-xr-x 1 cyprien cyprien 4935 Jun 18 09:57 hook
-rw-r--r-- 1 cyprien cyprien 20287488 Jun 18 09:57 kernel.img
-rw-r--r-- 1 cyprien cyprien 521 Jun 18 09:57 manifest.raucm
-rw-r--r-- 1 cyprien cyprien 211886080 Jun 18 09:57 rootfs.img
boot.vfat contains the boot files that will replace the existing ones:
$ find boot
boot
boot/EFI
boot/EFI/BOOT
boot/EFI/BOOT/grub.cfg
boot/EFI/BOOT/bootaa64.efi
boot/EFI/BOOT/grubenv
boot/cmdline.txt
$ grub-editenv boot/EFI/BOOT/grubenv list
$
The GRUB environment is empty there. Now the bundle hook file contains an install_boot() function that replaces the existing boot files with the new ones, making sure the *.txt files are restored (but I don't see any...).
# Backup boot config
cp -f "${BOOT_MNT}"/*.txt "${BOOT_TMP}/" || true
cp -rf "${BOOT_NEW}"/* "${BOOT_MNT}/"
# Restore boot config
cp -f "${BOOT_TMP}"/*.txt "${BOOT_MNT}/" || true
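The effect of these three copies can be reproduced in a scratch directory. In the sketch below, plain text files stand in for the real boot files, the directory layout mirrors the bundle listing above, and the MACHINE_ID value and file contents are invented for the demonstration.

```shell
#!/bin/sh
# Replay the backup / copy / restore sequence on throwaway directories and
# print what is left on the simulated boot partition afterwards.
simulate_install_boot() {
    work=$(mktemp -d) || return 1
    BOOT_MNT=$work/mnt
    BOOT_NEW=$work/new
    BOOT_TMP=$work/tmp
    mkdir -p "$BOOT_MNT/EFI/BOOT" "$BOOT_NEW/EFI/BOOT" "$BOOT_TMP"

    # Existing boot partition: populated environment plus a local config file.
    echo "MACHINE_ID=0123abcd" > "$BOOT_MNT/EFI/BOOT/grubenv"
    echo "local" > "$BOOT_MNT/cmdline.txt"
    # Bundle content: an empty environment and a default config file.
    : > "$BOOT_NEW/EFI/BOOT/grubenv"
    echo "bundle" > "$BOOT_NEW/cmdline.txt"

    # Same three steps as in the hook.
    cp -f "$BOOT_MNT"/*.txt "$BOOT_TMP/" || true
    cp -rf "$BOOT_NEW"/* "$BOOT_MNT/"
    cp -f "$BOOT_TMP"/*.txt "$BOOT_MNT/" || true

    echo "cmdline.txt: $(cat "$BOOT_MNT/cmdline.txt")"
    echo "grubenv bytes: $(wc -c < "$BOOT_MNT/EFI/BOOT/grubenv" | tr -d ' ')"
    rm -rf "$work"
}
```

Calling `simulate_install_boot` shows the *.txt file surviving with its local content while the environment file ends up empty, because only *.txt files are backed up and restored.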
Then I believe RAUC rewrites the grubenv with the new boot order, but it does not know about the MACHINE_ID key, which does not appear in https://github.com/rauc/rauc/blob/master/src/bootchooser.c.
After manually downloading the update bundle and installing it with rauc install haos_generic-aarch64-12.4.raucb, I see that the grubenv does not have the MACHINE_ID entry:
A simple line added in the hook file could restore it:
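One shape such a restore step could take is sketched below. It is only an illustration and not necessarily the line meant above: the helper names are made up, and it assumes grub-editenv is usable from within the hook and that the environment lives at "${BOOT_MNT}/EFI/BOOT/grubenv".

```shell
#!/bin/sh
# Extract the MACHINE_ID value from "grub-editenv <file> list" output on stdin.
read_machine_id() {
    sed -n 's/^MACHINE_ID=//p'
}

# Hypothetical hook helper: remember the current ID before the boot files
# are overwritten.
save_machine_id() {
    SAVED_MACHINE_ID=$(grub-editenv "$1" list | read_machine_id)
}

# Hypothetical hook helper: write the remembered ID back into the freshly
# copied environment file, but only if there was one to begin with.
restore_machine_id() {
    [ -z "$SAVED_MACHINE_ID" ] || grub-editenv "$1" set "MACHINE_ID=$SAVED_MACHINE_ID"
}
```

In the hook, `save_machine_id "${BOOT_MNT}/EFI/BOOT/grubenv"` would go before the copy of the new boot files and `restore_machine_id "${BOOT_MNT}/EFI/BOOT/grubenv"` after it.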