HAOS 11.4 upgrade failing on HA Yellow with Docker error: "Failed to initialize nft"
Describe the issue you are experiencing
I've been unable to update my HA Yellow (Raspberry Pi CM4) from HAOS 11.3 to 11.4. Each time I run the update, the system fails to complete the 11.4 boot and then falls back to the 11.3 boot slot. I originally thought this was related to #2870, but after digging into it some more, it seems to be a distinct issue. See the full logs below; in the failed boot logs, Docker fails to start with an "iptables -t nat -N DOCKER: iptables: Failed to initialize nft: Protocol not supported" error:
Jan 15 06:40:49 ha systemd[1]: Starting Docker Application Container Engine...
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.044616804Z" level=info msg="Starting up"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.045878137Z" level=warning msg="Running experimental build"
Jan 15 06:40:50 ha audit[478]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=478 comm="apparmor_parser"
Jan 15 06:40:50 ha audit[478]: SYSCALL arch=c00000b7 syscall=64 success=yes exit=8369 a0=4 a1=556f2adc80 a2=20b1 a3=1 items=0 ppid=477 pid=478 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="apparmor_parser" exe="/usr/sbin/apparmor_parser" subj=unconfined key=(null)
Jan 15 06:40:50 ha audit: PROCTITLE proctitle=61707061726D6F725F706172736572002D4B72002F6D6E742F646174612F646F636B65722F746D702F646F636B65722D64656661756C7434303733333334353835
Jan 15 06:40:50 ha kernel: audit: type=1400 audit(1705300850.108:15): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=478 comm="apparmor_parser"
Jan 15 06:40:50 ha kernel: audit: type=1300 audit(1705300850.108:15): arch=c00000b7 syscall=64 success=yes exit=8369 a0=4 a1=556f2adc80 a2=20b1 a3=1 items=0 ppid=477 pid=478 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="apparmor_parser" exe="/usr/sbin/apparmor_parser" subj=unconfined key=(null)
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.163582137Z" level=info msg="[graphdriver] trying configured driver: overlay2"
Jan 15 06:40:50 ha systemd[1]: var-lib-docker-overlay2-metacopy\x2dcheck3120273736-merged.mount: Deactivated successfully.
Jan 15 06:40:50 ha systemd[1]: mnt-data-docker-overlay2-metacopy\x2dcheck3120273736-merged.mount: Deactivated successfully.
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.454632970Z" level=info msg="Loading containers: start."
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.532171767Z" level=info msg="unable to detect if iptables supports xlock: 'iptables --wait -L -n': `iptables: Failed to initialize nft: Protocol not supported`" error="exit status 1"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.704662859Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.706588729Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Jan 15 06:40:50 ha dockerd[468]: failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables: Failed to initialize nft: Protocol not supported
Jan 15 06:40:50 ha dockerd[468]: (exit status 1)
Jan 15 06:40:50 ha systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jan 15 06:40:50 ha docker-failure[521]: Docker exited with exit status 1, this might be caused by corrupted key.json.
Jan 15 06:40:50 ha docker-failure[522]: stat: can't stat '/mnt/overlay/etc/docker/key.json': No such file or directory
Jan 15 06:40:50 ha docker-failure[521]: key.json: bytes
Jan 15 06:40:50 ha docker-failure[521]: /usr/libexec/docker-failure: line 7: can't open /mnt/overlay/etc/docker/key.json: no such file
Jan 15 06:40:50 ha docker-failure[521]: key.json appears to be corrupted, it is not parsable. Removing it.
Jan 15 06:40:50 ha systemd[1]: docker.service: Failed with result 'exit-code'.
Jan 15 06:40:50 ha systemd[1]: Failed to start Docker Application Container Engine.
Jan 15 06:40:50 ha systemd[1]: Dependency failed for HassOS supervisor.
Jan 15 06:40:50 ha systemd[1]: hassos-supervisor.service: Job hassos-supervisor.service/start failed with result 'dependency'.
What operating system image do you use?
yellow
What version of Home Assistant Operating System is installed?
11.4
Did you upgrade the Operating System?
Yes
Steps to reproduce the issue
- From a terminal on the HA Yellow running HAOS 11.3, run ha os update
- Wait for the update to complete successfully and for the system to reboot
- On reboot, watch the serial terminal and observe the boot process failing at the Docker startup step with the "Failed to initialize nft: Protocol not supported" error.
- Wait for the boot process to fail three times, at which point the system will switch back over to the 11.3 boot slot and boot correctly.
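After the fallback, the ha CLI (e.g. from the Terminal & SSH add-on) can confirm which slot the system actually came back on. A sketch; exact field names vary by Supervisor version:
# report installed vs. latest OS version after the failed update
ha os info
# expect "version: 11.3" alongside "version_latest: 11.4" once the
# system has fallen back to the old boot slot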
Anything in the Supervisor logs that might be useful for us?
Failed docker log:
Jan 15 06:40:49 ha systemd[1]: Starting Docker Application Container Engine...
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.044616804Z" level=info msg="Starting up"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.045878137Z" level=warning msg="Running experimental build"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.163582137Z" level=info msg="[graphdriver] trying configured driver: overlay2"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.454632970Z" level=info msg="Loading containers: start."
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.532171767Z" level=info msg="unable to detect if iptables supports xlock: 'iptables --wait -L -n': `iptables: Failed to initialize nft: Protocol not supported`" error="exit status 1"
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.704662859Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Jan 15 06:40:50 ha dockerd[468]: time="2024-01-15T06:40:50.706588729Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Jan 15 06:40:50 ha dockerd[468]: failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables: Failed to initialize nft: Protocol not supported
Jan 15 06:40:50 ha dockerd[468]: (exit status 1)
Jan 15 06:40:50 ha systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jan 15 06:40:50 ha docker-failure[521]: Docker exited with exit status 1, this might be caused by corrupted key.json.
Jan 15 06:40:50 ha docker-failure[522]: stat: can't stat '/mnt/overlay/etc/docker/key.json': No such file or directory
Jan 15 06:40:50 ha docker-failure[521]: key.json: bytes
Jan 15 06:40:50 ha docker-failure[521]: /usr/libexec/docker-failure: line 7: can't open /mnt/overlay/etc/docker/key.json: no such file
Jan 15 06:40:50 ha docker-failure[521]: key.json appears to be corrupted, it is not parsable. Removing it.
Jan 15 06:40:50 ha systemd[1]: docker.service: Failed with result 'exit-code'.
Jan 15 06:40:50 ha systemd[1]: Failed to start Docker Application Container Engine.
Anything in the Host logs that might be useful for us?
I'll attach full boot logs in a comment.
System Information
| version | core-2023.11.3 |
|---|---|
| installation_type | Home Assistant OS |
| dev | false |
| hassio | true |
| docker | true |
| user | root |
| virtualenv | false |
| python_version | 3.11.6 |
| os_name | Linux |
| os_version | 6.1.63-haos-raspi |
| arch | aarch64 |
| timezone | America/Denver |
| config_dir | /config |
Home Assistant Community Store
| GitHub API | ok |
|---|---|
| GitHub Content | ok |
| GitHub Web | ok |
| GitHub API Calls Remaining | 4939 |
| Installed Version | 1.33.0 |
| Stage | running |
| Available Repositories | 1376 |
| Downloaded Repositories | 8 |
Home Assistant Cloud
| logged_in | true |
|---|---|
| subscription_expiration | May 22, 2024 at 6:00 PM |
| relayer_connected | true |
| relayer_region | us-east-1 |
| remote_enabled | true |
| remote_connected | true |
| alexa_enabled | false |
| google_enabled | true |
| remote_server | us-east-1-7.ui.nabu.casa |
| certificate_status | ready |
| can_reach_cert_server | ok |
| can_reach_cloud_auth | ok |
| can_reach_cloud | ok |
Home Assistant Supervisor
| host_os | Home Assistant OS 11.3 |
|---|---|
| update_channel | stable |
| supervisor_version | supervisor-2023.12.0 |
| agent_version | 1.6.0 |
| docker_version | 24.0.7 |
| disk_total | 457.7 GB |
| disk_used | 15.0 GB |
| healthy | true |
| supported | true |
| board | yellow |
| supervisor_api | ok |
| version_api | ok |
| installed_addons | Let's Encrypt (5.0.9), NGINX Home Assistant SSL proxy (3.7.0), File editor (5.7.0), Terminal & SSH (9.8.1), Z-Wave JS UI (3.1.0), Uptime Kuma (0.12.0), AWNET to HASS (1.0.1), ESPHome (2023.12.5), Silicon Labs Flasher (0.2.0) |
Dashboards
| dashboards | 4 |
|---|---|
| resources | 0 |
| views | 3 |
| mode | storage |
Recorder
| oldest_recorder_run | January 5, 2024 at 1:42 AM |
|---|---|
| current_recorder_run | January 14, 2024 at 11:42 PM |
| estimated_db_size | 1047.00 MiB |
| database_engine | sqlite |
| database_version | 3.41.2 |
Additional information
No response
Full failed boot log: last-boot-full.log
Failed boot serial terminal output: ha_boot_11.4_failed.txt
Hm, I am running HAOS 11.4 successfully here on two Yellows, so this must be something related to your particular instance/installation.
The best explanation that comes to mind is that the kernel gets loaded from a different partition than what is mounted later on as rootfs. The failed log supports this theory: in your case, the kernel was built on Dec 4:
[ 0.000000] Linux version 6.1.58-haos-raspi (builder@5b1a6501bc4e) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot -g2b699621) 11.4.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Mon Dec 4 15:51:47 UTC 2023
While my 11.4 installation shows a kernel built in January:
[ 0.000000] Linux version 6.1.63-haos-raspi (builder@13ed6d6d8021) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot -g2d89a0f9) 11.4.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Tue Jan 9 10:42:51 UTC 2024
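(On a running system, the same banner can be read back, which is a quick way to check which kernel actually booted:)
# print the version banner of the currently running kernel
uname -a
# equivalently: cat /proc/version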
Did you install HAOS directly to the NVMe? In that case it could be that you have a stale installation on your eMMC, and the system is now mixing up the two.
I'd recommend taking a full backup and downloading it while your system still comes up. From there, a reinstall using Option 1 as documented at https://yellow.home-assistant.io/guides/reinstall-os/ is probably the best choice: it makes sure the eMMC is cleared. If you want your system to boot from NVMe, make sure to press the blue button in step 9.
Yes, this is an NVMe-based install. Happy to try a reinstall, but it seems like there's still a bug here if this breaks for anyone using an NVMe install. Is there any further debugging I should do prior to reinstalling?
One of my two Yellow installations is an NVMe install, and that one did not break. This is related to the "stale" install on your eMMC, which HAOS doesn't like much. The problem is that we rely on the PARTUUID to find the correct root partition, and that UUID is the same in every install.
Looking at the logs again, this snippet also really shows that case:
Apr 04 10:55:23 homeassistant kernel: printk: console [ttyAMA2] enabled
Apr 04 10:55:23 homeassistant kernel: nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8
Apr 04 10:55:23 homeassistant kernel: mmc0: new DDR MMC card at address 0001
Apr 04 10:55:23 homeassistant kernel: mmc1: new high speed SDIO card at address 0001
Apr 04 10:55:23 homeassistant kernel: mmcblk0: mmc0:0001 BJTD4R 29.1 GiB
Apr 04 10:55:23 homeassistant kernel: printk: console [netcon0] enabled
Apr 04 10:55:23 homeassistant kernel: mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8
Our update system, as well as U-Boot, relies on the partition UUIDs and labels being unique in the system, so this setup is bound to cause issues long term. What we could probably do is check the partitions from the Supervisor side and warn the user if this type of setup is found. :thinking:
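A minimal sketch of such a check, written as a shell script for illustration (an actual Supervisor check would live in its Python code; the label list is assumed from the standard HAOS partition layout):
#!/bin/sh
# warn if any HAOS partition label resolves to more than one device
for label in hassos-boot hassos-kernel0 hassos-system0 hassos-kernel1 \
             hassos-system1 hassos-bootstate hassos-overlay hassos-data; do
    count=$(blkid --match-token PARTLABEL="$label" --output device | wc -l)
    if [ "$count" -gt 1 ]; then
        echo "WARNING: partition label '$label' found on $count devices" >&2
    fi
done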
@asayler if you have serial console access, the output of this command would actually be interesting:
blkid --match-token PARTLABEL="hassos-system0" --output device
Also, you can check which partition gets used as data partition using:
findmnt /mnt/data/
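A single overview of every partition's label and UUID across both disks would also make any duplicates easy to spot (assuming the util-linux lsblk shipped with HAOS):
# list GPT partition labels and UUIDs for all block devices; matching
# PARTLABEL/PARTUUID pairs on nvme0n1 and mmcblk0 indicate the conflict
lsblk -o NAME,PARTLABEL,PARTUUID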
It seems the system boots from NVMe, so if you are certain your data is on NVMe too, you can try to fix the problem "manually" by deleting the eMMC installation
:warning: this might break your system, make a backup before!
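# flush any pending writes before touching the disk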
sync
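# discard every block on the eMMC (fails if any partition on it is still mounted)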
blkdiscard /dev/mmcblk0
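# fallback: zero the first 32 MiB, wiping the partition table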
dd if=/dev/zero of=/dev/mmcblk0 bs=1M count=32
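# reboot immediately, before anything re-reads the stale partition table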
reboot -f
Thanks, @agners. Here's the result of those commands:
# blkid --match-token PARTLABEL="hassos-system0" --output device
/dev/nvme0n1p3
/dev/mmcblk0p3
# blkid --match-token PARTLABEL="hassos-data" --output device
/dev/nvme0n1p8
/dev/mmcblk0p8
# findmnt /mnt/data/
TARGET SOURCE FSTYPE OPTIONS
/mnt/data /dev/nvme0n1p8 ext4 rw,relatime,commit=30
# ls -al /dev/disk/by-partlabel/
total 0
drwxr-xr-x 2 root root 200 Jan 15 07:53 .
drwxr-xr-x 9 root root 180 Apr 4 2023 ..
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-boot -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-bootstate -> ../../nvme0n1p6
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-data -> ../../mmcblk0p8
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-kernel0 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-kernel1 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-overlay -> ../../nvme0n1p7
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-system0 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Jan 15 07:53 hassos-system1 -> ../../nvme0n1p5
So it does seem like there are conflicting partition labels on the NVMe and eMMC. I was pretty sure I followed the directions you noted when I set this up to force the NVMe install on a fresh CM4, but it's possible I completed an install onto the eMMC prior to doing that (this was installed about 10 months ago, so I've forgotten the details). I wonder why the install scripts failed to wipe the eMMC when performing the NVMe install? It seems like that may have been the original issue here.
I'll attempt to wipe the eMMC partitions shortly (although I assume I could also just change their partition labels to avoid the naming conflict) to see if that resolves the issue. If that fails, I'll do a full reinstall.
And just to give the full picture: data seems to be on NVMe, but the overlay is running off the eMMC:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
mmcblk0 179:0 0 29.1G 0 disk
|-mmcblk0p1 179:1 0 32M 0 part /mnt/boot
|-mmcblk0p2 179:2 0 24M 0 part
|-mmcblk0p3 179:3 0 256M 0 part
|-mmcblk0p4 179:4 0 24M 0 part
|-mmcblk0p5 179:5 0 256M 0 part
|-mmcblk0p6 179:6 0 8M 0 part
|-mmcblk0p7 179:7 0 96M 0 part /var/lib/systemd
| /var/lib/bluetooth
| /var/lib/NetworkManager
| /etc/systemd/timesyncd.conf
| /etc/hosts
| /etc/hostname
| /etc/NetworkManager/system-connections
| /root/.ssh
| /root/.docker
| /etc/udev/rules.d
| /etc/modules-load.d
| /etc/modprobe.d
| /etc/dropbear
| /mnt/overlay
`-mmcblk0p8 179:8 0 28.4G 0 part
mmcblk0boot0 179:32 0 4M 1 disk
mmcblk0boot1 179:64 0 4M 1 disk
zram0 254:0 0 0B 0 disk
zram1 254:1 0 32M 0 disk
zram2 254:2 0 16M 0 disk /tmp
nvme0n1 259:0 0 465.8G 0 disk
|-nvme0n1p1 259:1 0 32M 0 part
|-nvme0n1p2 259:2 0 24M 0 part
|-nvme0n1p3 259:3 0 256M 0 part
|-nvme0n1p4 259:4 0 24M 0 part
|-nvme0n1p5 259:5 0 256M 0 part /
|-nvme0n1p6 259:6 0 8M 0 part
|-nvme0n1p7 259:7 0 96M 0 part
`-nvme0n1p8 259:8 0 465.1G 0 part /var/log/journal
/var/lib/docker
/mnt/data
# mount | grep mmc
/dev/mmcblk0p1 on /mnt/boot type vfat (rw,relatime,sync,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
/dev/mmcblk0p7 on /mnt/overlay type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/dropbear type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/modprobe.d type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/modules-load.d type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/udev/rules.d type ext4 (rw,relatime)
/dev/mmcblk0p7 on /root/.docker type ext4 (rw,relatime)
/dev/mmcblk0p7 on /root/.ssh type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/NetworkManager/system-connections type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/hostname type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/hosts type ext4 (rw,relatime)
/dev/mmcblk0p7 on /etc/systemd/timesyncd.conf type ext4 (rw,relatime)
/dev/mmcblk0p7 on /var/lib/NetworkManager type ext4 (rw,relatime)
/dev/mmcblk0p7 on /var/lib/bluetooth type ext4 (rw,relatime)
/dev/mmcblk0p7 on /var/lib/systemd type ext4 (rw,relatime)
# mount | grep nvme
/dev/nvme0n1p5 on / type squashfs (ro,relatime,errors=continue)
/dev/nvme0n1p8 on /mnt/data type ext4 (rw,relatime,commit=30)
/dev/nvme0n1p8 on /var/lib/docker type ext4 (rw,relatime,commit=30)
/dev/nvme0n1p8 on /var/log/journal type ext4 (rw,relatime,commit=30)
I'm running HAOS on an Intel NUC. Everything was running well on 11.3 until the upgrade to 11.4, after which I could not reach homeassistant.local. Viewing the NUC locally revealed a "cifs_mount failed w/return code = -101" error, which suggests the OS starts before the network connection is established. I tried several 11.4 reinstalls to no avail; the same issue appeared after each reboot. Note that HAOS 11.4 appears to boot correctly, just without network access. I did some research and saw suggestions about disabling IPv6 or modifying config files to add a delay waiting for an established network, but I never had to worry about that prior to 11.4 and don't want to worry about it now. I'll wait for the HA team's 11.4.1 release to fix this. In the meantime, I reinstalled 11.3 and restored my last daily backup. It's nice to be back in Home Automation Heaven...
@dwgtx your case seems to be a networking issue with HAOS 11.4 on your particular platform. Can you open a new issue with detailed information about your system (hardware model number, network card information)?
@asayler
I wonder why the install scripts failed to wipe the eMMC when performing the nvme install? Seems like that was maybe the original issue here.
Yeah the installer indeed should wipe the eMMC: https://github.com/NabuCasa/buildroot-installer/blob/2022.02.x-yellow-installer/rootfs-overlay/usr/bin/haos-flash#L43-L48
I was assuming that you pulled out the NVMe and installed directly (or used the rpiboot method to expose the NVMe as a mass storage device to your computer and flashed it that way, or something similar).
I'll attempt to wipe the eMMC partitions shortly (although I assume I could also just change their partition labels to avoid the naming conflict) to see if that resolves the issue. If that fails, I'll do a full reinstall.
You'd have to change the partition labels of all of them. Also, at boot we use the UUID, so you'd have to change those too. Just removing the whole partition table is really the easy way out here. :sweat_smile:
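For completeness, relabeling would look roughly like this per partition (a sketch using sgdisk from gptfdisk, which HAOS does not ship, so you'd need a rescue system; the new name here is a made-up example):
# rename partition 3 on the eMMC and give it a new random unique GUID
sgdisk --change-name=3:stale-hassos-system0 /dev/mmcblk0
sgdisk --partition-guid=3:R /dev/mmcblk0   # 'R' asks sgdisk for a random GUID
# repeat for all eight partitions, then reboot so the kernel re-reads the table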
I went ahead and wiped the partition table on the eMMC storage (the blkdiscard command failed since the device was mounted/busy, but the dd coupled with a reboot worked). The system came back up on 11.4 and now seems to be (correctly) mounting all storage from the NVMe device:
# mount | grep mmc
# mount | grep nvm
/dev/nvme0n1p3 on / type squashfs (ro,relatime,errors=continue)
/dev/nvme0n1p1 on /mnt/boot type vfat (rw,relatime,sync,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
/dev/nvme0n1p7 on /mnt/overlay type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/dropbear type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/modprobe.d type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/modules-load.d type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/udev/rules.d type ext4 (rw,relatime)
/dev/nvme0n1p7 on /root/.docker type ext4 (rw,relatime)
/dev/nvme0n1p7 on /root/.ssh type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/NetworkManager/system-connections type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/hostname type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/hosts type ext4 (rw,relatime)
/dev/nvme0n1p7 on /etc/systemd/timesyncd.conf type ext4 (rw,relatime)
/dev/nvme0n1p8 on /mnt/data type ext4 (rw,relatime,commit=30)
/dev/nvme0n1p7 on /var/lib/NetworkManager type ext4 (rw,relatime)
/dev/nvme0n1p7 on /var/lib/bluetooth type ext4 (rw,relatime)
/dev/nvme0n1p8 on /var/lib/docker type ext4 (rw,relatime,commit=30)
/dev/nvme0n1p7 on /var/lib/systemd type ext4 (rw,relatime)
/dev/nvme0n1p8 on /var/log/journal type ext4 (rw,relatime,commit=30)
There do still seem to be two mmc boot devices present; I'm not sure whether that's expected:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
mmcblk0 179:0 0 29.1G 0 disk
mmcblk0boot0 179:32 0 4M 1 disk
mmcblk0boot1 179:64 0 4M 1 disk
zram0 254:0 0 0B 0 disk
zram1 254:1 0 32M 0 disk
zram2 254:2 0 16M 0 disk /tmp
nvme0n1 259:0 0 465.8G 0 disk
|-nvme0n1p1 259:1 0 32M 0 part /mnt/boot
|-nvme0n1p2 259:2 0 24M 0 part
|-nvme0n1p3 259:3 0 256M 0 part /
|-nvme0n1p4 259:4 0 24M 0 part
|-nvme0n1p5 259:5 0 256M 0 part
|-nvme0n1p6 259:6 0 8M 0 part
|-nvme0n1p7 259:7 0 96M 0 part /var/lib/systemd
| /var/lib/bluetooth
| /var/lib/NetworkManager
| /etc/systemd/timesyncd.conf
| /etc/hosts
| /etc/hostname
| /etc/NetworkManager/system-connections
| /root/.ssh
| /root/.docker
| /etc/udev/rules.d
| /etc/modules-load.d
| /etc/modprobe.d
| /etc/dropbear
| /mnt/overlay
`-nvme0n1p8 259:8 0 465.1G 0 part /var/log/journal
/var/lib/docker
/mnt/data
But otherwise things now seem to be functioning correctly.
I'll likely still do a full reinstall soon, since I want to swap the CM4 for a higher-RAM version and replace the SSD while I'm at it. But I do at least seem to be back to a stable boot state. Not sure if you want to keep this ticket open to add protections against this edge case going forward, but I appreciate the insight into what was going on here.
There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
Closing as it is resolved.