operating-system
HAOS 8.5 fails to start on Raspberry Pi CM4 non-lite with NVMe connected
Describe the issue you are experiencing
- HAOS 8.5 fails to start on Raspberry Pi CM4 non-lite with NVMe connected and shows blank screen when connected to monitor via HDMI.
- Disconnecting an NVMe drive fixes the issue.
- Latest Raspberry PI Lite works fine.
What operating system image do you use?
rpi4-64 (Raspberry Pi 4/400 64-bit OS)
What version of Home Assistant Operating System is installed?
8.5
Did you upgrade the Operating System.
No
Steps to reproduce the issue
- Use Waveshare CM4-IO-BASE-B base board
- Do not connect fan
- Do not insert RTC battery
- Plug in the RPi CM4 module (4 GB RAM, 8 GB eMMC, Wi-Fi/Bluetooth) with latest stable firmware pieeprom-2022-08-02.bin
- Switch boot selector to USB-C (flashing) mode
- Insert SSD SSSTC CL1-3D256-Q1
- Flash haos_rpi4-64-8.5.img.xz with Balena Etcher
- Switch boot selector to eMMC mode
- Turn board on with connected HDMI and USB keyboard
- Observe blank black screen and non-working keyboard
Anything in the Supervisor logs that might be useful for us?
Can't retrieve logs, as OS does not boot.
Anything in the Host logs that might be useful for us?
Can't retrieve logs, as OS does not boot.
System Health information
No response
Additional information
Expected: Board boots, displays boot progress and HA prompt on screen, USB keyboard works.
Actual: HDMI screen powers on, but remains blank black, USB keyboard does not work.
After removing the NVMe drive, the board boots, but the USB keyboard does not work. This is expected and requires adding dtoverlay=dwc2,dr_mode=host to config.txt
I've tried solutions from https://github.com/home-assistant/operating-system/issues/1887 and https://github.com/home-assistant/operating-system/issues/1911:
- Replacing u-boot.bin with extracted version from https://github.com/home-assistant/operating-system/files/8702067/u-boot.img.tar.gz
- Adding dtoverlay=dwc2,dr_mode=host to config.txt - this helped keyboard to start working if NVMe is not connected
- Adding earlycon to cmdline.txt - nothing changed
- Adding device_tree=bcm2711-rpi-cm4.dtb to config.txt - nothing changed
The two issues mentioned above were closed, and people moved on to Raspberry Pi OS + Supervisor, but I want to make it work on HASSOS. That's why I've created this new issue. I am happy to help troubleshoot the issue, use a UART console if necessary (but you'll have to teach me how :wink:) and try out different params.
Same setup (except that it is 32-bit) and issue here.
I get to the point where it says:
USB MSD timed out after 20 seconds
Restart 0 max -1
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)
Before update to 8.5 the setup worked fine without issues
I'm not sure it's the same issue, as I can reproduce it on HAOS 8.4
@bhessen also USB MSD sounds very much as if you connected your SSD via USB?
That's true. After further investigation it seems to be a hardware problem with the SSD drive which coincidentally happened exactly at the same time as the update...?!? In any case, the drive seems to be severely broken now and I cannot initialize it anymore (in Windows)
@bhessen ok, thanks for confirming! I hope you have a somewhat recent backup :cold_sweat: In any case, your case is off topic in this issue as this issue is about a directly attached NVMe drive to a RPi CM4.
From what I understand, this is caused by an early crash of the kernel: https://github.com/home-assistant/operating-system/issues/1887#issuecomment-1184316490
This is likely caused by some weird interaction between RPi Firmware/U-Boot/Raspberry Pi Linux kernel. The problem is a bit surprising to me, as I don't see it on Yellow (and it uses CM4 + NVMe as well).
I don't have that particular hardware, so I can't really debug this problem...
@agners I am eager to help :) I don't have any experience with OS debugging, but I'll do my best to help.
EDIT: I was wrong on this part below:
~~Also, just thinking out loud...
Latest Raspberry PI OS contains the following files on /boot partition:~~
overlays/dwc2.dtbo
bcm2711-rpi-cm4.dtb
~~However, HAOS does not.~~
~~These files were mentioned in possible workarounds:~~
dtoverlay=dwc2,dr_mode=host
device_tree=bcm2711-rpi-cm4.dtb
~~Might that be a part of issue?~~
@agners now I have some time to assist with this issue if you need some help.
I've managed to enable UART and extract logs from version 8.5 64bit, as well as both 32bit & 64bit of 9.0-rc1. Logs attached.
8.5_x64.txt 9.0rc1_32.txt 9.0rc1_64.txt
changes to files:
config.txt
enable_uart=1
device_tree=bcm2711-rpi-cm4.dtb
cmdline.txt
earlycon=uart8250,mmio32,0xfe215040
I tried just earlycon, but with that config nothing appeared in serial console.
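For anyone repeating these edits, here is a minimal Python sketch of applying them to a mounted boot partition. The function names and the `boot_dir` path are hypothetical and not part of any HAOS tooling; only the three settings themselves come from the comment above. It relies on cmdline.txt being a single line, so the earlycon parameter is merged into that line rather than appended as a new one.

```python
from pathlib import Path

# Settings taken from the comment above.
CONFIG_LINES = ["enable_uart=1", "device_tree=bcm2711-rpi-cm4.dtb"]
EARLYCON = "earlycon=uart8250,mmio32,0xfe215040"


def add_config_lines(text: str, lines: list[str]) -> str:
    """Append each line to config.txt content unless it is already present.

    Caveat: if the file ends inside a conditional section such as [pi4],
    appended lines fall under that filter.
    """
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in lines if line not in existing]
    if not missing:
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(missing) + "\n"


def add_kernel_param(cmdline: str, param: str) -> str:
    """Add param to the single-line kernel command line.

    Any existing entry with the same key (for example a bare `earlycon`)
    is dropped first, so the parameter appears only once.
    """
    key = param.split("=", 1)[0]
    kept = [p for p in cmdline.split() if p.split("=", 1)[0] != key]
    return " ".join(kept + [param]) + "\n"


def apply(boot_dir: Path) -> None:
    """Rewrite config.txt and cmdline.txt in boot_dir (hypothetical mount point)."""
    config = boot_dir / "config.txt"
    cmdline = boot_dir / "cmdline.txt"
    config.write_text(add_config_lines(config.read_text(), CONFIG_LINES))
    cmdline.write_text(add_kernel_param(cmdline.read_text(), EARLYCON))
```

Calling `apply(Path("/path/to/mounted/boot"))` with the real mount point would make both changes in one go. Running it twice should leave the files unchanged the second time.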
I saw your comment about rpiboot: https://github.com/raspberrypi/usbboot/issues/36#issuecomment-1193639350
At first I was optimistic, and a simple MacBook reboot helped me with that issue, but after some time this workaround stopped working. But it works great if the CM4 is connected to a Raspberry Pi 4 (non-CM). This slows things down, as I have to switch from my main workstation to the less comfortable Raspbian, but now I can easily flash new images.
So, if you have any ideas that I can try, please let me know. Or if you can suggest how I can bisect and troubleshoot this issue (maybe by comparing this repo with the specific commit from the https://github.com/raspberrypi/linux repo? by building custom images locally? maybe some other approach?). I am a total newbie in Linux kernel development; I haven't compiled Linux from scratch or built a distro around the kernel... But if you have any tips, please let me know. I can invest some time into solving this issue, as I really want to use CM4 + M.2 NVMe to run HAOS natively (as opposed to supervised, containerized and other modes).
I would also be keen to be able to use an M2 NVME with my CM4 and CM4-IO-BASE, especially given the delays on the PoE version of the HA Yellow
I found a workaround that worked for me, so the issue is not that important for me anymore. I just flashed HAOS on NVMe and changed the boot priority. Now eMMC is not used anymore, and HAOS boots and works fine. However, the issue is not resolved, hence I keep it open.
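The comment above does not say which boot priority value was used. As a rough illustration of how such a value reads, the Raspberry Pi bootloader's BOOT_ORDER setting is a hex number whose digits are tried from right to left. The sketch below decodes one; the helper name is hypothetical, and 0xf416 is only an example value that puts NVMe first, not necessarily what was set here.

```python
# Boot mode codes as listed in the Raspberry Pi bootloader documentation.
BOOT_MODES = {
    0x1: "SD card / eMMC",
    0x2: "network",
    0x3: "RPIBOOT",
    0x4: "USB mass storage",
    0x5: "BCM-USB-MSD",
    0x6: "NVMe",
    0x7: "HTTP",
    0xE: "stop",
    0xF: "restart",
}


def decode_boot_order(value: int) -> list[str]:
    """Return boot modes in the order they are tried (lowest hex digit first)."""
    order = []
    while value:
        nibble = value & 0xF  # rightmost hex digit is tried first
        order.append(BOOT_MODES.get(nibble, f"unknown (0x{nibble:x})"))
        value >>= 4
    return order


# Example value only: NVMe, then SD card / eMMC, then USB, then restart.
print(decode_boot_order(0xF416))
```

With an order like this the bootloader tries the NVMe drive before the eMMC, which matches the described outcome of the eMMC no longer being used.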
Hm, so it seems if U-Boot initializes the NVMe, then the kernel subsequently is able to boot then? This is a bit curious I must say.
Maybe we just need to force NVMe initialization in U-Boot no matter what.
Btw, 9.0.rc2 and later have an updated Raspberry Pi Kernel and firmware, so it might be worth a try as well.
@agners I've verified on 9.0.rc2, but had the same issue as before. And by that time I changed boot priority and flashed 8.5-64 on NVMe. If you need logs from UART on NVMe - I'll be happy to help, however now that I've migrated HAOS to NVMe I won't be able to play with eMMC, as I need a stable smart home controller. I plan to purchase another Waveshare base board, then I could be more helpful again.
This is essentially a duplicate of #1887. Since #1887 has more information, let's continue discussion there.