
HAOS 8.5 fails to start on Raspberry Pi CM4 non-lite with NVMe connected

Open maxromanovsky opened this issue 3 years ago • 13 comments

Describe the issue you are experiencing

  • HAOS 8.5 fails to start on Raspberry Pi CM4 non-lite with NVMe connected and shows a blank screen when connected to a monitor via HDMI.
  • Disconnecting the NVMe drive fixes the issue.
  • The latest Raspberry Pi OS Lite works fine.

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

8.5

Did you upgrade the Operating System?

No

Steps to reproduce the issue

  1. Use a Waveshare CM4-IO-BASE-B base board
  2. Do not connect the fan
  3. Do not insert the RTC battery
  4. Plug in the RPi CM4 module (4 GB RAM, 8 GB eMMC, Wi-Fi/Bluetooth) with the latest stable firmware pieeprom-2022-08-02.bin
  5. Switch the boot selector to USB-C (flashing) mode
  6. Insert an SSSTC CL1-3D256-Q1 NVMe SSD
  7. Flash haos_rpi4-64-8.5.img.xz with Balena Etcher
  8. Switch the boot selector to eMMC mode
  9. Turn the board on with HDMI and a USB keyboard connected
  10. Observe a blank black screen and a non-working keyboard

Anything in the Supervisor logs that might be useful for us?

Can't retrieve logs, as OS does not boot.

Anything in the Host logs that might be useful for us?

Can't retrieve logs, as OS does not boot.

System Health information

No response

Additional information

Expected: the board boots, displays boot progress and the HA prompt on screen, and the USB keyboard works.

Actual: the HDMI screen powers on but remains blank (black), and the USB keyboard does not work.

After removing the NVMe drive, the board boots, but the USB keyboard does not work. This is expected and requires adding dtoverlay=dwc2,dr_mode=host to config.txt.

I've tried solutions from https://github.com/home-assistant/operating-system/issues/1887 and https://github.com/home-assistant/operating-system/issues/1911:

  • Replacing u-boot.bin with extracted version from https://github.com/home-assistant/operating-system/files/8702067/u-boot.img.tar.gz
  • Adding dtoverlay=dwc2,dr_mode=host to config.txt - this made the keyboard work, but only when the NVMe drive is not connected
  • Adding earlycon to cmdline.txt - nothing changed
  • Adding device_tree=bcm2711-rpi-cm4.dtb to config.txt - nothing changed
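Several of the workarounds above come down to adding one line to config.txt. A minimal Python sketch of doing that safely, assuming config.txt is handled as plain text: the Raspberry Pi firmware applies lines that follow a conditional filter such as [pi4] or [cm4] only when that filter matches, so this hypothetical helper (ensure_config_line is not part of HAOS) opens a fresh [all] section when the file currently ends inside some other filter. As a simplification, a line that already appears anywhere in the file is treated as present, whatever section it sits in.

```python
def ensure_config_line(config_text: str, line: str) -> str:
    """Return config.txt text with `line` added once, outside any filter.

    Hypothetical helper: a line already present anywhere in the file is
    left alone, regardless of the section it sits in.
    """
    lines = config_text.splitlines()
    if any(existing.strip() == line for existing in lines):
        return config_text  # already present, nothing to do

    # Find the last conditional filter header, e.g. "[pi4]" or "[all]".
    last_filter = None
    for existing in lines:
        stripped = existing.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            last_filter = stripped

    # Lines after e.g. [pi4] only apply to matching boards, so reset to [all].
    if last_filter not in (None, "[all]"):
        lines.append("[all]")
    lines.append(line)
    return "\n".join(lines) + "\n"


# Made-up config.txt content, for illustration only.
sample = "arm_64bit=1\n[pi4]\narm_boost=1\n"
print(ensure_config_line(sample, "dtoverlay=dwc2,dr_mode=host"), end="")
```

With the made-up sample, the output ends with an [all] line followed by dtoverlay=dwc2,dr_mode=host, so the overlay is no longer tied to the [pi4] filter. The same call works for device_tree=bcm2711-rpi-cm4.dtb or enable_uart=1.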

The two issues mentioned above were closed, and people moved on to Raspberry Pi OS + Supervisor, but I want to make it work on HASSOS. That's why I've created this new issue. I am happy to help troubleshoot the issue, use the UART console if necessary (but you'll have to teach me how :wink:) and try out different parameters.

maxromanovsky, Aug 21 '22 16:08

Same setup (except that mine is 32-bit) and same issue here.

I get to the point where it says:

USB MSD timed out after 20 seconds
Restart 0 max -1
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)

Before the update to 8.5, the setup worked without issues.

bhessen, Aug 24 '22 16:08

I'm not sure it's the same issue, as I can reproduce it on HAOS 8.4.

maxromanovsky, Aug 24 '22 18:08

@bhessen Also, "USB MSD" sounds very much like you connected your SSD via USB?

agners, Aug 24 '22 19:08

That's true. After further investigation, it seems to be a hardware problem with the SSD, which coincidentally appeared at exactly the same time as the update...?!? In any case, the drive now seems to be severely broken and I cannot initialize it anymore (in Windows).

bhessen, Aug 25 '22 07:08

@bhessen OK, thanks for confirming! I hope you have a reasonably recent backup :cold_sweat: In any case, your problem is off topic here, as this issue is about an NVMe drive attached directly to an RPi CM4.

agners, Aug 25 '22 07:08

From what I understand, this is caused by an early crash of the kernel: https://github.com/home-assistant/operating-system/issues/1887#issuecomment-1184316490

This is likely caused by some weird interaction between the RPi firmware, U-Boot and the Raspberry Pi Linux kernel. The problem is a bit surprising to me, as I don't see it on the Home Assistant Yellow (which uses a CM4 + NVMe as well).

I don't have that particular hardware, so I can't really debug this problem...

agners, Aug 25 '22 07:08

@agners I am eager to help :) I don't have any experience with OS debugging, but I'll do my best to help.

EDIT: I was wrong about the part below: ~~Also, just thinking out loud... The latest Raspberry Pi OS contains the following files on the /boot partition:~~

overlays/dwc2.dtbo
bcm2711-rpi-cm4.dtb

~~However, HAOS does not.~~

~~These files were mentioned in possible workarounds:~~

dtoverlay=dwc2,dr_mode=host
device_tree=bcm2711-rpi-cm4.dtb

~~Might that be a part of issue?~~

maxromanovsky, Aug 25 '22 12:08

@agners now I have some time to assist with this issue if you need some help.

I've managed to enable UART and extract logs from version 8.5 (64-bit), as well as from both the 32-bit and 64-bit builds of 9.0-rc1. Logs attached.

8.5_x64.txt 9.0rc1_32.txt 9.0rc1_64.txt

Changes to files:

config.txt
enable_uart=1
device_tree=bcm2711-rpi-cm4.dtb

cmdline.txt
earlycon=uart8250,mmio32,0xfe215040

I tried plain earlycon, but with that config nothing appeared on the serial console.
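The cmdline.txt change above is easy to get wrong, because the Raspberry Pi documentation asks for the whole kernel command line to stay on a single line. A minimal Python sketch (add_kernel_arg is a hypothetical helper) that appends a parameter to the existing line instead of on a new one; with replace_same_key=True it also drops an earlier parameter of the same name, such as a bare earlycon. For reference, the kernel's earlycon=uart8250,mmio32,<addr> form selects an 8250-compatible UART accessed through 32-bit MMIO at the given physical address, and 0xfe215040 is, as far as I can tell, the BCM2711 mini UART.

```python
def add_kernel_arg(cmdline: str, arg: str, replace_same_key: bool = False) -> str:
    """Add a kernel parameter to cmdline.txt text, keeping it on one line.

    Some parameters (console= for example) may legitimately repeat, so
    same-named parameters are only removed when replace_same_key is True.
    """
    tokens = cmdline.split()  # also folds any stray extra lines into one
    if replace_same_key:
        key = arg.split("=", 1)[0]
        tokens = [t for t in tokens if t.split("=", 1)[0] != key]
    if arg not in tokens:
        tokens.append(arg)
    return " ".join(tokens) + "\n"


# Made-up command line, for illustration only.
current = "console=tty1 rootwait earlycon\n"
print(add_kernel_arg(current, "earlycon=uart8250,mmio32,0xfe215040", replace_same_key=True), end="")
```

Here the bare earlycon is replaced, giving console=tty1 rootwait earlycon=uart8250,mmio32,0xfe215040 on a single line. Leaving replace_same_key at False is the safer default for parameters that can appear more than once.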

I saw your comment about rpiboot: https://github.com/raspberrypi/usbboot/issues/36#issuecomment-1193639350 At first I was optimistic, as a simple MacBook reboot fixed that issue for me, but after some time this workaround stopped working. However, rpiboot works great when the CM4 is connected to a Raspberry Pi 4 (non-CM). This slows things down, as I have to switch from my main workstation to the less comfortable Raspbian, but now I can easily flash new images.

So, if you have any ideas that I can try, please let me know. Or, if you can suggest how I can bisect and troubleshoot this issue (maybe by comparing this repo with a specific commit from the https://github.com/raspberrypi/linux repo? by building custom images locally? some other approach?), that would help too. I am a total newbie in Linux kernel development: I haven't compiled Linux from scratch or built a distro around the kernel... But if you have any tips, please let me know. I can invest some time into solving this issue, as I really want to use CM4 + M.2 NVMe to run HAOS natively (as opposed to supervised, containerized and other modes).
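Bisecting, as asked about above, is a binary search for the first failing build. git bisect does this over commits in a kernel tree; the minimal Python sketch below shows the same idea over an ordered list of builds. It assumes a known-good build exists at the start of the list, which the thread has not established yet (8.4 fails too), so that end would have to be found first. first_bad and the build names are hypothetical; in practice each probe means flashing an image and checking by hand whether it boots.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], boots: Callable[[str], bool]) -> str:
    """Return the first version in `versions` that fails to boot.

    Assumes versions are ordered oldest to newest, versions[0] boots,
    versions[-1] does not, and every version after a failing one also fails.
    """
    good, bad = 0, len(versions) - 1  # invariant: good boots, bad fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(versions[mid]):
            good = mid
        else:
            bad = mid
    return versions[bad]


# Simulated run: pretend build-1 and build-2 boot and later builds do not.
builds = ["build-1", "build-2", "build-3", "build-4", "build-5"]
print(first_bad(builds, lambda v: v in {"build-1", "build-2"}))  # → build-3
```

For a real run, the lambda would be replaced by something that asks the tester, e.g. a prompt built on input(). Each answer halves the remaining range, so about seven flashes are enough to narrow down a hundred candidate builds.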

maxromanovsky, Sep 06 '22 20:09

I would also be keen to use an M.2 NVMe with my CM4 and CM4-IO-BASE, especially given the delays on the PoE version of the HA Yellow.

p6sfrx725z28knfy, Sep 10 '22 14:09

I found a workaround that worked for me, so the issue is not that important for me anymore. I just flashed HAOS on the NVMe and changed the boot priority. Now the eMMC is not used anymore, and HAOS boots and works fine. However, the issue is not resolved, so I'm keeping it open.
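The thread does not say how the boot priority was changed; on a CM4 this is normally the BOOT_ORDER setting in the bootloader EEPROM configuration (which rpi-eeprom-config can display), so what follows is a sketch under that assumption. BOOT_ORDER is a hex value whose nibbles are tried starting from the least significant one. The hypothetical decode_boot_order helper below turns such a line into the sequence of boot modes, using the mode numbers listed in the Raspberry Pi bootloader documentation.

```python
# Boot mode nibbles as listed in the Raspberry Pi bootloader documentation.
BOOT_MODES = {
    0x0: "SD CARD DETECT",
    0x1: "SD CARD (eMMC on CM4)",
    0x2: "NETWORK",
    0x3: "RPIBOOT",
    0x4: "USB-MSD",
    0x5: "BCM-USB-MSD",
    0x6: "NVME",
    0x7: "HTTP",
    0xE: "STOP",
    0xF: "RESTART",
}


def decode_boot_order(setting: str) -> list[str]:
    """Decode a line such as 'BOOT_ORDER=0xf416' into boot modes, in try order."""
    value = int(setting.split("=", 1)[1], 16)  # base 16 also accepts the 0x prefix
    modes = []
    while value:
        nibble = value & 0xF  # least significant nibble is tried first
        modes.append(BOOT_MODES.get(nibble, f"unknown (0x{nibble:x})"))
        value >>= 4
    return modes


print(decode_boot_order("BOOT_ORDER=0xf416"))
# → ['NVME', 'SD CARD (eMMC on CM4)', 'USB-MSD', 'RESTART']
```

A value ending in 6 puts NVMe ahead of the eMMC, which is the kind of ordering the workaround above appears to rely on.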

maxromanovsky, Sep 13 '22 17:09

Hm, so it seems that if U-Boot initializes the NVMe, the kernel is subsequently able to boot? This is a bit curious, I must say.

Maybe we just need to force NVMe initialization in U-Boot no matter what.

agners, Sep 14 '22 13:09

Btw, 9.0.rc2 and later have an updated Raspberry Pi Kernel and firmware, so it might be worth a try as well.

agners, Sep 14 '22 13:09

@agners I've verified on 9.0.rc2, but had the same issue as before. By then I had already changed the boot priority and flashed 8.5 (64-bit) on the NVMe. If you need UART logs from the NVMe setup, I'll be happy to help; however, now that I've migrated HAOS to NVMe I won't be able to experiment with eMMC, as I need a stable smart home controller. I plan to purchase another Waveshare base board, and then I could be more helpful again.

maxromanovsky, Sep 14 '22 18:09

This is essentially a duplicate of #1887. Since #1887 has more information, let's continue discussion there.

agners, Oct 18 '22 18:10