Raspberry Pi CM4 - Allow boot from USB or SSD

Open • timvandruenen opened this issue 2 years ago • 6 comments

Feature Request

Bootstrap Talos OS on OnLogic's Raspberry Pi CM4-based devices, the OnLogic Factor 201 and 202.

Description

We are planning to run Talos clusters at multiple locations, mainly as "IoT", on a Raspberry Pi CM4-based device: the OnLogic Factor 201. If you choose the SSD (M.2) variant instead of the eMMC one, this device only lets you install an OS from USB3 and afterwards dd it to the SSD attached to the motherboard.

I was able to install Raspberry Pi OS out of the box, and Ubuntu 22.04 (Raspberry Pi edition) with minor boot/config tweaks documented by OnLogic: https://support.onlogic.com/documentation/factor/#installing-ubuntu-server-to-m-2-storage

If I insert a USB stick into the USB3 port and start the device, Talos starts as expected. But when I try to bootstrap the node with `talosctl apply-config --insecure --mode=interactive --nodes <node IP>`, I can't select any disk: the dropdown in the interactive installer remains empty. Running `talosctl -e <node IP> -n <node IP> disks --insecure` also returns an empty list, with exit code 0.

I also tried installing Raspberry Pi OS on the OnLogic device, dd'ing it to the SSD and booting RPi OS from there, then inserting a USB stick with Talos for RPi4, dd'ing Talos OS to the internal SSD, and booting. The result is the same as described above: Talos starts as expected, but during bootstrapping it doesn't find any disks.

Whether I use the interactive installer or a config with /dev/sda hard-coded as the install disk, the bootstrap always fails with one of these errors:

rpc error: code = Unknown desc = configuration validation failed: 2 errors occurred:
        * an install disk is required in "metal" mode
        * specified install disk does not exist: ""

or

error applying new configuration: rpc error: code = Unknown desc = configuration validation failed: 1 error occurred:
        * specified install disk does not exist: "/dev/sda"

Could we please have a look at this? If I can do something regarding logs or testing, please let me know!

timvandruenen • Jul 15 '22

Since `talosctl -e <node IP> -n <node IP> disks --insecure` returns nothing, it seems the kernel is not able to find any disks; I assume it is missing some drivers. I'm not sure how the SSDs are exposed on the OnLogic devices; I assume through some sort of PCIe controller.

Some quick ideas to move this forward:

  • Does the disk show up when booted into RPi OS?
  • If so, it would be nice if you could provide the output of lsusb and lspci -vvvv

frezbo • Jul 15 '22
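The information requested above can be gathered in one go on Raspberry Pi OS. This is a minimal sketch, assuming `lsusb`, `lspci` and `lsblk` are installed (from the usbutils, pciutils and util-linux packages); the function name `collect_storage_info` and the default output directory are hypothetical. Besides the two requested commands, it records `lsblk` output, whose TRAN column shows whether each disk is attached over USB, NVMe, and so on.

```shell
#!/usr/bin/env bash
# Sketch: collect the storage-related hardware info requested in this thread
# while booted into Raspberry Pi OS.
# Assumption: lsusb, lspci and lsblk are installed; names below are hypothetical.
set -euo pipefail

collect_storage_info() {
  local outdir="${1:-./cm4-diagnostics}"
  mkdir -p "$outdir"

  # USB devices, e.g. the USB stick used for the initial boot.
  lsusb > "$outdir/lsusb.txt"
  # Verbose PCI listing, as requested; without root some capabilities are hidden.
  lspci -vvvv > "$outdir/lspci.txt"
  # Block devices the running kernel sees, with model, size and transport type.
  lsblk -o NAME,MODEL,SIZE,TRAN > "$outdir/lsblk.txt"

  echo "Wrote diagnostics to $outdir"
}
```

Running it once per boot scenario, for example `collect_storage_info ./booted_from_usb` and `collect_storage_info ./booted_from_ssd`, makes it easy to compare what the kernel sees in each case.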

Hi @frezbo, thanks for your quick reply. Yes, the SSD showed up with both Raspberry Pi OS and Ubuntu: as /dev/sdb when I first booted the OS from USB (the USB stick was /dev/sda), and as /dev/sda once I had dd'ed the OS to the SSD and booted from it.

I'll be at the office in a few minutes, but I need to install Raspberry Pi OS again before I can run the commands. Hopefully I can answer your question soon!

timvandruenen • Jul 15 '22

lspci_booted_from_ssd.txt lspci_booted_from_usb.txt lsusb_boote_from_ssd.txt lsusb_booted_from_usb.txt

Here is the output of your commands, plus a photo of my screen during a Talos boot sequence, showing that it found 2 storage devices with a USB stick plugged in. When I remove the USB stick and boot, it says 1 storage device was found.

talos_boot

timvandruenen • Jul 15 '22

> Here the output of your commands + a photo from my screen during a boot sequence of Talos, telling me it found 2 storage devices when a USB is plugged in. When I remove the USB and boot it says 1 storage device found.

This is U-Boot; Talos has not booted yet at that point. Anyway, it is probably a matter of missing kernel drivers. I'll take a look at the provided outputs.

frezbo • Jul 15 '22

Hey @timvandruenen, would you mind joining our Slack (https://slack.dev.talos-systems.io/)? It would be easier to communicate there and gather more info about the hardware.

frezbo • Jul 18 '22

Done!

tvdruenen • Jul 18 '22

I can see the boot options in Talos 1.3 (not in 1.2); however, it does not do anything. See this Slack thread for more details: https://taloscommunity.slack.com/archives/CK8H5DDDM/p1670944730167479

xvzf • Dec 14 '22

While https://github.com/siderolabs/pkgs/pull/648 fixes the U-Boot issue, Talos now fails to discover the NVMe SSD 🙄

xvzf • Jan 10 '23

If you're testing with the latest main, note that it has a broken RPi config due to siderolabs/pkgs#642. It might be easier to test with the release-1.3 branch for now.

smira • Jan 10 '23

I am seeing this issue again when trying to bootstrap a new control plane on Raspberry Pis with Talos v1.3.5. I don't have a screen ready to check the console, but I can provide that on Monday.


error applying new configuration: rpc error: code = InvalidArgument desc = configuration validation failed: 1 error occurred:
        * specified install disk does not exist: "/dev/sdb"

jakoberpf • Feb 26 '23

Run `talosctl disks --insecure --endpoint <ip> --nodes <ip>` to get the list of detected disks. Device names such as /dev/sda and /dev/sdb are not consistent across distros, and in some edge cases they can even change across reboots, because of how Linux enumerates devices.

frezbo • Feb 26 '23
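On a regular Linux install such as Raspberry Pi OS (Talos itself has no shell), udev maintains persistent symlinks under /dev/disk/by-id that do not depend on probe order. The sketch below, with a hypothetical function name `show_disk_ids`, prints each persistent ID next to the kernel name the disk received on the current boot, which makes a renaming like the /dev/sdb to /dev/sda switch described earlier in this thread easy to spot.

```shell
#!/usr/bin/env bash
# Sketch (for a regular Linux such as Raspberry Pi OS, not Talos):
# map persistent /dev/disk/by-id names to the kernel name (sda, sdb, nvme0n1, ...)
# each disk received on this boot. The function name is hypothetical.
set -euo pipefail

show_disk_ids() {
  local dir="${1:-/dev/disk/by-id}"
  local link
  for link in "$dir"/*; do
    # Skip anything that is not a symlink, including an unmatched glob.
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$(basename "$link")" "$(basename "$(readlink -f "$link")")"
  done
}
```

Calling `show_disk_ids` with no argument reads the real /dev/disk/by-id directory; passing another directory is mainly useful for testing.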

That solved it, thanks. I had tried this command before without success, probably due to another error on my side. Note that I needed to omit `--endpoint <ip>` from the command, as it produced `unknown flag: --endpoint`. `talosctl disks --insecure --nodes <ip>` worked fine.

jakoberpf • Feb 26 '23

Sorry, that was supposed to be `--endpoints`. Glad it's sorted!

frezbo • Feb 26 '23
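The workflow that resolved the "specified install disk does not exist" error can be summarized as a small shell sketch. It assumes a hypothetical cluster name `my-cluster` and a Kubernetes API endpoint on port 6443 of the same node, and `bootstrap_with_disk` is a hypothetical helper. It relies on `talosctl gen config` using /dev/sda as the install disk unless `--install-disk` is given, which is why a config generated without that flag points at /dev/sda.

```shell
#!/usr/bin/env bash
# Sketch: check which disks a node in maintenance mode reports, then generate
# and apply a config that installs to one of them.
# Assumptions (hypothetical): cluster name "my-cluster", API endpoint on port 6443.
set -euo pipefail

bootstrap_with_disk() {
  local node_ip="$1" install_disk="$2"

  # List the disks the Talos kernel detected; an empty list means no install disk can validate.
  talosctl disks --insecure --nodes "$node_ip"

  # Without --install-disk, gen config defaults the install disk to /dev/sda.
  talosctl gen config my-cluster "https://${node_ip}:6443" --install-disk "$install_disk"

  # Apply the generated control plane config to the node.
  talosctl apply-config --insecure --nodes "$node_ip" --file controlplane.yaml
}
```

For example, `bootstrap_with_disk <ip> /dev/sdb`, passing whichever device `talosctl disks` lists. If that list is empty, as in the original CM4 report, choosing a different disk will not help, and missing kernel drivers are the more likely cause, as suspected earlier in the thread.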