RockPi 4C boot loop after Talos upgrade
Bug Report
Description
I believe this has been happening since 1.0 or 1.1. Every OS upgrade results in a boot loop, and I have to dd the SD card to bring the node back.
The upgrade itself completes without error and installs GRUB to /dev/mmcblk1.
After the reboot the node never comes back online. When I connect an HDMI cable, I see the following error:
* specified install disk does not exist: "/dev/mmcblk1"
Logs
This is the upgrade log:
blackrock: kern: notice: [2022-07-28T12:33:27.956345828Z]: XFS (mmcblk1p3): Unmounting Filesystem
blackrock: user: warning: [2022-07-28T12:33:27.969568828Z]: 2022/07/28 12:33:54 preserved contents of "BOOT": 62079139 bytes
blackrock: user: warning: [2022-07-28T12:33:28.020093828Z]: 2022/07/28 12:33:54 preserved contents of "META": 1055 bytes
blackrock: user: warning: [2022-07-28T12:33:32.902606828Z]: 2022/07/28 12:33:59 preserved contents of "STATE": 135072 bytes
blackrock: user: warning: [2022-07-28T12:33:32.903265828Z]: 2022/07/28 12:33:59 resetting partition table on /dev/mmcblk1
blackrock: user: warning: [2022-07-28T12:33:32.919100828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - EFI "105 MB"
blackrock: user: warning: [2022-07-28T12:33:32.919729828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p1 (EFI) size 204800 blocks
blackrock: user: warning: [2022-07-28T12:33:32.920389828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - BIOS "1.0 MB"
blackrock: user: warning: [2022-07-28T12:33:32.921201828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p2 (BIOS) size 2048 blocks
blackrock: user: warning: [2022-07-28T12:33:32.921960828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - BOOT "1.0 GB"
blackrock: user: warning: [2022-07-28T12:33:32.922663828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p3 (BOOT) size 2048000 blocks
blackrock: user: warning: [2022-07-28T12:33:32.923431828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - META "1.0 MB"
blackrock: user: warning: [2022-07-28T12:33:32.924135828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p4 (META) size 2048 blocks
blackrock: user: warning: [2022-07-28T12:33:32.924905828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - STATE "105 MB"
blackrock: user: warning: [2022-07-28T12:33:32.925999828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p5 (STATE) size 204800 blocks
blackrock: user: warning: [2022-07-28T12:33:32.926714828Z]: 2022/07/28 12:33:59 partitioning /dev/mmcblk1 - EPHEMERAL "0 B"
blackrock: user: warning: [2022-07-28T12:33:32.927376828Z]: 2022/07/28 12:33:59 created /dev/mmcblk1p6 (EPHEMERAL) size 122249216 blocks
blackrock: user: warning: [2022-07-28T12:33:32.939396828Z]: 2022/07/28 12:33:59 formatting the partition "/dev/mmcblk1p1" as "vfat" with label "EFI"
blackrock: user: warning: [2022-07-28T12:33:33.328981828Z]: 2022/07/28 12:33:59 zeroing out "/dev/mmcblk1p2"
blackrock: user: warning: [2022-07-28T12:33:33.388437828Z]: 2022/07/28 12:33:59 formatting the partition "/dev/mmcblk1p3" as "xfs" with label "BOOT"
blackrock: user: warning: [2022-07-28T12:33:36.718076828Z]: 2022/07/28 12:34:02 zeroing out "/dev/mmcblk1p4"
blackrock: user: warning: [2022-07-28T12:33:36.775750828Z]: 2022/07/28 12:34:03 zeroing out "/dev/mmcblk1p5"
blackrock: user: warning: [2022-07-28T12:33:41.891533828Z]: 2022/07/28 12:34:08 zeroing out "/dev/mmcblk1p6"
blackrock: kern: notice: [2022-07-28T12:33:41.949740828Z]: XFS (mmcblk1p3): Mounting V5 Filesystem
blackrock: kern: info: [2022-07-28T12:33:42.069950828Z]: XFS (mmcblk1p3): Ending clean mount
blackrock: kern: notice: [2022-07-28T12:33:48.236939828Z]: XFS (mmcblk1p3): Unmounting Filesystem
blackrock: user: warning: [2022-07-28T12:33:48.291965828Z]: 2022/07/28 12:34:14 restored contents of "BOOT"
blackrock: user: warning: [2022-07-28T12:33:48.358252828Z]: 2022/07/28 12:34:14 restored contents of "META"
blackrock: user: warning: [2022-07-28T12:33:53.509970828Z]: 2022/07/28 12:34:19 restored contents of "STATE"
blackrock: kern: notice: [2022-07-28T12:33:53.528782828Z]: XFS (mmcblk1p3): Mounting V5 Filesystem
blackrock: kern: info: [2022-07-28T12:33:53.682811828Z]: XFS (mmcblk1p3): Ending clean mount
blackrock: user: warning: [2022-07-28T12:33:53.704839828Z]: 2022/07/28 12:34:19 copying /usr/install/arm64/vmlinuz to /boot/B/vmlinuz
blackrock: user: warning: [2022-07-28T12:33:53.868117828Z]: 2022/07/28 12:34:20 copying /usr/install/arm64/initramfs.xz to /boot/B/initramfs.xz
blackrock: user: warning: [2022-07-28T12:33:53.957148828Z]: 2022/07/28 12:34:20 writing /boot/grub/grub.cfg to disk
blackrock: user: warning: [2022-07-28T12:33:53.960706828Z]: 2022/07/28 12:34:20 executing: grub-install --boot-directory=/boot --efi-directory=/boot/EFI --removable --target=arm64-efi /dev/mmcblk1
blackrock: user: warning: [2022-07-28T12:33:53.962825828Z]: Installing for arm64-efi platform.
blackrock: user: warning: [2022-07-28T12:34:00.944116828Z]: Installation finished. No error reported.
blackrock: user: warning: [2022-07-28T12:34:00.996943828Z]: 2022/07/28 12:34:27 installing U-Boot for "rockpi_4"
blackrock: user: warning: [2022-07-28T12:34:01.010227828Z]: 2022/07/28 12:34:27 writing /usr/install/arm64/u-boot/rockpi_4/u-boot-rockchip.bin at offset 32768
blackrock: user: warning: [2022-07-28T12:34:01.027306828Z]: 2022/07/28 12:34:27 wrote 9368664 bytes
blackrock: kern: notice: [2022-07-28T12:34:01.717255828Z]: XFS (mmcblk1p3): Unmounting Filesystem
blackrock: user: warning: [2022-07-28T12:34:01.787748828Z]: 2022/07/28 12:34:28 installation of v1.1.2 complete
blackrock: rpc error: code = Unavailable desc = error reading from server: EOF
[Monitor photo only, as the RockPI is not booting]
Environment
- Talos version: 1.1.1 upgraded to 1.1.2
- Platform: RockPI 4c, arm64
I don't have an exact answer, but it looks like mmcblk1 disappears after the upgrade.
I'm not familiar with the board, but to make Talos happy you can update the machine configuration before an upgrade with machine: install: disk: /dev/mmcblk0. In fact, Talos only ever uses that value during the actual install, so it doesn't matter after the initial install.
I'll update the machine config and follow up.
Interesting observation: I dd'd a fresh 1.1.2 install to the SD card, and the SD card shows up as /dev/mmcblk1. The apply-config command fails if I try to set the install disk to /dev/mmcblk0:
$ talosctl disks --insecure --nodes blackrock
DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH
/dev/mmcblk1 - 0x4e01293a SD - - - SN64G 64 GB /platform/fe320000.mmc/mmc_host/mmc1/mmc1:aaaa/
/dev/sda Samsung SSD 870 - SSD - - scsi:t-0x00 - 2.0 TB /platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ata1/host0/target0:0:0/0:0:0:0/
Interesting... it changes with a reboot? In fact, validating the install disk in that phase is a bug, since the value is irrelevant there: Talos finds its block device by partition labels.
Yeah, just reviewing the logs, the SD card seems to change from /dev/mmcblk1 to /dev/mmcblk0, but only after upgrades. If I cut the power to the node, it restarts properly. The disk name only seems to change during upgrades.
I have a Rock Pi 4C and have never seen any issues with upgrades, but in my case it's an NVMe drive, so it always shows up as /dev/nvme0. Is /dev/mmcblk the on-board SPI flash?
I found an old 16 GB eMMC card lying around, and successfully installed Talos and booted off the eMMC. The eMMC shows up as /dev/mmcblk0. I've updated the machine config and the RockPI has rejoined the cluster. I'll wait for the next upgrade and see if it works better.
$ talosctl disks --nodes blackrock dmesg
NODE DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH
blackrock /dev/mmcblk0 - 0xd2e703a1 SD - - - 58A43A 16 GB /platform/fe330000.mmc/mmc_host/mmc0/mmc0:0001/
blackrock /dev/mmcblk0boot0 - - SD - - - - 4.2 MB /platform/fe330000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0
blackrock /dev/mmcblk0boot1 - - SD - - - - 4.2 MB /platform/fe330000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1
blackrock /dev/sda Samsung SSD 870 - SSD - - scsi:t-0x00 - 2.0 TB /platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ata4/host3/target3:0:0/3:0:0:0/
So I'm back with the same issue after upgrading from 1.1.2 to 1.2.5. It's the same error, except now /dev/mmcblk0 is missing and /dev/mmcblk1 has appeared.
This might be a dumb question, but I noticed GRUB offers two choices, A and B. For this upgrade, 1.2.5 appears on B. Is it possible that when GRUB boots choice B, the MMC card gets a different device name?
This must be the kernel, since there are kernel updates between 1.1.2 and 1.2.5. I think you should run talosctl disks and specify the install disk with a diskSelector, so it will always find the right one.
Is the disk selector available to me for version upgrades? The disk, whether it's mmcblk1 or mmcblk0, is always correct for the initial install (I've re-flashed the MMC many times due to this issue). This issue only occurs when I run talosctl upgrade.
I believe you'd have to start with a fresh install.
You should use busPath: https://www.talos.dev/v1.2/reference/configuration/#installdiskselector
Oh, busPath looks promising! I will test and follow up.
OK, so I reflashed and reinstalled again, this time using diskSelector. Does the following seem correct?
install:
  diskSelector:
    busPath: /platform/fe330000.mmc/mmc_host/*
$ talosctl disks --nodes blackrock
NODE DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH
blackrock /dev/mmcblk1 - 0xd2e703a1 SD - - - 58A43A 16 GB /platform/fe330000.mmc/mmc_host/mmc1/mmc1:0001/
blackrock /dev/mmcblk1boot0 - - SD - - - - 4.2 MB /platform/fe330000.mmc/mmc_host/mmc1/mmc1:0001/block/mmcblk1/mmcblk1boot0
blackrock /dev/mmcblk1boot1 - - SD - - - - 4.2 MB /platform/fe330000.mmc/mmc_host/mmc1/mmc1:0001/block/mmcblk1/mmcblk1boot1
blackrock /dev/sda Samsung SSD 870 - SSD - - scsi:t-0x00 - 2.0 TB /platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ata4/host3/target3:0:0/3:0:0:0/
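To show why a busPath selector survives the renumbering, here is a minimal Go sketch that matches the pattern above against bus paths copied from the talosctl disks listings in this thread. Assumptions, stated loudly: '*' is treated as matching any run of characters including '/', which is consistent with this selector working in practice but is not taken from Talos's code or docs; the disk type and the globToRegexp/selectDisk helpers are hypothetical. The pattern is tied to the fe330000.mmc controller (the eMMC); the SD card in the first listing sits on fe320000.mmc and would not match it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// disk is a hypothetical, trimmed-down row of `talosctl disks` output.
type disk struct {
	Dev     string
	BusPath string
}

// globToRegexp turns a busPath pattern into an anchored regexp.
// ASSUMPTION: '*' matches any characters, including '/'.
func globToRegexp(pattern string) *regexp.Regexp {
	parts := strings.Split(pattern, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p) // escape '.', ':' and friends literally
	}
	return regexp.MustCompile("^" + strings.Join(parts, ".*") + "$")
}

// selectDisk returns the first disk, in list order, whose bus path matches
// pattern. On this board the eMMC's mmcblkNboot0/boot1 entries share the
// same prefix, so list order matters in this sketch.
func selectDisk(disks []disk, pattern string) (string, bool) {
	re := globToRegexp(pattern)
	for _, d := range disks {
		if re.MatchString(d.BusPath) {
			return d.Dev, true
		}
	}
	return "", false
}

func main() {
	pattern := "/platform/fe330000.mmc/mmc_host/*"
	ssd := disk{"/dev/sda", "/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ata4/host3/target3:0:0/3:0:0:0/"}

	// The same eMMC, enumerated as mmc0 on one boot and mmc1 on another.
	before := []disk{{"/dev/mmcblk0", "/platform/fe330000.mmc/mmc_host/mmc0/mmc0:0001/"}, ssd}
	after := []disk{{"/dev/mmcblk1", "/platform/fe330000.mmc/mmc_host/mmc1/mmc1:0001/"}, ssd}

	for _, disks := range [][]disk{before, after} {
		dev, ok := selectDisk(disks, pattern)
		fmt.Println(dev, ok)
	}
}
```

Running it prints /dev/mmcblk0 true and then /dev/mmcblk1 true: the device name changes between boots, but the controller prefix the selector keys on does not, and the SATA SSD never matches.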
I actually flashed Talos 1.2.4 by mistake, but it provided a good opportunity to test an upgrade. The upgrade worked; however, the MMC card remained at /dev/mmcblk1, so I couldn't really test whether an upgrade would work if the MMC decides to appear as mmc0.
Using the diskSelector config from https://github.com/siderolabs/talos/issues/5978#issuecomment-1278040032 resolved my failure-to-boot-after-upgrade issue.
Does this require a documentation update for RockPIs, or am I the only one who had the issue? I can submit a PR for the docs if anyone thinks it's needed. Otherwise, thanks for the help!
I'll share my findings with an RPi 4, where I also had an issue with /dev/mmcblk1 becoming /dev/mmcblk0 after the installation of Talos (Slack thread).
After the installation, Talos (booted via PXE) was trying to boot from disk but threw an error message:
specified install disk does not exist: /dev/mmcblk1
And guess what: during the initial install of U-Boot, the SD card is called /dev/mmcblk1 and it installs properly. But during the following U-Boot boot, the SD card is recognized as /dev/mmcblk0 (this is after the install):
talosctl -n 192.168.1.167 --talosconfig cluster-0-talosconfig disks
NODE DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH SUBSYSTEM SYSTEM_DISK
192.168.1.167 /dev/mmcblk0 - 0xb285ce23 SD - - - SN64G 64 GB /platform/emmc2bus/fe340000.mmc/mmc_host/mmc0/mmc0:aaaa/ /sys/class/block *
FYI, Talos 1.6 drops this erroneous config validation on boot.