Stuck at EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

Open muvaf opened this issue 1 year ago • 6 comments

Bug Report

Description

I have tried two methods to install Talos OS to a bare metal instance in Hetzner (not Hetzner Cloud):

  • Log into rescue mode over SSH and write the contents of "metal-amd64.raw.xz" to the boot disk /dev/nvme0n1 (I also tried writing to the other disk, /dev/nvme1n1).
  • Have a Hetzner technician write the metal-amd64.iso file to a USB drive and plug it in, then choose the USB drive from the boot menu.

In both cases, when it boots, it's stuck at a black screen with the following message: EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

All talosctl commands time out.

Logs

EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

One thing I noticed: when I go into rescue mode and open the disk onto which I dd'ed the Talos raw image, I see the following:

GPT PMBR size mismatch (181931 != 1000215215) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
The device contains 'iso9660' signature and it will be removed by a write command. See fdisk(8) man page and --wipe option for more details.
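
For scale, the two numbers in the first warning can be converted to sizes. This is only a rough sketch: it assumes 512-byte logical sectors (the sector size is not shown in the output above), and the helper name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def sectors_to_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to MiB."""
    return sectors * sector_bytes / (1024 * 1024)


pmbr_sectors = 181931       # size recorded in the protective MBR of the written image
disk_sectors = 1000215215   # value fdisk reports for the physical disk

print(f"protective MBR covers: ~{sectors_to_mib(pmbr_sectors):.0f} MiB")
print(f"physical disk:         ~{sectors_to_mib(disk_sectors) / 1024:.0f} GiB")
```

Under that assumption the partition table describes an image of roughly 89 MiB sitting on a disk of roughly 477 GiB. fdisk prints this kind of mismatch whenever a small image is written to a larger disk.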

Environment

  • Talos version: v1.7.0
  • Platform: Hetzner Robot dedicated root server with Intel i5 13500 (product page)

muvaf avatar Apr 25 '24 19:04 muvaf

I've been digging into this log line. Other distributions have their own reports of it, and some of them were resolved by setting the console kernel cmdline parameter. I'm not sure the exact value is the same for everyone, but I resolved my issue by going into rescue mode on my machine, querying the ttys and their baud rates, and then generating an image with Image Factory with console=tty0,38400. Dumped more detailed steps here.
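
As an illustration of the Image Factory step, the extra kernel argument could be expressed as a schematic along these lines. This is a minimal sketch: `build_schematic` is a hypothetical helper, and the `customization.extraKernelArgs` layout is assumed from the Image Factory schematic format, so check it against the Image Factory documentation before relying on it.

```python
def build_schematic(extra_kernel_args: list[str]) -> str:
    """Render a minimal schematic as YAML text (assumed layout, see note above)."""
    lines = ["customization:", "  extraKernelArgs:"]
    lines += [f"    - {arg}" for arg in extra_kernel_args]
    return "\n".join(lines) + "\n"


# The console value is the one reported to work on this particular machine.
schematic = build_schematic(["console=tty0,38400"])
print(schematic)
```

The tty name and baud rate are machine-specific, so the argument list would need to be adjusted to whatever the rescue system reports.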

Other OSes I installed on my machine didn't struggle with figuring that out. Could it be that the bootloader Talos uses is not configured to detect and set this parameter? netboot.xyz also seems to instruct setting it manually in some cases.

muvaf avatar Apr 26 '24 09:04 muvaf

You can try removing all console= args at the GRUB boot prompt by editing the command line.
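
What that edit amounts to can be sketched as a small string transformation. The command line below is a made-up example, not the actual Talos default, and `drop_console_args` is a hypothetical helper name.

```python
def drop_console_args(cmdline: str) -> str:
    """Return the kernel command line with every console= argument removed."""
    kept = [arg for arg in cmdline.split() if not arg.startswith("console=")]
    return " ".join(kept)


# Illustrative command line only; the real one depends on the image.
example = "talos.platform=metal console=ttyS0 console=tty0 init_on_alloc=1"
print(drop_console_args(example))
```

With no console= argument present, the kernel chooses its own default console rather than one fixed by the image.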

smira avatar Apr 26 '24 09:04 smira

@smira I don't have access to the GRUB boot prompt unless I get Hetzner to attach a KVM device to the server. On my second try, having just console=tty0 worked as well. I'd like to update this doc with this information, but the doc is tailored for Hetzner Cloud, which is completely separate from the Hetzner Robot dedicated server service, where there isn't even any snapshotting machinery. You just write the metal-amd64.iso to the whole disk.

muvaf avatar Apr 27 '24 04:04 muvaf

Same issue with agent-amd64 booting on a QEMU/KVM VM over PXE with UEFI, using Sidero 0.6.4.

Rolling back the sidero-controller-manager to 0.6.3 fixed my issue for agent-amd64.

aarnaud avatar May 06 '24 15:05 aarnaud

I got it on the serial port!

aarnaud avatar May 06 '24 20:05 aarnaud

The fix for me was switching from pc-q35 to pc-i440fx on KVM/QEMU; it seems the kernel may be missing some device support/drivers.

aarnaud avatar May 06 '24 20:05 aarnaud