sidero
sidero copied to clipboard
Workload cluster of Dell R230s with idrac8 stuck at boot prompt after pxe booting
Hi,
Whenever I reboot a node which is part of a sidero workload cluster, it pxe boots and gets stuck at the boot prompt. I have to manually log into the iDRAC, connect to the console, and force it to boot from disk for the node to come back up.
It's possible I've configured something incorrectly, but
- pxe booting works
- I pxe booted these nodes to create/join a workload cluster
- ipmi seems to work because after accepting the nodes and creating the cluster in the management cluster, they powered up and installed talos
- and talosctl shutdown seems to do the right thing on these hosts.
I suspect this is a config error in whatever toggles the boot order via ipmi in the mangement cluster on the workload cluster.
Happy to provide logs, just let me know which are interesting.
Thanks!
BTW, Continuing will simply get us back into the pxe boot. I have to Continue, wait for the broadcom pxe firmware loading message, cancel that by pressing escape, and then the boot order continues and boots off of the primary disk into grub.
It's not clear what the problem is, but we recommend using snp.efi
instead of ipxe.efi
: https://www.sidero.dev/v0.6/getting-started/prereq-dhcp/
Hi I also have some problem but maybe we can resolve both problems here. I did get that problem before so if you update to the snp.efi you should get longer.
My dnsmasq config
hostNetwork: true
containers:
- name: dnsmasq
args:
- -d
- --port=5353
- --dhcp-range=10.202.53.20,10.202.53.100
- --dhcp-option=option:router,10.202.53.1
- --dhcp-option=6,1.1.1.1
- --dhcp-boot=tag:ipxe,ipxe.efi,10.202.53.11
- --addn-hosts=/dnsmasq/hosts.text
- --dhcp-hostsfile=/dnsmasq/dhcphosts.txt
- --log-queries
- --log-dhcp
Im running the DHCP proxy and it setup the boot.
023/11/14 13:53:55 HTTP GET /boot.ipxe 10.202.53.11:7788
2023-11-14T13:54:04Z INFO dhcp-proxy offering boot response {"source": "04:32:01:47:34:e0", "server": "10.202.53.11", "boot_filename": "snp.efi"}
2023-11-14T13:54:04Z INFO dhcp-proxy ignoring packet {"source": "04:32:01:47:34:e0", "reason": "packet is REQUEST, not DISCOVER"}
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:7788
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:47833
2023/11/14 13:54:08 HTTP GET /ipxe?uuid=4c4c4544-0038-4810-804e-c4c04f353034&mac=04-32-01-47-34-e0&domain=&hostname=node8&serial=D8HN504&arch=x86_64 10.202.53.58:48382
2023/11/14 13:54:08 Using "agent-amd64" environment
2023/11/14 13:54:08 HTTP GET /env/agent-amd64/vmlinuz 10.202.53.58:48382
2023/11/14 13:54:09 HTTP GET /env/agent-amd64/initramfs.xz 10.202.53.58:48382
2023/11/14 13:54:15 HTTP GET /boot.ipxe 10.202.53.11:47833
From the logs we can see that the proxy switches the bootfile to the new snp.efi
Still my boot get stuck in a
efi stub mesuredata into pcr 9
Im using a 10G card next time in the datacenter I will try the 1G card.
What in the world can "efi stub mesuredata into pcr 9" be ?
(Om booting from my VM and they use undionly.kpxe and are working good)