talos.config.inline not working in Omni-generated VMware OVA image
Bug Report
When I generate a Talos v1.8.1 image for the VMware platform using omnictl, our CA certificate config is ignored when passed via the talos.config.inline kernel argument, and we get x509 certificate errors in the console log of the Talos VM when it tries to connect to our Omni instance.
I provide --extra-kernel-args talos.config.inline=${TALOS_CONFIG_INLINE} to omnictl, where $TALOS_CONFIG_INLINE is created by following the guide here: https://www.talos.dev/v1.8/reference/kernel/#talosconfiginline. The config document is a CA certificate, see https://www.talos.dev/v1.8/talos-guides/configuration/certificate-authorities/#appending-the-certificate-authority. I have tried using the official factory.talos.dev with the same result. I have checked the GRUB menu, and the talos.config.inline key and value are present.
If I instead provide the same CA certificate config document as a base64-encoded string through VMware guestinfo, the CA certificate works and the node can connect to our Omni instance without any errors. I use this command to insert the config document on the VM host:
govc vm.change \
-e "guestinfo.talos.config=$(cat ca-root-config.yml | base64)"
....
I have tried wiping and resetting the machine, and editing the kernel arguments to change the platform and remove the talos.config=guestinfo line, without any luck. I am not sure that has anything to do with this, though.
Platform: VMware (OVA template)
Talos Version: v1.8.1
Please provide kernel logs.
P.S. It's way better to use userdata than talos.config.inline with Omni.
The kernel log: the best I can do is an image, hope that works
The rest of the logs are mostly from time.syncController, which can't connect out to the internet.
The status of the node stays like this forever:
P.S. For the userdata part, that actually sounds very reasonable, since it doesn't have quite the same limitations. Thank you.
We need full kernel logs (serial console logs) to understand why the config failed to load. We can't debug much without them, sorry.
I'll check if I can attach a serial port and save the output to disk.
Here it is! I have redacted the sensitive information. console-log.txt
On lines 15 and 104 the talos.config.inline argument is clearly missing. I can see it in the GRUB menu though.
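One way to confirm from a saved serial log whether an argument reached the kernel is to test the command line the kernel reports at boot. This is a minimal sketch, assuming a POSIX shell on the workstation where the log was saved; `has_karg` is a hypothetical helper, not part of Talos, omnictl, or govc, and the sample command line is invented for illustration.

```shell
#!/bin/sh
# has_karg CMDLINE KEY
# Succeeds if CMDLINE contains KEY as a bare flag or as KEY=value.
# Hypothetical helper for illustration; not a Talos or Omni tool.
has_karg() (
  set -f                      # no globbing: values may contain ? or [ ]
  for arg in $1; do           # kernel arguments are whitespace-separated
    case "$arg" in
      "$2" | "$2"=*) exit 0 ;;
    esac
  done
  exit 1
)

# Invented sample that lacks the inline config argument.
sample='talos.platform=vmware talos.config=guestinfo console=ttyS0'

if has_karg "$sample" talos.config.inline; then
  echo "talos.config.inline present"
else
  echo "talos.config.inline missing"
fi
```

Against a real log, the first argument could be the command line the kernel prints early in boot, for example `has_karg "$(grep -m1 'Command line:' console-log.txt)" talos.config.inline`. The exact log prefix is an assumption about how the console output was captured.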
If you're booting from the OVA, it should be there, unless there was something else happening (like an upgrade) which would wipe that kernel argument?
It's very strange, I have to do some more digging. No upgrade or other adjustments are made, and the kernel arguments are clearly visible in the GRUB edit menu. The steps are:
- Generate with omnictl
- Upload to our content directory with govc
- Deploy it. (I make no adjustments or modifications in this step, simply New VM from template)
- Start
These console logs are of a completely fresh machine I created.
Here is the full omnictl command with expanded variables:
omnictl download vmware \
--talos-version v1.8.1 \
--arch amd64 \
--extensions vmtoolsd-guest-agent \
--initial-labels environment=<env> --initial-labels region=<REGION> \
--extra-kernel-args talos.config.inline=$(cat sbab-root-ca.yml | zstd --compress --ultra -22 | base64 -w 0) \
--output _out/v1.8.1-<REGION>-common
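Before embedding the value in an image, it can help to check that the encoded payload decodes back to the original document and to see how many bytes it adds to the kernel command line. This is a rough sketch, assuming GNU coreutils `base64` (for `-w 0` and `-d`) and the `zstd` CLI; `encode_inline` and `decode_inline` are hypothetical helper names, and the sample document is a placeholder, not a real Talos config.

```shell
#!/bin/sh
# encode_inline mirrors the pipeline used in the omnictl command above;
# decode_inline reverses it. Both read stdin and write stdout.
# Hypothetical helpers for illustration only.
encode_inline() { zstd -q --compress --ultra -22 | base64 -w 0; }
decode_inline() { base64 -d | zstd -q --decompress --stdout; }

if command -v zstd >/dev/null 2>&1; then
  doc='placeholder: not a real config document'
  encoded=$(printf '%s\n' "$doc" | encode_inline)
  # Size of the value that ends up after talos.config.inline=
  echo "encoded size: $(printf '%s' "$encoded" | wc -c | tr -d ' ') bytes"
  # Round trip: should print the placeholder document again.
  printf '%s' "$encoded" | decode_inline
else
  echo "zstd not installed; skipping demo"
fi
```

With the real file, `encode_inline < sbab-root-ca.yml | wc -c` would report the size of the argument value before it is handed to omnictl.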
GRUB image:
I wonder if it's too big and gets cut by GRUB... maybe your certificate is RSA? ECDSA is way smaller
Well, in total, with our talos.config.inline, the whole command line is 2700 bytes.
BOOT_IMAGE=/A/vmlinuz talos.platform=vmware talos.config=guestinfo console=tty0 console=ttyS0 earlyprintk=ttyS0,115200 net.ifnames=0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 siderolink.api=https://<REDACTED>:443?grpc_tunnel=false&jointoken=<REDACTED> talos.events.sink=[fdae:41e4:649b:9303::1]:8091 talos.logging.kernel=tcp://[fdae:41e4:649b:9303::1]:8092 talos.config.line=<2213 bytes>
This is without the redacted stuff.
❯ wc -c talos-kernel-args.txt
2700 talos-kernel-args.txt
In your documentation it says the Linux kernel args have a max size of 4096, but maybe GRUB has another limit?
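The size comparison can be scripted so that the same command line is checked against more than one candidate limit. This is a minimal sketch under stated assumptions: `karg_len` and `fits_limit` are hypothetical helpers, the 2700-byte string is filler standing in for the real arguments, 4096 is the documented figure mentioned above, and 2048 is only an illustrative lower value. The thread does not establish which limit GRUB or the BIOS boot path actually applies.

```shell
#!/bin/sh
# karg_len CMDLINE: print the byte length of CMDLINE.
karg_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# fits_limit CMDLINE LIMIT: succeed if CMDLINE plus one terminating
# NUL byte fits in LIMIT bytes. LIMIT is supplied by the caller; this
# sketch does not know the real limit of any bootloader.
fits_limit() {
  [ "$(karg_len "$1")" -lt "$2" ]
}

# Filler standing in for the real 2700-byte command line.
cmdline=$(head -c 2700 /dev/zero | tr '\0' 'x')

# 4096 is the documented figure; 2048 is an illustrative assumption.
for limit in 4096 2048; do
  if fits_limit "$cmdline" "$limit"; then
    echo "fits within $limit bytes"
  else
    echo "exceeds $limit bytes"
  fi
done
```

If the real command line fits the documented limit but the argument is still missing at boot, that points toward a smaller limit somewhere earlier in the boot path, which is the question raised here.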
Yes, it might be a limit in GRUB or in the boot protocol used with GRUB (I guess you're booting in BIOS mode on VMware?)
Yes, BIOS.
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.