1.12 machineconfig patch issues
Bug Report
Description
I'm unable to apply machineconfig patches to any of my v1.12 nodes. I'm running a build derived from all the tooling in v1.12.0-alpha.2-6-g64a46a7. I realize it's an alpha build, but neither patching the machine config nor applying an edited copy of it works.
Logs
talosctl -n c0r1-gpu1 patch machineconfig --patch @talos-kubelet-patch.yaml
recovered: expected a mapping node
The patch is simple:
machine:
  kubelet:
    extraArgs:
      max-pods: "60"
Also fails. The patch worked just fine against my v1.11.3 nodes, which are provisioned with a more or less identical machineconfig.
I have even tried downloading the existing machineconfig using -o yaml, editing the file, and applying it, and I get the same error.
Looking at the logs from apid there isn't anything more helpful that I can see:
c0r1-gpu1: 2025/11/12 19:39:22.946083 log.go:94: InvalidArgument [/machine.MachineService/ApplyConfiguration] 47.284866ms stream rpc error: code = InvalidArgument desc = recovered: expected a mapping node (:authority=c0r1-gpu1:50000;content-type=application/grpc+proxy>proto;grpc-accept-encoding=gzip,gzip;proxyfrom=10.95.186.6;runtime=Talos;talos-role=os:admin;user-agent=grpc-go/1.68.1)
being the only relevant log.
I have also tested using the alpha talosctl and get the same error. Happy to provide any additional logs; just let me know.
❯ talosctl -n c1t1-gpu2 patch machineconfig --patch @talos-kubelet-patch.yaml
WARNING: c1t1-gpu2: server version 1.11.3-0 is older than client version 1.11.5
patched MachineConfigs.config.talos.dev/v1alpha1 at the node c1t1-gpu2
WARNING: extra kernel arguments are not supported when booting using SDBoot
Applied configuration without a reboot
Environment
Bare metal nodes
- Talos version:
❯ talosctl version --nodes c0r1-gpu1
Client:
Tag: v1.11.5
SHA: undefined
Built: 2025-11-06T12:35:51Z
Go version: go1.25.4
OS/Arch: darwin/arm64
Server:
NODE: c0r1-gpu1
Tag: v1.12.0-alpha.2-16-gc93a9c6b4
SHA: c93a9c6b
Built:
Go version: go1.25.4
OS/Arch: linux/amd64
Enabled: RBAC
- Kubernetes version:
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.32.0
- Platform:
I can't reproduce this issue, with main at least; it looks like you have a broken build (?). Your exact patch applies, and all the functionality around config patching is fully tested.
This is the command I used to build it:
docker run --rm -t -v $PWD/_out:/secureboot:ro -v $PWD/_out:/out -v /dev:/dev --privileged ghcr.io/siderolabs/imager:v1.12.0-alpha.2-16-gc93a9c6b4 \
installer \
--system-extension-image ghcr.io/siderolabs/amd-ucode:20251021@sha256:e64dbc49897ddfdb7ab694b446a25488cdfb7d145f026d63f57026e71593a67c \
--system-extension-image ghcr.io/siderolabs/iscsi-tools:v0.2.0@sha256:e49cc872c25853ed27bce6b5c3ffee281d205c09f03db20f236409581c5f8cb9 \
--system-extension-image ghcr.io/siderolabs/lldpd:1.0.20@sha256:a7c63d6d0e4f6e0452d6b44b9fd989df73e6b240bd2aba37ee309f83cd80fcde \
--system-extension-image ghcr.io/siderolabs/nvidia-container-toolkit-production:570.195.03-v1.18.0@sha256:9e5e63220f9712f6618b52efcd1c88ce7345cb04ca9f6adb0679bd530fd8587a \
--system-extension-image ghcr.io/siderolabs/nvidia-fabricmanager-production:570.195.03@sha256:0a4fc05b9bf1a350b006fcc4534097666bbd4f0842048f8fd87cc1e3b2bb9c2f \
--system-extension-image ghcr.io/nicolerenee/talos/nvidia-gdrdrv-mount:v2.5.1@sha256:18953a855df9d8108b21fe74bef6eec9a6d8077dcc59c2aeef617e5b88d4eebd \
--system-extension-image ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules-production:570.195.03-v1.12.0-alpha.2-6-g64a46a7@sha256:b64d6fc844164037017b05e6428fde80f53510966c701c42106f46cd8106b030 \
--base-installer-image ghcr.io/siderolabs/installer-base:v1.12.0-alpha.2-16-gc93a9c6b4 \
--extra-kernel-arg "iommu=pt"
My nvidia-gdrdrv-mount is the exact same as what got merged into extensions as nvidia-gdrdrv-device. Now that it's merged I can build this again with only extensions y'all auto published, but wanted to double check you weren't seeing anything in this command that is wrong that could be causing the problem.
My other thought is that the machineconfig we are applying during install works but somehow gets the node into a broken state.
I don't have any exact idea here, sorry. If it doesn't work with a Talos release, we're happy to look into it.
For me, disabling the UEFI configuration on the host solved the problem! No more warning messages.
Unfortunately disabling UEFI isn't an option for me.
I have upgraded my nodes to the v1.12.0-beta.0 release with an image built by factory and I'm still getting the same error.
❯ talosctl -n c0r3-gpu1 patch machineconfig --patch @talos-kubelet-patch.yaml
error constructing client: failed to determine endpoints
❯ talosctl version -n c0r3-gpu1
Client:
Tag: v1.11.5
SHA: undefined
Built: 2025-11-06T12:35:51Z
Go version: go1.25.4
OS/Arch: darwin/arm64
Server:
NODE: c0r3-gpu1
Tag: v1.12.0-beta.0
SHA: 3d997d74
Built:
Go version: go1.25.4
OS/Arch: linux/amd64
Enabled: RBAC
❯ k describe no c0r3-gpu1 | grep schematic:
extensions.talos.dev/schematic: c1272823a3d5aecf17257649487664aa48397f8cae15b573b0a14165e2c790cf
This has nothing to do with UEFI.
> ❯ talosctl -n c0r3-gpu1 patch machineconfig --patch @talos-kubelet-patch.yaml
> error constructing client: failed to determine endpoints
This means endpoints has not been set in TALOSCONFIG. Use --endpoints and --nodes to target a specific node (when targeting a worker node, --endpoints should be a controlplane).
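A minimal sketch of the advice above: a small wrapper that always passes an explicit endpoint and node, so the call does not depend on endpoints stored in the talosconfig. The function name patch_node and the TALOSCTL override variable are hypothetical helpers for illustration, and the control plane address in the usage line is a placeholder. The flag is spelled --endpoints (short form -e) in the talosctl releases I'm aware of.

```shell
#!/bin/sh
# Sketch only: wrap "talosctl patch machineconfig" with explicit targeting.
# TALOSCTL can be overridden (for example TALOSCTL=echo) for a dry run.
TALOSCTL="${TALOSCTL:-talosctl}"

# patch_node ENDPOINT NODE PATCH_FILE
#   ENDPOINT    a control plane address (placeholder in the usage below)
#   NODE        the node whose machine config is patched
#   PATCH_FILE  path to the patch file, without the leading @
patch_node() {
    if [ "$#" -ne 3 ]; then
        echo "usage: patch_node ENDPOINT NODE PATCH_FILE" >&2
        return 2
    fi
    # The @ prefix tells talosctl to read the patch from the file.
    "$TALOSCTL" --endpoints "$1" --nodes "$2" \
        patch machineconfig --patch "@$3"
}
```

Usage, with a placeholder control plane address: `patch_node 10.0.0.1 c0r3-gpu1 talos-kubelet-patch.yaml`.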
You are correct, sorry. I saw the error and just copy-pasted it without reading closely enough. I pointed to the correct talosconfig for this cluster and am still getting the error.
❯ talosctl -n c0r3-gpu1 patch machineconfig --patch @talos-kubelet-patch.yaml
recovered: expected a mapping node
I think an easy reproducer would be: run talosctl gen secrets, generate a machineconfig with --with-secrets, and apply the patch, so we can try to reproduce. Please also include the talosctl version, the server version, and the patch itself.
Applying a machine config with a patch works for me with 1.12-beta.0
./talosctl apply -f controlplane.yaml -n 10.1.1.14 -i -p '@nvidia.yaml'
And patching the machine config worked
./talosctl patch mc --patch '@cp-schedule.yaml' -n 10.1.1.14
The contents of cp-schedule.yaml are
cluster:
  allowSchedulingOnControlPlanes: true
But there were some patches that failed that traditionally succeed.
Using the old hostname config doesn't work
machine:
  network:
    hostname: spark
and applying it
./talosctl patch machineconfig -p '@hostname2.yaml' -n 10.1.1.14
patched MachineConfigs.config.talos.dev/v1alpha1 at the node 10.1.1.14
1 error occurred:
* 10.1.1.14: rpc error: code = InvalidArgument desc = 1 error occurred:
* static hostname is already set in v1alpha1 config
If I try to apply the new multi-doc config
apiVersion: v1alpha1
kind: HostnameConfig
hostname: spark
I get the following error
./talosctl apply -f hostname.yaml -n 10.1.1.14
error applying new configuration: 1 error occurred:
* 10.1.1.14: rpc error: code = InvalidArgument desc = the applied machine configuration doesn't contain v1alpha1 config, did you mean to patch the machine config instead?
@nicolerenee I get the same error as you under a couple of conditions.
If I forget the @ on my patch file
./talosctl patch machineconfig -p hostname.yaml -n 10.1.1.14
recovered: expected a mapping node
I tried the same patch for the kubelet in your example and it worked on my system.
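One way to catch the missing @ before it reaches the node is a small pre-flight check. This is only a sketch. It assumes that a --patch value without @ is treated as inline patch content, which would explain why a bare file name such as hostname.yaml (a plain string, not a mapping) produces "expected a mapping node". The function name check_patch_arg is a hypothetical helper, not part of talosctl.

```shell
#!/bin/sh
# check_patch_arg ARG
# Returns 0 when ARG looks like a usable --patch value, 1 otherwise.
check_patch_arg() {
    arg="$1"
    case "$arg" in
        @*)
            # @file form: the referenced file must exist and be readable.
            file="${arg#@}"
            if [ ! -r "$file" ]; then
                echo "patch file not readable: $file" >&2
                return 1
            fi
            ;;
        *)
            # No @ prefix: the value is assumed to be inline patch content.
            # If it names an existing file, the @ was probably forgotten.
            if [ -f "$arg" ]; then
                echo "'$arg' is a file; did you mean '@$arg'?" >&2
                return 1
            fi
            ;;
    esac
    return 0
}
```

For example, `check_patch_arg hostname.yaml && ./talosctl patch machineconfig -p hostname.yaml -n 10.1.1.14` would stop with a hint instead of sending the bare file name as the patch.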