Duplicate array entries inserted into machine config when running talm apply

Open hochbit opened this issue 3 months ago • 4 comments

Talos Version: TalosOS 1.11.1
Talm Version: 0.16.1

Running talm apply inserts duplicate array entries into the machine config, which prevents new Talos nodes from becoming ready.

The affected arrays are:

  • cluster.apiServer.certSANs
  • cluster.apiServer.admissionControl.[name=PodSecurity].exemptions.namespaces

Command:

talm apply -f nodes/node-cp-2.yaml --dry-run

Result:

@@ -63,6 +72,10 @@
         image: registry.k8s.io/kube-apiserver:v1.34.0
         certSANs:
              - 10.1.1.222
+           - 10.1.1.222    # <<<<<<< even if the template contains nothing it duplicates the entries
+            - ...
         disablePodSecurityPolicy: true
         admissionControl:
             - name: PodSecurity
@@ -78,6 +91,7 @@
                 exemptions:
                     namespaces:
                         - kube-system
+                       - kube-system     #   <<<<<<<<<
                     runtimeClasses: []
                     usernames: []
                 kind: PodSecurityConfiguration
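To spot this pattern in a longer dry-run diff, a small filter can flag every added (+) line whose content repeats the line directly above it in the new version of the config. This is a minimal sketch that assumes the unified-diff layout shown above; find_dup_additions is a hypothetical helper, not part of talm or talosctl, and it only catches duplicates that land directly under the original entry, as certSANs and exemptions.namespaces do here.

```shell
# Hypothetical helper (not part of talm/talosctl): read a unified diff on stdin
# and print each added line whose text, after dropping the diff prefix and
# trimming whitespace, equals the previous line of the *new* file.
find_dup_additions() {
  awk '
    # Drop the one-character diff prefix ("+", "-" or " ") and trim spaces/tabs.
    function body(s) {
      s = substr(s, 2)
      sub(/^[ \t]+/, "", s)
      sub(/[ \t]+$/, "", s)
      return s
    }
    # File and hunk headers start a new comparison context.
    /^([+][+][+] |--- |@@ )/ { prev = ""; next }
    # Removed lines are not part of the new file, so ignore them.
    /^-/ { next }
    {
      cur = body($0)
      # An added line identical to the line above it is reported.
      if (substr($0, 1, 1) == "+" && cur != "" && cur == prev)
        print "duplicate: " cur
      prev = cur
    }
  '
}
```

Assuming talm prints the dry-run diff on stdout or stderr, it could be used as: talm apply -f nodes/node-cp-2.yaml --dry-run 2>&1 | find_dup_additions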

Workaround

Render the full configuration (full: true in Chart.yaml) and apply it with talosctl:

talm -n IP -e IP template -t templates/controlplane.yaml > nodes/mycontrolplane.yaml
talosctl apply-config -f nodes/mycontrolplane.yaml -n IP -e IP -i

hochbit, Sep 16 '25 15:09

Hi @hochbit, thanks for raising an issue. I'm not able to reproduce this. Could you provide a more complete configuration/example? If you run talm apply -f nodes/node-cp-2.yaml and then run talm apply -f nodes/node-cp-2.yaml --dry-run, do you see the same diff?
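The check suggested above (apply the file once, then dry-run the same file and expect an empty diff) can be scripted so it is easy to rerun. This is a minimal sketch: check_reapply_is_noop is a hypothetical helper, and it assumes talm apply --dry-run prints a unified diff like the ones in this issue, so any + or - line other than the ---/+++ headers is treated as a pending change. Only the talm flags that appear in this issue (-f, --dry-run) are used.

```shell
# Hypothetical helper (not part of talm): apply a node file, then dry-run the
# same file and report whether a second apply would change anything.
# Returns 0 if the dry-run shows no changed lines, 1 otherwise.
check_reapply_is_noop() {
  node_file=$1

  # First apply: bring the node in line with the file.
  talm apply -f "$node_file" || return 1

  # Second pass as a dry-run. Both stdout and stderr are captured because
  # where talm writes the diff is an assumption here.
  diff_out=$(talm apply -f "$node_file" --dry-run 2>&1) || return 1

  # Changed lines start with "+" or "-"; drop the "+++ "/"--- " file headers.
  # "|| true" keeps an empty grep result from counting as a failure.
  changes=$(printf '%s\n' "$diff_out" | grep -E '^[+-]' | grep -Ev '^([+][+][+] |--- )' || true)

  if [ -n "$changes" ]; then
    echo "re-apply of $node_file is NOT a no-op:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
  echo "re-apply of $node_file is a no-op"
}
```

With the bug described in this issue, the dry-run would still show the duplicated certSANs and kube-system entries, so the helper would return 1.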

lllamnyp, Sep 16 '25 16:09

On a Proxmox VM:

  • Secure Boot on
  • EFI BIOS
  • EFI disk (SATA) without the default certificates. The certificates were enrolled from the ones on the TalosOS disk: if the VM setup is correct, the .der file is in the EFI/keys directory and must be selected for every BIOS entry of the custom Secure Boot setup (except DBX, which is for key revocation)
  • CD-ROM (sata0)
  • Hard disk on scsi0 (appears as sda), stored on an externally mounted NFS storage in Proxmox
  • Boot order scsi0/sata0
  • Cloud-Init disk: Proxmox node used for everything, network adapter set to IPv4 DHCP and IPv6 DHCP
  • Network interface: vnet (setup: https://pve.proxmox.com/wiki/Setup_Simple_Zone_With_SNAT_and_DHCP)

The installation image on the CD is the same one used later in the config (nocloud + qemu guest): ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.11.1

Note on Secure Boot (edit)

I do not think it's related to Secure Boot; it should behave the same without it.

Talm runs inside an Alpine Docker image (edit 3):

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.20.1
PRETTY_NAME="Alpine Linux v3.20"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"


mkdir -p ~/talm/example
cd ~/talm/example
talm init --preset generic
# vi Chart.yaml
# -> full: true
talm -n 192.168.0.50 -e 192.168.0.50 template -t templates/controlplane.yaml > nodes/node.yaml
# vi nodes/node.yaml :
# install.image = factory.talos.dev/nocloud-installer-secureboot/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.11.1
# RESULT: 1

talosctl apply-config -f nodes/node.yaml -n 192.168.0.50 -e 192.168.0.50 -i
talm apply -f nodes/node.yaml --dry-run
# RESULT: 2

talm reset -f nodes/node.yaml --graceful=false
talm apply -f nodes/node.yaml -i
talosctl apply-config -f nodes/node.yaml -n 192.168.0.50 -e 192.168.0.50 --talosconfig talosconfig --dry-run
# RESULT: 3

1 (nodes/node.yaml)

# talm: nodes=["192.168.0.50"], endpoints=["192.168.0.50"], templates=["templates/controlplane.yaml"]
# THIS FILE IS AUTOGENERATED. PREFER TEMPLATE EDITS OVER MANUAL ONES.
version: v1alpha1
debug: false
persist: true
machine:
  type: controlplane
  token: 
  ca:
    crt: 
    key: 
  certSANs: []
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.33.1
    defaultRuntimeSeccompProfileEnabled: true
    nodeIP:
      validSubnets:
        - 192.168.100.0/24
    disableManifestsDirectory: true
  network:
    hostname: talm-issue-77
    # -- Discovered interfaces:
    # eth0:
    #   hardwareAddr:bc:24:11:ad:65:b7
    #   busPath: 0000:00:12.0
    #   driver: virtio_net
    #   vendor: Red Hat, Inc.
    #   product: Virtio network device)
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.0.50/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.0.1
    nameservers:
      - 192.168.0.1
  install:
    # -- Discovered disks:
    # /dev/sda:
    #    model: QEMU HARDDISK
    #    serial: 
    #    wwid: 
    #    size: 34 GB
    # /dev/sr0:
    #    model: QEMU DVD-ROM
    #    serial: 
    #    wwid: 
    #    size: 206 MB
    # /dev/sr1:
    #    model: QEMU DVD-ROM
    #    serial: 
    #    wwid: 
    #    size: 4.2 MB
    disk: /dev/sda
    wipe: true
    image: factory.talos.dev/nocloud-installer-secureboot/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.11.1
  features:
    rbac: true
    stableHostname: true
    apidCheckExtKeyUsage: true
    diskQuotaSupport: true
    kubePrism:
      enabled: true
      port: 7445
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true
  nodeLabels:
    node.kubernetes.io/exclude-from-external-load-balancers: ""
cluster:
  id: 
  secret: 
  controlPlane:
    endpoint: https://192.168.0.50:6443
  clusterName: example
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/16
  token: 
  secretboxEncryptionSecret: 
  ca:
    crt: 
    key: 
  aggregatorCA:
    crt: 
    key: 
  serviceAccount:
    key: 
  apiServer:
    image: registry.k8s.io/kube-apiserver:v1.33.1
    certSANs:
      - 192.168.0.50
    disablePodSecurityPolicy: true
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          defaults:
            audit: restricted
            audit-version: latest
            enforce: baseline
            enforce-version: latest
            warn: restricted
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system
            runtimeClasses: []
            usernames: []
          kind: PodSecurityConfiguration
    auditPolicy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        - level: Metadata
  controllerManager:
    image: registry.k8s.io/kube-controller-manager:v1.33.1
  proxy:
    image: registry.k8s.io/kube-proxy:v1.33.1
  scheduler:
    image: registry.k8s.io/kube-scheduler:v1.33.1
  discovery:
    enabled: true
    registries:
      kubernetes:
        disabled: true
      service: {}
  etcd:
    ca:
      crt: 
      key: 
    advertisedSubnets:
      - 192.168.100.0/24

2 Diff Output (talm apply -f nodes/node.yaml --dry-run)

--- a
+++ b
@@ -69,6 +69,7 @@
         image: registry.k8s.io/kube-apiserver:v1.33.1
         certSANs:
             - 192.168.0.50
+            - 192.168.0.50
         disablePodSecurityPolicy: true
         admissionControl:
             - name: PodSecurity
@@ -84,6 +85,7 @@
                 exemptions:
                     namespaces:
                         - kube-system
+                        - kube-system
                     runtimeClasses: []
                     usernames: []
                 kind: PodSecurityConfiguration

3 Diff Output (talosctl apply-config --dry-run)

--- a
+++ b
@@ -69,7 +69,6 @@
         image: registry.k8s.io/kube-apiserver:v1.33.1
         certSANs:
             - 192.168.0.50
-            - 192.168.0.50
         disablePodSecurityPolicy: true
         admissionControl:
             - name: PodSecurity
@@ -85,7 +84,6 @@
                 exemptions:
                     namespaces:
                         - kube-system
-                        - kube-system
                     runtimeClasses: []
                     usernames: []
                 kind: PodSecurityConfiguration

Additional Info (Edit 2)

It's fixable with talosctl: applying the config with talosctl, followed by a reboot, makes it work again. Without the reboot it did not work for me.

hochbit, Sep 16 '25 17:09

@lllamnyp I think Alpine could be the cause because of its musl libc, but talm is written in Go, so that normally shouldn't matter.

hochbit, Sep 17 '25 12:09

OK, it's not Alpine: I see the same behavior on Ubuntu 24.04 LTS.

hochbit, Sep 23 '25 18:09