Migrating to Talos 1.12.0-rc.0 Layer2VIPConfig fails with "etcd member ips are not subset of control plane node ips"

Opened by rgl.

Bug Report

Description

While testing the migration from Talos 1.11.5 to Talos 1.12.0-rc.0, I found that switching the VIP to the new Layer2VIPConfig document breaks etcd:

waiting for etcd members to be control plane nodes: etcd member ips ["10.17.3.9"] are not subset of control plane node ips ["10.17.3.80"]
healthcheck error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Note that 10.17.3.9 is the cluster VIP and 10.17.3.80 is the machine IP.

Configuring the VIP the "old way" (a vip block on the eth0 entry under machine.network.interfaces) still works; it only fails when using the new Layer2VIPConfig config document.

The machine configuration also disables the discovery service:

      discovery = {
        enabled = false
        registries = {
          kubernetes = {
            disabled = true
          }
          service = {
            disabled = true
          }
        }
      }

These are the changes to my Terraform configuration (which generates the machine configuration) required to migrate to Layer2VIPConfig:

diff --git a/talos.tf b/talos.tf
index 472c9f3..0cfe28c 100644
--- a/talos.tf
+++ b/talos.tf
@@ -126,21 +126,13 @@ data "talos_machine_configuration" "controller" {
   docs               = false
   config_patches = [
     yamlencode(local.common_machine_config),
+    // see https://docs.siderolabs.com/talos/v1.12/networking/advanced/vip
+    // see https://docs.siderolabs.com/talos/v1.12/reference/configuration/network/layer2vipconfig
     yamlencode({
-      machine = {
-        network = {
-          interfaces = [
-            # see https://www.talos.dev/v1.11/talos-guides/network/vip/
-            {
-              interface = "eth0"
-              dhcp      = true
-              vip = {
-                ip = var.cluster_vip
-              }
-            }
-          ]
-        }
-      }
+      apiVersion = "v1alpha1"
+      kind       = "Layer2VIPConfig"
+      link       = "eth0"
+      name       = var.cluster_vip
     }),
     yamlencode({
       cluster = {

Logs

$ talosctl -n $c0 health --control-plane-nodes $controllers 
discovered nodes: ["10.17.3.80"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: etcd member ips ["10.17.3.9"] are not subset of control plane node ips ["10.17.3.80"]
healthcheck error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Here are some logs about the node addresses:

$ talosctl -n $c0 get members
NODE   NAMESPACE   TYPE   ID   VERSION   HOSTNAME   MACHINE TYPE   OS   ADDRESSES

$ talosctl -n $c0 get addresses
NODE         NAMESPACE   TYPE            ID                                             VERSION   ADDRESS                        LINK
10.17.3.80   network     AddressStatus   cilium_host/10.244.0.160/32                    1         10.244.0.160/32                cilium_host
10.17.3.80   network     AddressStatus   cilium_host/fe80::d87d:64ff:fe0a:eafc/64       2         fe80::d87d:64ff:fe0a:eafc/64   cilium_host
10.17.3.80   network     AddressStatus   cilium_net/fe80::3074:89ff:fe62:c087/64        2         fe80::3074:89ff:fe62:c087/64   cilium_net
10.17.3.80   network     AddressStatus   cilium_vxlan/fe80::d0c8:57ff:feb9:9139/64      2         fe80::d0c8:57ff:feb9:9139/64   cilium_vxlan
10.17.3.80   network     AddressStatus   eth0/10.17.3.80/24                             1         10.17.3.80/24                  eth0
10.17.3.80   network     AddressStatus   eth0/10.17.3.9/32                              1         10.17.3.9/32                   eth0
10.17.3.80   network     AddressStatus   eth0/fe80::5054:ff:fe0a:c22a/64                2         fe80::5054:ff:fe0a:c22a/64     eth0
10.17.3.80   network     AddressStatus   lo/127.0.0.1/8                                 1         127.0.0.1/8                    lo
10.17.3.80   network     AddressStatus   lo/169.254.116.108/32                          1         169.254.116.108/32             lo
10.17.3.80   network     AddressStatus   lo/::1/128                                     1         ::1/128                        lo
10.17.3.80   network     AddressStatus   lxc1dd73342c6e5/fe80::ac0d:d0ff:fe73:cb06/64   2         fe80::ac0d:d0ff:fe73:cb06/64   lxc1dd73342c6e5
10.17.3.80   network     AddressStatus   lxc988170d1db6f/fe80::c4f8:d1ff:fec9:a114/64   2         fe80::c4f8:d1ff:fec9:a114/64   lxc988170d1db6f
10.17.3.80   network     AddressStatus   lxc_health/fe80::80e8:18ff:fec2:1226/64        2         fe80::80e8:18ff:fec2:1226/64   lxc_health

$ talosctl -n $c0 get nodeaddresses
NODE         NAMESPACE   TYPE          ID                      VERSION   ADDRESSES                                            SORTALGORITHM
10.17.3.80   network     NodeAddress   accumulative            4         ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"]   v1
10.17.3.80   network     NodeAddress   accumulative-no-k8s     2         ["10.17.3.9/32","10.17.3.80/24"]                     v1
10.17.3.80   network     NodeAddress   accumulative-only-k8s   2         ["10.244.0.160/32"]                                  v1
10.17.3.80   network     NodeAddress   current                 4         ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"]   v1
10.17.3.80   network     NodeAddress   current-no-k8s          2         ["10.17.3.9/32","10.17.3.80/24"]                     v1
10.17.3.80   network     NodeAddress   current-only-k8s        2         ["10.244.0.160/32"]                                  v1
10.17.3.80   network     NodeAddress   default                 1         ["10.17.3.80/24"]                                    v1
10.17.3.80   network     NodeAddress   routed                  4         ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"]   v1
10.17.3.80   network     NodeAddress   routed-no-k8s           2         ["10.17.3.9/32","10.17.3.80/24"]                     v1
10.17.3.80   network     NodeAddress   routed-only-k8s         2         ["10.244.0.160/32"]                                  v1

Environment

Talos version: v1.12.0-rc.0
Kubernetes version: 1.34.3
Platform: nocloud

Dec 12 '25 20:12 rgl