Migrating to Talos 1.12.0-rc.0 Layer2VIPConfig fails with etcd member ips are not subset of control plane node ips
Bug Report
Description
While testing the migration from Talos 1.11.5 to Talos 1.12.0-rc.0, Layer2VIPConfig breaks etcd:
waiting for etcd members to be control plane nodes: etcd member ips ["10.17.3.9"] are not subset of control plane node ips ["10.17.3.80"]
healthcheck error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Please note that 10.17.3.9 is the cluster VIP and 10.17.3.80 is the machine IP.
Please note that configuring this the "old way" still works; it only fails when using the new Layer2VIPConfig config object.
Please note that the machine configuration disables the discovery service:
discovery = {
  enabled = false
  registries = {
    kubernetes = {
      disabled = true
    }
    service = {
      disabled = true
    }
  }
}
These are the changes I made to the Terraform configuration that sets the machine configuration, in order to migrate to Layer2VIPConfig:
diff --git a/talos.tf b/talos.tf
index 472c9f3..0cfe28c 100644
--- a/talos.tf
+++ b/talos.tf
@@ -126,21 +126,13 @@ data "talos_machine_configuration" "controller" {
docs = false
config_patches = [
yamlencode(local.common_machine_config),
+ // see https://docs.siderolabs.com/talos/v1.12/networking/advanced/vip
+ // see https://docs.siderolabs.com/talos/v1.12/reference/configuration/network/layer2vipconfig
yamlencode({
- machine = {
- network = {
- interfaces = [
- # see https://www.talos.dev/v1.11/talos-guides/network/vip/
- {
- interface = "eth0"
- dhcp = true
- vip = {
- ip = var.cluster_vip
- }
- }
- ]
- }
- }
+ apiVersion = "v1alpha1"
+ kind = "Layer2VIPConfig"
+ link = "eth0"
+ name = var.cluster_vip
}),
yamlencode({
cluster = {
Logs
$ talosctl -n $c0 health --control-plane-nodes $controllers
discovered nodes: ["10.17.3.80"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: etcd member ips ["10.17.3.9"] are not subset of control plane node ips ["10.17.3.80"]
healthcheck error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Here are some logs about the node addresses:
$ talosctl -n $c0 get members
NODE NAMESPACE TYPE ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
$ talosctl -n $c0 get addresses
NODE NAMESPACE TYPE ID VERSION ADDRESS LINK
10.17.3.80 network AddressStatus cilium_host/10.244.0.160/32 1 10.244.0.160/32 cilium_host
10.17.3.80 network AddressStatus cilium_host/fe80::d87d:64ff:fe0a:eafc/64 2 fe80::d87d:64ff:fe0a:eafc/64 cilium_host
10.17.3.80 network AddressStatus cilium_net/fe80::3074:89ff:fe62:c087/64 2 fe80::3074:89ff:fe62:c087/64 cilium_net
10.17.3.80 network AddressStatus cilium_vxlan/fe80::d0c8:57ff:feb9:9139/64 2 fe80::d0c8:57ff:feb9:9139/64 cilium_vxlan
10.17.3.80 network AddressStatus eth0/10.17.3.80/24 1 10.17.3.80/24 eth0
10.17.3.80 network AddressStatus eth0/10.17.3.9/32 1 10.17.3.9/32 eth0
10.17.3.80 network AddressStatus eth0/fe80::5054:ff:fe0a:c22a/64 2 fe80::5054:ff:fe0a:c22a/64 eth0
10.17.3.80 network AddressStatus lo/127.0.0.1/8 1 127.0.0.1/8 lo
10.17.3.80 network AddressStatus lo/169.254.116.108/32 1 169.254.116.108/32 lo
10.17.3.80 network AddressStatus lo/::1/128 1 ::1/128 lo
10.17.3.80 network AddressStatus lxc1dd73342c6e5/fe80::ac0d:d0ff:fe73:cb06/64 2 fe80::ac0d:d0ff:fe73:cb06/64 lxc1dd73342c6e5
10.17.3.80 network AddressStatus lxc988170d1db6f/fe80::c4f8:d1ff:fec9:a114/64 2 fe80::c4f8:d1ff:fec9:a114/64 lxc988170d1db6f
10.17.3.80 network AddressStatus lxc_health/fe80::80e8:18ff:fec2:1226/64 2 fe80::80e8:18ff:fec2:1226/64 lxc_health
$ talosctl -n $c0 get nodeaddresses
NODE NAMESPACE TYPE ID VERSION ADDRESSES SORTALGORITHM
10.17.3.80 network NodeAddress accumulative 4 ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"] v1
10.17.3.80 network NodeAddress accumulative-no-k8s 2 ["10.17.3.9/32","10.17.3.80/24"] v1
10.17.3.80 network NodeAddress accumulative-only-k8s 2 ["10.244.0.160/32"] v1
10.17.3.80 network NodeAddress current 4 ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"] v1
10.17.3.80 network NodeAddress current-no-k8s 2 ["10.17.3.9/32","10.17.3.80/24"] v1
10.17.3.80 network NodeAddress current-only-k8s 2 ["10.244.0.160/32"] v1
10.17.3.80 network NodeAddress default 1 ["10.17.3.80/24"] v1
10.17.3.80 network NodeAddress routed 4 ["10.17.3.9/32","10.17.3.80/24","10.244.0.160/32"] v1
10.17.3.80 network NodeAddress routed-no-k8s 2 ["10.17.3.9/32","10.17.3.80/24"] v1
10.17.3.80 network NodeAddress routed-only-k8s 2 ["10.244.0.160/32"] v1
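In the output above, the VIP 10.17.3.9/32 appears in the current and routed NodeAddress sets but not in default, which only lists 10.17.3.80/24. To make the failing condition concrete, here is a minimal shell sketch of the subset relation named in the error message, using the values from this report. It is only an illustration, not the Talos implementation; `not_in_subset` is a hypothetical helper and not part of talosctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of talosctl): prints the IPs from the first
# space-separated list that are missing from the second space-separated list.
not_in_subset() {
  local members="$1" nodes="$2" missing=""
  local ip
  for ip in $members; do
    case " $nodes " in
      *" $ip "*) ;;                 # member IP is one of the control plane node IPs
      *) missing="$missing $ip" ;;  # member IP is not in the node list
    esac
  done
  echo "${missing# }"
}

# Values taken from the health check output in this report.
not_in_subset "10.17.3.9" "10.17.3.80"   # prints 10.17.3.9 (the VIP is not a node IP)
not_in_subset "10.17.3.80" "10.17.3.80"  # prints an empty line (the check would pass)
```

With the etcd member reported as 10.17.3.9 and the only control plane node IP being 10.17.3.80, the first call leaves a non-empty remainder, which corresponds to the "not subset" condition in the health check.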
Environment
- Talos version: v1.12.0-rc.0
- Kubernetes version: 1.34.3
- Platform: nocloud