Talos service in default namespace keeps old node IPs as ready and serving

Open adarmi opened this issue 1 month ago • 3 comments

Bug Report

Description

Hello!

Thanks for this great and secure OS for Kubernetes.

I'm currently evaluating Talos Linux with Cluster API in a vSphere environment. After some rolling updates, I checked the talos service and its associated EndpointSlices: the old node IPs had not been removed and are still reported as "ready" and "serving" (with terminating set to false).

I discovered the issue because, after I installed talos-backup, its job kept failing due to the high number of old controlplane nodes still listed.

Thank you!

Logs

EndpointSlices associated with the talos service:

k get endpointslices.discovery.k8s.io talos-ipv4 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 192.168.1.191
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.63
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.192
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.64
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.193
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.194
  conditions:
    ready: true
    serving: true
    terminating: false
- addresses:
  - 192.168.1.95
  conditions:
    ready: true
    serving: true
    terminating: false
[...]
kind: EndpointSlice
[...]
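
To check this programmatically, here is a minimal Python sketch that lists the addresses an EndpointSlice marks as ready. It assumes the slice has been exported as JSON (for example with `-o json` instead of `-o yaml`). The function name `ready_addresses` and the inline sample are illustrative only, not part of any Talos or Kubernetes tooling.

```python
import json


def ready_addresses(slice_doc):
    """Return the addresses of endpoints whose `ready` condition is explicitly true."""
    result = []
    for endpoint in slice_doc.get("endpoints") or []:
        conditions = endpoint.get("conditions") or {}
        # Only an explicit `true` counts here; a missing value is skipped.
        if conditions.get("ready") is True:
            result.extend(endpoint.get("addresses", []))
    return result


# Illustrative sample: the first two endpoints from the output above, as JSON.
sample = json.loads("""
{
  "kind": "EndpointSlice",
  "addressType": "IPv4",
  "endpoints": [
    {"addresses": ["192.168.1.191"],
     "conditions": {"ready": true, "serving": true, "terminating": false}},
    {"addresses": ["192.168.1.63"],
     "conditions": {"ready": true, "serving": true, "terminating": false}}
  ]
}
""")

print(ready_addresses(sample))
```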

The controlplane nodes:

k get nodes -o wide
NAME                   STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP     OS-IMAGE          KERNEL-VERSION   CONTAINER-RUNTIME
controlplane-fvbtq         Ready    control-plane   78m    v1.34.0   192.168.1.95    192.168.1.95    Talos (v1.11.5)   6.12.57-talos    containerd://2.1.5
controlplane-nnqcd         Ready    control-plane   86m    v1.34.0   192.168.1.193   192.168.1.193   Talos (v1.11.5)   6.12.57-talos    containerd://2.1.5
controlplane-p4bdg         Ready    control-plane   82m    v1.34.0   192.168.1.194   192.168.1.194   Talos (v1.11.5)   6.12.57-talos    containerd://2.1.5
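
The mismatch can be made explicit by comparing the two lists. The following is a small Python sketch, not an official tool: `stale_addresses` is a hypothetical helper, and the IPs are copied by hand from the visible part of the output in this report (the elided `[...]` entries are not included).

```python
def stale_addresses(endpoint_ips, node_ips):
    """Return endpoint addresses that do not belong to any current node, in original order."""
    current = set(node_ips)
    return [ip for ip in endpoint_ips if ip not in current]


# Addresses shown in the talos-ipv4 EndpointSlice.
endpoint_ips = [
    "192.168.1.191",
    "192.168.1.63",
    "192.168.1.192",
    "192.168.1.64",
    "192.168.1.193",
    "192.168.1.194",
    "192.168.1.95",
]

# INTERNAL-IP values of the current controlplane nodes.
node_ips = ["192.168.1.95", "192.168.1.193", "192.168.1.194"]

print(stale_addresses(endpoint_ips, node_ips))
# -> ['192.168.1.191', '192.168.1.63', '192.168.1.192', '192.168.1.64']
```

Four of the seven visible addresses no longer correspond to a node.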

Environment

  • Talos version: v1.11.5
  • Kubernetes version: 1.34.0-1.34.2
  • Platform: Linux

adarmi avatar Dec 12 '25 15:12 adarmi

Talos Linux relies on discovery service data to populate the list.

If you reuse the same secrets for a cluster in which some nodes change IPs, you might hit stale discovery data, which will disappear automatically after about 30 minutes.

You can use talosctl get members to look further into it.

smira avatar Dec 12 '25 15:12 smira

All right, thanks for your quick reply. The secrets bootstrap process should be handled by Cluster API and the Talos control plane or bootstrap provider, correct?

talosctl get members returns the same list of nodes as the k get nodes command.

adarmi avatar Dec 12 '25 16:12 adarmi

If talosctl get members returns the same result, please grab a support bundle (talosctl support) and create a bug report.

smira avatar Dec 12 '25 16:12 smira