kube-vip
kube-vip does not re-add control plane IP to IPVS when kube-apiserver becomes reachable
Describe the bug
When the kube-apiserver becomes unreachable, kube-vip correctly removes the node from IPVS. However, it does not re-add the node to IPVS once the kube-apiserver becomes reachable again.
kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="failed check k8s server version: Get \"https://10.0.1.19:6443/version?timeout=10s\": dial tcp 10.0.1.19:6443: connect: connection refused"
kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="healthCheck failed for backend 10.0.1.19:6443, attempting to remove from load balancer"
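For reference, the failing check in the log is a plain HTTPS GET against the backend's /version endpoint. Here is a minimal sketch of an equivalent probe, handy for confirming from the node whether the apiserver is reachable again (skipping certificate verification is an assumption made for brevity, not necessarily what kube-vip does):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Same request shape as the failed check in the kube-vip log:
	// GET https://10.0.1.19:6443/version?timeout=10s
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			// Assumption for brevity: skip cert verification when probing by IP.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get("https://10.0.1.19:6443/version?timeout=10s")
	if err != nil {
		fmt.Println("backend still down:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("backend reachable:", resp.Status)
}
```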
To Reproduce
A working RKE2 setup:
- 3 control plane nodes: 10.0.1.16, 10.0.1.19, 10.0.1.20
- 1 VIP address: 10.0.1.10
> ipvsadm -Ln
#[...]
TCP  10.0.1.10:6443 rr
  -> 10.0.1.16:6443  Masq  1  0  0
  -> 10.0.1.19:6443  Masq  1  0  0
  -> 10.0.1.20:6443  Masq  1  0  0
#[...]
The VIP 10.0.1.10 is currently running on node 10.0.1.19.
Steps to reproduce the behavior:
- Go to 10.0.1.19
- Kill the kube-apiserver: `killall kube-apiserver`
- Check the kube-vip-ds logs:
kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="failed check k8s server version: Get \"https://10.0.1.19:6443/version?timeout=10s\": dial tcp 10.0.1.19:6443: connect: connection refused"
kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="healthCheck failed for backend 10.0.1.19:6443, attempting to remove from load balancer"
- Check the ipvs configuration:
TCP  10.0.1.10:6443 rr
  -> 10.0.1.16:6443  Masq  1  1  0
  -> 10.0.1.20:6443  Masq  1  0  0
- Wait for the kube-apiserver container to become available again.
- Check the ipvs configuration again:
TCP  10.0.1.10:6443 rr
  -> 10.0.1.16:6443  Masq  1  1  0
  -> 10.0.1.20:6443  Masq  1  0  0
Expected behavior
Kube-vip should automatically re-add the control plane backend when it becomes available again.
A manual workaround could be:
- Delete the kube-vip pod: `kubectl delete pod kube-vip-ds-7x95f`
- Check the logs:
kube-vip-ds-qzzr6 kube-vip time="2024-06-10T15:37:34Z" level=info msg="Added backend for [10.0.1.10:6443] on [10.0.1.19:6443]"
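For illustration, the expected behavior amounts to a two-way health-check state machine rather than the one-way removal observed. Below is a minimal sketch under two assumptions: a plain TCP dial stands in for the health check (the log's "dial tcp ... connection refused" is the observed failure mode), and addBackend/removeBackend are hypothetical helpers standing in for the IPVS bookkeeping, not kube-vip's actual API:

```go
package main

import (
	"log"
	"net"
	"time"
)

// addBackend/removeBackend are hypothetical stand-ins for the IPVS
// bookkeeping; they are not kube-vip's actual API.
func addBackend(b string)    { log.Printf("Added backend %s", b) }
func removeBackend(b string) { log.Printf("Removed backend %s", b) }

func main() {
	backend := "10.0.1.19:6443"
	healthy := true // the backend starts out present in the IPVS table
	for range time.Tick(5 * time.Second) {
		conn, err := net.DialTimeout("tcp", backend, 3*time.Second)
		up := err == nil
		if up {
			conn.Close()
		}
		switch {
		case healthy && !up:
			removeBackend(backend) // v0.8.0 does this on failure...
			healthy = false
		case !healthy && up:
			addBackend(backend) // ...but never does this on recovery
			healthy = true
		}
	}
}
```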
Environment (please complete the following information):
- OS/Distro: Ubuntu 24.04
- Kubernetes Version: v1.28
- Kube-vip Version: v0.8.0
Kube-vip.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  creationTimestamp: null
  labels:
    app.kubernetes.io/name: kube-vip-ds
    app.kubernetes.io/version: v0.8.0
  name: kube-vip-ds
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-vip-ds
        app.kubernetes.io/version: v0.8.0
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      containers:
      - args:
        - manager
        env:
        - name: vip_arp
          value: "true"
        - name: port
          value: "6443"
        - name: vip_nodename
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: vip_interface
          value: eth0
        - name: vip_cidr
          value: "32"
        - name: dns_mode
          value: first
        - name: cp_enable
          value: "true"
        - name: cp_namespace
          value: kube-system
        - name: svc_enable
          value: "true"
        - name: svc_leasename
          value: plndr-svcs-lock
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: enable_node_labeling
          value: "true"
        - name: lb_enable
          value: "true"
        - name: lb_port
          value: "6443"
        - name: lb_fwdmethod
          value: masquerade
        - name: address
          value: 10.0.1.10
        - name: prometheus_server
          value: :2112
        image: ghcr.io/kube-vip/kube-vip-iptables:v0.8.0
        imagePullPolicy: IfNotPresent
        name: kube-vip
        resources: {}
        securityContext:
          privileged: true
      hostNetwork: true
      serviceAccountName: kube-vip
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
  updateStrategy: {}
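Note that the load balancer settings in this manifest (lb_enable: "true", lb_port: "6443", lb_fwdmethod: masquerade) are what produce the Masq entries shown in the ipvsadm output above.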