
kube-vip does not re-add control plane IP to IPVS when kube-apiserver becomes reachable

Open · asosso opened this issue on Jun 10, 2024 · 0 comments

Describe the bug

When the kube-apiserver on a control plane node becomes unreachable, kube-vip correctly removes that backend from IPVS. However, it does not re-add the backend when the kube-apiserver becomes reachable again.

kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="failed check k8s server version: Get \"https://10.0.1.19:6443/version?timeout=10s\": dial tcp 10.0.1.19:6443: connect: connection refused"
kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="healthCheck failed for backend 10.0.1.19:6443, attempting to remove from load balancer"
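
From these log messages, the health check appears to be an HTTPS GET of /version against each backend. Below is a minimal Go sketch of that probe, for illustration only (this is not kube-vip's source; the backend address is taken from the logs above):

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	backend := "10.0.1.19:6443" // backend from the log lines above
	client := &http.Client{
		Timeout: 10 * time.Second,
		// No CA bundle in this standalone sketch, so skip verification.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Get(fmt.Sprintf("https://%s/version?timeout=10s", backend))
	if err != nil {
		// This is the failure path after which kube-vip removes the backend.
		fmt.Printf("failed check k8s server version: %v\n", err)
		return
	}
	resp.Body.Close()
	fmt.Println("backend healthy:", backend)
}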

To Reproduce

RKE2 working setup:

  • 3 control plane nodes: 10.0.1.16, 10.0.1.19, 10.0.1.20
  • 1 VIP address: 10.0.1.10
> ipvsadm -Ln
#[...]
TCP  10.0.1.10:6443 rr
  -> 10.0.1.16:6443           Masq    1      0          0
  -> 10.0.1.19:6443           Masq    1      0          0
  -> 10.0.1.20:6443           Masq    1      0          0
#[...]
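
(In this output, rr is IPVS round-robin scheduling and Masq is masquerade forwarding, matching lb_fwdmethod: masquerade in the manifest below.)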

The VIP 10.0.1.10 is currently held by node 10.0.1.19.

Steps to reproduce the behavior:

  1. Go to 10.0.1.19
  2. Kill the kube-apiserver: killall kube-apiserver
  3. Check the kube-vip-ds logs:
    kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="failed check k8s server version: Get \"https://10.0.1.19:6443/version?timeout=10s\": dial tcp 10.0.1.19:6443: connect: connection refused"
    kube-vip-ds-7x95f kube-vip time="2024-06-10T15:34:51Z" level=info msg="healthCheck failed for backend 10.0.1.19:6443, attempting to remove from load balancer"
    
  4. Check the ipvs configuration:
    TCP  10.0.1.10:6443 rr
      -> 10.0.1.16:6443           Masq    1      1          0
      -> 10.0.1.20:6443           Masq    1      0          0
    
  5. Wait for the kube-apiserver container to become available again (a small probe sketch follows this list).
  6. Check the ipvs configuration again:
    TCP  10.0.1.10:6443 rr
      -> 10.0.1.16:6443           Masq    1      1          0
      -> 10.0.1.20:6443           Masq    1      0          0
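
For step 5, a throwaway probe can show exactly when the apiserver starts accepting TCP connections again. This is only a sketch; the address comes from this reproduction:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	addr := "10.0.1.19:6443" // the node whose kube-apiserver was killed
	for {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err == nil {
			conn.Close()
			fmt.Println("kube-apiserver is accepting connections again")
			return
		}
		fmt.Printf("still down: %v\n", err)
		time.Sleep(2 * time.Second)
	}
}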
    

Expected behavior

kube-vip should automatically re-add the control plane backend once it becomes available again.
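
Conceptually, the fix is the inverse of the removal path that already exists. The following Go sketch shows the expected control flow; healthCheck, removeBackend, and addBackend are hypothetical placeholders, not kube-vip's actual API, and the 10-second interval is illustrative:

package main

import (
	"fmt"
	"time"
)

// Placeholders for kube-vip's real probe and IPVS operations;
// only the control flow below is the point.
func healthCheck(backend string) error { return nil /* e.g. GET /version */ }
func removeBackend(backend string)     { fmt.Println("removed", backend) }
func addBackend(backend string)        { fmt.Println("re-added", backend) }

func main() {
	backends := []string{"10.0.1.16:6443", "10.0.1.19:6443", "10.0.1.20:6443"}
	removed := map[string]bool{}
	for range time.Tick(10 * time.Second) {
		for _, b := range backends {
			healthy := healthCheck(b) == nil
			switch {
			case !healthy && !removed[b]:
				removeBackend(b) // kube-vip already does this
				removed[b] = true
			case healthy && removed[b]:
				addBackend(b) // the missing step this issue reports
				removed[b] = false
			}
		}
	}
}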

A manual workaround could be:

  1. Delete the kube-vip pod: `kubectl delete pod -n kube-system kube-vip-ds-7x95f`
  2. Check the logs:
    kube-vip-ds-qzzr6 kube-vip time="2024-06-10T15:37:34Z" level=info msg="Added backend for [10.0.1.10:6443] on [10.0.1.19:6443]"
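
After the replacement pod comes up and logs the backend additions, ipvsadm -Ln should again list all three backends behind 10.0.1.10:6443.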
    

Environment (please complete the following information):

  • OS/Distro: Ubuntu 24.04
  • Kubernetes Version: v1.28
  • Kube-vip Version: v0.8.0

Kube-vip.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  creationTimestamp: null
  labels:
    app.kubernetes.io/name: kube-vip-ds
    app.kubernetes.io/version: v0.8.0
  name: kube-vip-ds
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-vip-ds
        app.kubernetes.io/version: v0.8.0
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      containers:
      - args:
        - manager
        env:
        - name: vip_arp
          value: "true"
        - name: port
          value: "6443"
        - name: vip_nodename
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: vip_interface
          value: eth0
        - name: vip_cidr
          value: "32"
        - name: dns_mode
          value: first
        - name: cp_enable
          value: "true"
        - name: cp_namespace
          value: kube-system
        - name: svc_enable
          value: "true"
        - name: svc_leasename
          value: plndr-svcs-lock
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: enable_node_labeling
          value: "true"
        - name: lb_enable
          value: "true"
        - name: lb_port
          value: "6443"
        - name: lb_fwdmethod
          value: masquerade
        - name: address
          value: 10.0.1.10
        - name: prometheus_server
          value: :2112
        image: ghcr.io/kube-vip/kube-vip-iptables:v0.8.0
        imagePullPolicy: IfNotPresent
        name: kube-vip
        resources: {}
        securityContext:
          privileged: true
      hostNetwork: true
      serviceAccountName: kube-vip
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
  updateStrategy: {}
