kubespray
VXLAN tunnels dropped with bird BGP Calico network backend
What happened?
Hello, I use the Calico CNI with the bird backend and a "peer with router" architecture. Unfortunately my provider's hardware network bandwidth is quite limited, so to avoid being throttled by the network QoS I need to use VXLAN full-mesh tunnels to route Kubernetes internal traffic over the L2 network without bouncing through my gateway.
In the end I have calico-node peering over BGP with the router (for load balancing with the help of MetalLB and cross-project/region routing) and VXLAN full-mesh tunnels for internal communication.
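For context, the router peering ends up as a Calico BGPPeer resource along these lines (the peer IP, AS number and node name below are placeholders, not my real values):

```yaml
# Illustrative only: peer IP, AS number and node name are placeholders.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: master-0-peer-with-router
spec:
  node: master-0
  peerIP: 192.168.0.1
  asNumber: 64512
```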
It works pretty well. The issue is that when I re-run the kubespray calico network role, the task "Configure node asNumber for per node peering" executes a calicoctl apply on the node.projectcalico.org/v3 resource.
calicoctl apply replaces the resource if it already exists.
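Roughly, the task applies a manifest like the one below (a hand-written illustration, not the exact template kubespray renders; names, addresses and AS number are placeholders):

```yaml
# Illustrative only. Because `calicoctl apply` replaces an existing
# node.projectcalico.org/v3 resource, any field NOT present in this manifest
# (such as spec.ipv4VXLANTunnelAddr) is dropped from the stored object.
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  name: master-0
spec:
  bgp:
    asNumber: 64512
    ipv4Address: 192.168.0.10/24
```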
However, node.projectcalico.org/v3 resources also contain information filled in by calico-node when an IPPool specifies VXLAN:
Pod workload IPPool:
```yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-pool
spec:
  allowedUses:
  - Workload
  blockSize: 24
  cidr: 10.40.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always
```
VXLAN tunnel IPPool:
```yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: vxlan-10.3.240.0-20
spec:
  allowedUses:
  - Tunnel
  blockSize: 32
  cidr: 10.3.240.0/20
  ipipMode: Never
  nodeSelector: all()
  vxlanMode: Always
```
With this configuration Calico adds a vxlan.calico interface with an IP in the 10.3.240.0/20 range to terminate the VXLAN tunnels, for example on master-0. This IP is added to the node.projectcalico.org/v3 resource for master-0.
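The original calicoctl output isn't reproduced here, but an illustrative Node resource (the tunnel address is made up) looks roughly like this:

```yaml
# Illustrative: ipv4VXLANTunnelAddr is filled in by calico-node itself, not by kubespray.
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  name: master-0
spec:
  bgp:
    asNumber: 64512
    ipv4Address: 192.168.0.10/24
  ipv4VXLANTunnelAddr: 10.3.240.7
```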
This is what allows calico-node on the other nodes to create the proper VXLAN routes for the podCIDRs allocated to this node.
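On the other nodes the resulting kernel routes are roughly of this shape (pod CIDR and tunnel address invented for illustration):

```
# ip route (illustrative): master-0's pod block is reached via its VXLAN tunnel address
10.40.12.0/24 via 10.3.240.7 dev vxlan.calico onlink
```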
Now you can see that when I re-run the kubespray task, the ipv4VXLANTunnelAddr field is removed from every node.projectcalico.org/v3 resource. This triggers calico-node to remove the VXLAN tunnels, all my network traffic is rerouted through the BGP gateway, which is under QoS, and I run into network bandwidth issues.
To fix this I currently have to restart the calico-node pods, but during the time it takes for all the VXLAN tunnels to come back up my cluster is under pressure and I lose a lot of packets due to the OpenStack QoS.
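Concretely, the current workaround is a rolling restart of the daemonset (assuming the standard kubespray deployment of calico-node in kube-system), which makes each calico-node re-populate its tunnel address:

```shell
kubectl -n kube-system rollout restart daemonset/calico-node
```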
What did you expect to happen?
I expect kubespray not to run calicoctl apply on already existing resources, but calicoctl patch instead, to avoid erasing fields not managed by kubespray.
That way the VXLAN tunnels shouldn't be removed.
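For example, something along these lines would update only the field kubespray manages, leaving the rest of the spec untouched (node name and AS number are placeholders):

```shell
# Merge-patch just spec.bgp.asNumber; ipv4VXLANTunnelAddr is left alone.
calicoctl patch node master-0 --patch '{"spec": {"bgp": {"asNumber": 64512}}}'
```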
NB: if I remove the task I have no problem, but obviously I need this task when creating new nodes.
How can we reproduce it (as minimally and precisely as possible)?
Create a k8s cluster with the Calico CNI, VXLAN full mesh, and BGP peering with the router. Re-run kubespray with the network tags and watch your VXLAN tunnels drop.
OS
# printf "$(uname -srm)\n$(cat /etc/os-release)\n"
Linux 5.15.0-102-generic x86_64
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Version of Ansible
ansible [core 2.16.5]
python version = 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801] (/usr/bin/python)
jinja version = 3.1.3
Version of Python
Python 3.11.8
Version of Kubespray (commit)
1b870a186238816822cd98ecd48e1f89320160e2
Network plugin used
calico
Full inventory with variables
Dynamic inventory; there are too many inventory variables to paste, and they are not relevant in this case.
Command used to invoke ansible
ansible-playbook -i openstack.yaml kubespray/cluster.yml -b --tags network
Output of ansible run
No errors.
Anything else we need to know
My idea to work around this behavior is to add a task that gets the node: if the node exists, do a calicoctl patch, and if it doesn't, do a calicoctl apply. It seems to work; I am finishing these tasks and will offer a PR for this case.
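A minimal sketch of the idea, assuming kubespray's calicoctl.sh wrapper in {{ bin_dir }} and a per-node AS number variable (task names, the local_as variable and the manifest path are illustrative, not the final kubespray code):

```yaml
# Sketch only: get the node first, then patch if it exists, apply otherwise.
- name: Check whether the Calico Node resource already exists
  command: "{{ bin_dir }}/calicoctl.sh get node {{ inventory_hostname }}"
  register: calico_node_res
  failed_when: false
  changed_when: false

- name: Patch the existing node (keeps ipv4VXLANTunnelAddr intact)
  command: >-
    {{ bin_dir }}/calicoctl.sh patch node {{ inventory_hostname }}
    --patch '{"spec": {"bgp": {"asNumber": {{ local_as }}}}}'
  when: calico_node_res.rc == 0

- name: Apply the full Node manifest only for a brand new node
  command: "{{ bin_dir }}/calicoctl.sh apply -f /etc/calico/node-{{ inventory_hostname }}.yml"
  when: calico_node_res.rc != 0
```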