ovn-kubernetes
Node deletion results in stale LSPs and leaked IPs on layer2/localnet networks
What happened?
When a node is deleted, the pods scheduled on it are deleted as well, but their associated logical switch ports remain in the OVN database and the pods' IPs are never released.
What did you expect to happen?
We expected that when a pod scheduled on the deleted node is deleted, its associated logical switch port would be deleted and the pod's IP would be released.
How can we reproduce it (as minimally and precisely as possible)?
In the non-IC (non-interconnect) case, create a pod attached to a localnet network on a node, then delete that node directly.
Anything else we need to know?
The problem is an ordering issue between event handlers: when a pod is deleted, ovnkube-controller checks whether the pod is scheduled on the local node, but by that time the node deletion handler has already removed the node from the local-node cache. The check therefore fails, the pod deletion is skipped as "not ours", and the LSP and IP allocation are left behind.
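A minimal sketch of the race described above, in Go. All names (`Controller`, `localZoneNodes`, `onNodeDelete`, `onPodDelete`) are illustrative stand-ins, not the actual ovn-kubernetes identifiers; the point is only the ordering: the node handler evicts the cache entry before the cascaded pod deletion runs its "is this node local?" check.

```go
package main

import "fmt"

// Controller is a hypothetical model of the relevant ovnkube-controller
// state: a cache of local-zone nodes and the LSPs it manages per pod.
type Controller struct {
	localZoneNodes map[string]bool // cache of nodes considered local
	lsps           map[string]bool // logical switch ports keyed by pod name
}

// onNodeDelete models the node deletion handler: it removes the node
// from the local-node cache.
func (c *Controller) onNodeDelete(node string) {
	delete(c.localZoneNodes, node)
}

// onPodDelete models the pod deletion handler: it only releases the
// LSP and IP when the pod's node is still present in the cache.
func (c *Controller) onPodDelete(pod, node string) {
	if !c.localZoneNodes[node] {
		// Node was already evicted from the cache, so the pod is
		// treated as non-local and its LSP/IP cleanup is skipped.
		return
	}
	delete(c.lsps, pod)
}

func main() {
	c := &Controller{
		localZoneNodes: map[string]bool{"node1": true},
		lsps:           map[string]bool{"pod1": true},
	}
	// The node deletion handler fires first, then the cascaded pod
	// deletion arrives for a pod that was scheduled on that node.
	c.onNodeDelete("node1")
	c.onPodDelete("pod1", "node1")
	fmt.Printf("stale LSPs left behind: %d\n", len(c.lsps))
}
```

Running this prints `stale LSPs left behind: 1`, mirroring the leaked LSP. A plausible direction for a fix is to make the pod deletion path fall back to cleaning up LSPs whose node no longer exists, rather than bailing out when the cache lookup misses.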
OVN-Kubernetes version
Downstream ovn-kubernetes based on upstream commits up to a5ef4eeede2, but I believe the issue still exists in the current upstream code.
Kubernetes version
N/A
OVN version
$ oc rsh -n ovn-kubernetes ovnkube-node-xxxxx (pick any ovnkube-node pod on your cluster)
$ rpm -q ovn
# paste output here
OVS version
$ oc rsh -n ovn-kubernetes ovs-node-xxxxx (pick any ovs pod on your cluster)
$ rpm -q openvswitch
# paste output here
Platform
Is it baremetal? GCP? AWS? Azure?
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here