calico 3.18 chooses the VIP for vxlan encap
We are running Calico 3.18 with IP address autodetection. The node has two IP addresses on eth0: the first comes from DHCP (e.g. 10.0.0.2/24), and the second is a VIP managed by kube-vip (e.g. 10.0.0.10/32).
If an interface goes down and comes back up, Calico recreates the vxlan interface. Because we use autodetection, and the default autodetection method is "first-found", the vxlan interface can be recreated with the VIP as the encap IP address. This works fine in a steady state.
However, during an upgrade the VIP moves to the new control plane. Once the VIP moves, Calico does not recalculate the IP address that should be used for vxlan encap on the old control plane and keeps using the VIP there. Because the VIP no longer exists on the old control plane, pod-to-pod and pod-to-host communication between the two control planes is broken and the upgrade stalls.
As a workaround, we can bring the interface down and back up, which forces the upgrade to continue.
Expected Behavior
I would expect Calico to recalculate and reconfigure the IP address it uses for vxlan encap once the VIP moves.
I understand that this may already be fixed in a more recent Calico version; 3.18 is fairly old. It also looks like Calico 3.20 has other ways to prevent this problem in the first place (ipautodetection=k8s-internal-ip). However, we can't move to 3.20 right now, maybe in our next release. We have customers on 3.18 and we need to find a way to upgrade them.
One workaround we are considering is to force Calico to use the node IP (10.0.0.2/24 in the example) and never the VIP, by setting the IP env var to status.hostIP.
Will the status.hostIP in daemonset spec ever be the VIP?
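The env var change described above could be sketched as the following fragment of the calico-node DaemonSet. This is an illustration only: it assumes the stock manifest layout with a container named calico-node, and it does not by itself answer whether status.hostIP can ever resolve to the VIP.

```yaml
# Hypothetical excerpt from the calico-node DaemonSet; surrounding fields omitted.
spec:
  template:
    spec:
      containers:
        - name: calico-node   # container name assumed from the stock manifest
          env:
            # Replaces the default value "autodetect" with the host IP
            # that Kubernetes reports for the node running this pod.
            - name: IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
```

status.hostIP is filled in through the downward API from the pod status, so its value follows whatever address Kubernetes has recorded for the node. Comparing it against the node's reported addresses before and after a VIP move would be one way to check the question above.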
Current Behavior
Currently in 3.18, when the VIP moves to the new control plane, the old control plane is still using the VIP for vxlan encap.
Calico 3.18 continues to use the VIP for vxlan encap even though the VIP is no longer present on the machine. It should have received an address change notification and updated the vxlan interface.
Possible Solution
Since this is on 3.18, I'm not looking for a fix, and I know 3.20 has one (ip_autodetection=k8s-internal-ip). I'm looking for a workaround that ensures we can upgrade.
Steps to Reproduce (for bugs)
- k8s 1.23.5 with calico 3.18
- install kube-vip
- bring the interface down and up, then check the vxlan encap address. It may change to the VIP, depending on which IP address is "first-found"
- upgrade. This hangs because pod-to-pod and pod-to-host communication is broken while the old control plane is using the VIP for encap.
Context
This is blocking our upgrades.
Your Environment
- Calico version: 3.18
- Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
- Operating System and version: mariner 1.0
- Link to your project (optional):
The main thing we are looking for is a workaround in Calico 3.18 that forces the vxlan interface to use the 10.0.0.0/22 IP address.
Given that your VIP is a /32 address, could you change your autodetection method to "can-reach" or "cidr" to try to make autodetection skip the VIP address? Examples to try might be "can-reach=10.0.0.1" or "cidr=10.0.0.2/24".
https://projectcalico.docs.tigera.io/archive/v3.18/reference/node/configuration#ip-autodetection-methods
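As a sketch of the suggestion above, the autodetection method is set through the IP_AUTODETECTION_METHOD env var on the calico-node container. The values below are the ones proposed in this thread; 10.0.0.1 is assumed to be a reachable address on the node subnet (for example the gateway), and whether either method actually skips the VIP in this setup would need to be verified on a test node.

```yaml
# Hypothetical excerpt from the env list of the calico-node container.
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "can-reach=10.0.0.1"
  # Alternative suggested in the thread:
  # value: "cidr=10.0.0.2/24"
```

After changing the method, checking the local address on the vxlan interface once an interface has been bounced would show which address was actually picked.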
Closing unless more info provided.