BGP Peering unstable if aggressive timers are used (1s/3s)
I have configured my Calico nodes to peer with 2 Top of Rack switches. If the BGP timers are set to 1s KEEPALIVE and 3s HOLD, the BGP sessions are reset (roughly once a day) by the Calico nodes.
Network Traces collected on the K8s Node itself show:
- the KEEPALIVE packet from the switch being received
- the BIRD process does not seem to be receiving the KEEPALIVE when the issue happens
- as a consequence BIRD resets the peering.
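For reference, the peering itself is declared with one Calico BGPPeer resource per ToR switch; the aggressive timers are configured on the switch side and picked up by BIRD during session negotiation. A minimal sketch, with illustrative peer IPs and AS number (not the exact values from this lab):

```yaml
# Dual-ToR peering sketch; peer IPs and AS number are illustrative.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-a
spec:
  peerIP: 192.168.2.201
  asNumber: 65003
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-b
spec:
  peerIP: 192.168.2.202
  asNumber: 65003
```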
In the screenshot below, collected on the Kubernetes node itself, we can see the following:
- my switch (.201) sends KEEPALIVEs every 1s to Calico (.3)
- Calico then sends a Hold Timer Expired notification, resetting the connection
When the reset happens is quite random, and enabling debug logs on BIRD is of no use, as the pod simply does not appear to receive the packet at all.
I tried peering a VM running goBGP with the same pair of switches, and in that case the connection has been rock solid for days:
```
Neighbor      V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
192.168.2.1   4  65003   89977    89531   21654   0    0     05:29:09   5  <--- Calico on ESXi
192.168.2.2   4  65003   90100    89642   21654   0    0     05:29:09   5  <--- Calico on ESXi
192.168.2.3   4  65003   90001    89561   21654   0    0     07:16:31   5  <--- Calico on ESXi
192.168.2.4   4  65003   89974    89524   21654   0    0     05:29:09  11  <--- Calico on ESXi
192.168.2.5   4  65003   90112    89644   21654   0    0     18:44:09  11  <--- Calico on ESXi
192.168.2.6   4  65003   90101    89641   21654   0    0     01:44:38   5  <--- Calico on ESXi
192.168.2.11  4  65003  692967   689501   21654   0    0     06:11:00   0  <--- Calico on KVM
192.168.2.12  4  65003  692971   689508   21654   0    0     12:16:25   0  <--- Calico on KVM
192.168.2.13  4  65003  692964   689507   21654   0    0     07:55:13   0  <--- Calico on KVM
192.168.2.14  4  65003  693004   689498   21654   0    0     1w1d       0  <--- goBGP on KVM
192.168.2.15  4  65003  693004   689502   21654   0    0     1w1d       0  <--- goBGP on KVM
```
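For completeness, this is roughly what the goBGP side of that comparison looks like with the same 1s/3s timers pinned; a minimal sketch of a gobgpd configuration in its YAML form, where the AS number and addresses are illustrative, not the exact lab values:

```yaml
# Sketch of a gobgpd config (openconfig-style keys); router-id,
# neighbor address, and AS number are illustrative values.
global:
  config:
    as: 65003
    router-id: 192.168.2.14
neighbors:
  - config:
      neighbor-address: 192.168.2.201
      peer-as: 65003
    timers:
      config:
        keepalive-interval: 1
        hold-time: 3
```

(Loaded with something like `gobgpd -f gobgpd.yml -t yaml`, assuming gobgpd's config-file and config-type flags.)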
Expected Behavior
BGP Connection is stable
Current Behavior
BGP Connection is periodically reset
Steps to Reproduce (for bugs)
I can recreate this in my lab at will by simply peering with 2x Top of Rack switches and configuring the timers to 1s/3s. I recreated this with bare-metal hosts and with VMs on both KVM and ESXi.
Context
I am writing an integration guide for Calico and BGP-based datacenter fabrics, and peering with 2 switches with 1s/3s timers is the bare minimum to provide high availability and reasonably fast switch/node failure detection (with a 3s HOLD timer, a dead peer is detected within roughly 3 seconds).
Your Environment
I tried multiple versions and have multiple environments:
- Calico version: several versions, including master
- Orchestrator version: Kubernetes v1.23, v1.24, v1.25
- Operating System and version: Ubuntu 20.04, 21.04, 22.04
Hey @matthewdupre, any tips on what the next step should be here?
> If the BGP timers are set to 1s/3s the BGP sessions are reset
Could you say precisely which timers you are configuring here?
I updated the original description with 1s KEEPALIVE and 3s HOLD. Hope that clarifies.
@caseydavenport - the timers are being picked up from the upstream router. We need them because the service addresses are being picked up by Calico and advertised (anycast) from the nodes that are hosting the service. The upstream router can't tell when a given node goes down (the nodes are VMs), so a chunk of service traffic is routed to a black hole for 2-3 minutes, affecting multiple users.
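The anycast advertisement described here is typically expressed through Calico's service IP advertisement in the default BGPConfiguration; a minimal sketch, with illustrative CIDRs (the exact ranges in this cluster are not given):

```yaml
# Service IP advertisement sketch; CIDRs are examples only.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
    - cidr: 10.96.0.0/12
  serviceExternalIPs:
    - cidr: 192.0.2.0/24
```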
We are now aware of an issue with BIRD that is at least 5 years old (https://bird.network.cz/pipermail/bird-users/2018-June/012461.html) and is probably related. We've seen this stable at 10s/30s; not sure exactly where it becomes stable, somewhere between 2s/6s and 10s/30s. VMs or bare metal - it is repeatable.
Would you consider a PR turning BFD on? I remember scoping it a really long time ago...
Christopher
@caseydavenport, @matthewdupre, @amit-tigera any update here?
@liljenstolpe sorry about the delay. I think we're open to turning on BFD but are still trying to figure out when we are able to make this happen. I'll keep everyone updated here as more is decided.
We'd be happy if the original request #7366 was reopened!
BFD is, as explained above, really important for failure detection in some scenarios
@beddari I think the plan is to track any BFD work against this issue. Is there some nuance that is missing from this issue that would be better covered in #7366 ?
@mgleung any verdict on having BFD config included in the BGPPeer CRD for Calico open source? My team is using MetalLB for this specific reason.
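To make the ask concrete: MetalLB models this as a BFDProfile resource referenced from its BGPPeer. A minimal sketch with illustrative values; Calico's BGPPeer has no equivalent field today, which is exactly the gap being discussed:

```yaml
# MetalLB's BFD support, for comparison; all values are illustrative.
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: fast-failover
  namespace: metallb-system
spec:
  receiveInterval: 300   # milliseconds
  transmitInterval: 300  # milliseconds
  detectMultiplier: 3
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: tor-a
  namespace: metallb-system
spec:
  myASN: 65003
  peerASN: 65003
  peerAddress: 192.168.2.201
  bfdProfile: fast-failover
```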
Sorry @RefluxMeds , I'm currently out of the loop. @caseydavenport might know more.
Hi @caseydavenport, is there any work being done on this (https://github.com/projectcalico/calico/issues/7086) or this (https://github.com/projectcalico/calico/issues/4607) issue? My team is really interested in this feature! Are there any blockers? Lack of time?
@RefluxMeds unfortunately the Calico team isn't working on this actively right now, but I would be happy to see it. The blocker is just time within the team to pick it up.
Hi @caseydavenport, so the feature is not even on the roadmap, i.e. there are no plans to pick it up at some point? Thanks!
My last comment unfortunately still stands. Would be happy to review any PRs though.