Allow setting passive mode for BGP peers

Open danderson opened this issue 7 years ago • 12 comments

In google/metallb#114, I explored how to make my k8s BGP load-balancer interoperate gracefully with Calico clusters that peer with external BGP routers. I've documented my findings at https://master--metallb.netlify.com/configuration/calico/ and https://github.com/google/metallb/issues/114#issuecomment-357547646

Current Behavior

In my setup, I'm trying to peer Calico with another BGP speaker running on localhost. The peer does not listen on any ports, so Calico should just wait for an incoming session. Currently, there is no way to tell Calico to treat a peer passively, so Calico always eagerly tries to connect to 127.0.0.1:179, which is itself. This causes repeated session establishment failures, and BIRD goes into error backoff. That makes it increasingly hard, if not impossible, for the real peer to connect: there is only a short window of a few seconds when the error backoff resets, before the failed connection attempts force BIRD back into backoff.

Expected Behavior

Calico should have a way to specify that a bgpPeer is passive, i.e. Calico should not try to connect to it, but instead just wait for a matching incoming connection.

BIRD supports this, with the passive keyword. It's just not plumbed into the bgpPeer object.

Context

I am trying to make Calico and MetalLB integrate nicely with each other, by setting up a BGP topology like the one I documented for Romana integration. Basically, I want Calico to peer with the outside world, but also with another node agent that pushes routes into Calico for redistribution.

Setting up BGP sessions to/from localhost is notoriously tricky, but with the right set of options, it's possible. Lack of passive mode is one problem I encountered with Calico.

Your Environment

  • Calico version: 2.6.3
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.9.1
  • Operating System and version: Debian testing
  • Link to your project (optional): https://github.com/google/metallb

danderson avatar Jan 15 '18 10:01 danderson

Exposing the passive keyword on the BGPPeer resource should be straightforward and seems sensible enough to me.

We'd need to:

caseydavenport avatar Jan 16 '18 23:01 caseydavenport

Hey @danderson, we're revisiting this issue and trying to reproduce the problem (i.e. error backoff due to repeated session establishment failures) using the latest versions of Calico and MetalLB with the minikube setup from the MetalLB tutorial.

The goal is to reproduce this problem first, as a validation step before adding a "passive" mode to the BGP peer.

However, using the setup below, I wasn't able to reproduce the error backoff that leads to difficulty with real peer connection establishment.

Your Environment

  • Calico version: v3.5.2
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.13.3 using kubeadm
  • Operating System and version: minikube v0.34.1 on Darwin 18.2.0

Cluster Setup

  • single node cluster
  • test-bgp-router (w/ BIRD and Quagga)
  • calico-node
    • with peering to BIRD / Quagga
    • with peering to 127.0.0.1
  • metallb controller
  • metallb speaker
    • with peering to 127.0.0.1

The sequence for the setup was:

  • add test routers (BIRD/Quagga)
  • add calico-node
  • peer calico-node with the routers
  • metallb controller / speaker (no config yet)
  • peer calico-node with 127.0.0.1
  • peer metallb speaker with 127.0.0.1

Before configuring the metallb speaker to peer with 127.0.0.1, I had calico-node peer with 127.0.0.1 to simulate the problem of calico-node peering with itself, i.e. "repeated session establishment failures".

However, I couldn't seem to reproduce an error backoff in the calico-node. Connection retries were evident from the calico-node logs:

...
bird: BGP: Unexpected connect from unknown address 10.0.2.15 (port 10506)
bird: BGP: Unexpected connect from unknown address 10.0.2.15 (port 12421)
2019-03-06 08:31:15.638 [INFO][43] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
bird: BGP: Unexpected connect from unknown address 10.0.2.15 (port 1489)
2019-03-06 08:31:17.096 [INFO][43] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
bird: BGP: Unexpected connect from unknown address 10.0.2.15 (port 22676)
...

However, there was no indication in the logs that BIRD had entered error backoff.

Afterwards, initiating a peering from the metallb speaker resulted in an established connection with calico-node without any issues.

> calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.96.0.100  | node specific | up    | 07:53:36 | Established |
| 10.96.0.101  | node specific | up    | 07:58:38 | Established |
| 127.0.0.1    | node specific | up    | 08:40:12 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Let me know if I'm missing something important in reproducing this.

stevegaossou avatar Mar 06 '19 08:03 stevegaossou

Just a heads up to anyone reading. The most recent relevant discussion is here: https://github.com/google/metallb/issues/114#issuecomment-469985074

stevegaossou avatar Mar 07 '19 19:03 stevegaossou

Any updates on this?

kfox1111 avatar Mar 08 '20 23:03 kfox1111

Really need that functionality.

demonsked avatar Apr 16 '20 16:04 demonsked

I am also interested in this. +1 from me

psavva avatar Jun 05 '20 08:06 psavva

I would also like to see this issue resolved.

stephenstubbs avatar Sep 04 '20 11:09 stephenstubbs

Still looking for someone to work on this! Would love to review.

The first PR to add the new configuration option would look similar to this one: https://github.com/projectcalico/libcalico-go/pull/1262/files

caseydavenport avatar Sep 04 '20 17:09 caseydavenport

Looking at #160, https://github.com/projectcalico/libcalico-go/pull/886 and https://github.com/metallb/metallb/issues/114 it seems there is some way to make this work.

Is this still an issue?

darkrift avatar Jan 19 '22 13:01 darkrift

@darkrift correct - we've implemented an integration with MetalLB that bypasses the need for explicitly setting passive mode on the BGP peers. So, this issue doesn't block MetalLB integration any more.

We don't yet have an option to set passive mode explicitly per-peer, but the use-case that this was meant to cover works now without it. I've left it open for now in case anyone wants to tackle adding this for another use-case, but the original MetalLB scenario is fixed :+1:

caseydavenport avatar Jan 21 '22 23:01 caseydavenport

@caseydavenport I am interested in this, but I need to know what else needs to be done here. Can you tell me? Thanks.

cyclinder avatar Jul 27 '22 10:07 cyclinder

@cyclinder as far as implementing a passive mode option goes, this PR is of a similar type and shows roughly what the work would entail: https://github.com/projectcalico/calico/pull/5736/

Summary is probably something like this:

  • Update BGPPeer object with a new field to allow setting passive mode (e.g., connectMode: Passive | Active, would need some agreement on the name but shouldn't be too controversial)
  • Plumb the new field through to confd (responsible for writing BIRD templates)
  • Update the BIRD templates to read the new field on each peer, and configure "passive on" when it is set.
  • Add a test or two
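
The API side of the steps above could be sketched roughly as follows. This is a hedged illustration only: the `ConnectMode` type, its constants, the trimmed `BGPPeerSpec` struct, and the `validate` and `passive` helpers are hypothetical names based on the `connectMode: Passive | Active` suggestion, not the real Calico API.

```go
package main

import "fmt"

// ConnectMode is a hypothetical field type. The name follows the
// suggestion in this thread and has not been agreed upon.
type ConnectMode string

const (
	ConnectModeActive  ConnectMode = "Active"
	ConnectModePassive ConnectMode = "Passive"
)

// BGPPeerSpec is a trimmed, hypothetical stand-in for the real spec,
// carrying only the fields needed for this example.
type BGPPeerSpec struct {
	PeerIP      string
	ASNumber    uint32
	ConnectMode ConnectMode // assumption: empty means the default (Active)
}

// validate rejects any mode other than Active, Passive, or unset.
func (s BGPPeerSpec) validate() error {
	switch s.ConnectMode {
	case "", ConnectModeActive, ConnectModePassive:
		return nil
	default:
		return fmt.Errorf("invalid connectMode %q: must be Active or Passive", s.ConnectMode)
	}
}

// passive reports whether the BIRD template should emit "passive on;".
func (s BGPPeerSpec) passive() bool {
	return s.ConnectMode == ConnectModePassive
}

func main() {
	specs := []BGPPeerSpec{
		{PeerIP: "10.96.0.100", ASNumber: 64512},                                // default: Active
		{PeerIP: "127.0.0.1", ASNumber: 64512, ConnectMode: ConnectModePassive}, // wait for the peer
		{PeerIP: "10.96.0.101", ASNumber: 64512, ConnectMode: "Sideways"},       // rejected
	}
	for _, s := range specs {
		if err := s.validate(); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s passive=%v\n", s.PeerIP, s.passive())
	}
}
```

Treating an unset field as Active keeps existing BGPPeer objects behaving as they do today, so the new option would be purely opt-in.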

caseydavenport avatar Jul 27 '22 22:07 caseydavenport