frr icon indicating copy to clipboard operation
frr copied to clipboard

bgpd: add bgp ipv6-auto-ra command

Open Sokolmish opened this issue 1 year ago • 9 comments

Introduce a command to completely disable sending IPv6 router advertisement messages on an interface. Before, there were cases where it could be enabled by other daemons, namely bgpd and vrrpd. Particularly, it happens when BGP extended-nexthop capability is used: bgpd tells zebra to enable RA in this case, though it wasn't intended.

Fixes #7738.

Sokolmish avatar Jul 07 '24 22:07 Sokolmish

I apologize. I work with an older version of FRR, so I have messed up a bit with syncing changes between it and this PR. Fixed it now.

Sokolmish avatar Jul 07 '24 22:07 Sokolmish

Maybe shouldn't allow sending RAs or at least giving a warning (zlog_warn) to the operator that for example BGP unnumbered won't work if because you have RAs disabled?

Hello. Do you mean that the message should be printed in bgpd in an attempt to add an unnumbered peer? Or on any request to zebra (in zebra_interface_radv_set())?

The first case is similar to what I wanted to do initially: point out all calls from bgpd that enable RA (bgp_zebra_initiate_radv()/zclient_send_interface_radv_req()) and are not related to unnumbered peers, but I wasn't able to determine which calls are related to it and which are not.

The second case is easy, but it seems that it might create some misleading warnings in the case of RA not being intended. E.g., we use the extended-nexthop capability for SRv6 and don't need RA but will still get the warnings. Maybe reduce the level to, say, info?

Sokolmish avatar Jul 08 '24 22:07 Sokolmish

Do you mean that the message should be printed in bgpd in an attempt to add an unnumbered peer?

I'm trying to understand why we should allow sending RAs by any other protocol if we have it explicitly disabled per-interface level?

ton31337 avatar Jul 09 '24 08:07 ton31337

As far as I understood, before this PR there were 2 states: RA explicitly enabled (no suppress-ra) and default state (suppress-ra) overridable by other daemons. So I've added the explicitly disabled state (disable-ra).

Other daemons don't look at the state from vty and just send ZEBRA_INTERFACE_ENABLE_RADV to zebra. Bgpd enables RA in case of extended-nexthop capability being enabled in several places (e.g., on initialization or when new nexthop is got). VRRP enables RA when the router becomes active.

I don't think we should fail these operations because of disabled RA because it isn't the primary thing there: extended-nexthop (and VRRP, I guess) can work without RA. Unnumbered peers most probably won't (I'm not quite familiar with it), and it might be better to fail a command or print a warning in such case, but it will require determining which calls to radv_enable() in bgpd are purely for unnumbered peers, aside from getting the zebra's radv flags from bgpd.

Sokolmish avatar Jul 09 '24 09:07 Sokolmish

I agree we need this feature, I've been burned by this myself before.

However, I'm not sure zebra is the correct place for this, especially after looking at the vrrpd use case.

  • bgpd: enables RA for use by BGP to find the neighbor router. Can be very unexpected and is an abuse of the protocol (should never have been designed like this.) It will be common to have no RA configuration, so the RAs are sent with default parameters.

  • vrrpd: enables RA for the RA function itself, i.e. telling clients. It just limits RA based on the VRRP role, so that only the primary router sends RAs. In this case the RAs will normally be configured by the admin and have correct settings, and they are expected and wanted.

The switch in zebra tries to fix the problem without distinguishing these two cases. It theoretically breaks VRRP use of this function.

⇒ I think this needs to be switched in bgpd, not zebra.

eqvinox avatar Jul 10 '24 07:07 eqvinox

I've added a new option to bgpd. Used the preserve-fw-state option as a template for VTY command and flag handling.

Sokolmish avatar Jul 11 '24 22:07 Sokolmish

I keep getting random topotests failed in modules I haven't touched in the PR (now it's MSDP, before it was OSPF, PIM). Is it the infrastructure problem, or am I missing something?

Sokolmish avatar Jul 15 '24 19:07 Sokolmish

I strongly believe this switch should be moved into bgpd instead.

I think this is done now (?) ... @eqvinox can you take a look to see if this okay?

riw777 avatar Aug 06 '24 12:08 riw777

@eqvinox ping :-)

riw777 avatar Aug 27 '24 14:08 riw777

@eqvinox ping :-)

riw777 avatar Sep 24 '24 13:09 riw777

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Sep 24 '24 14:09 github-actions[bot]

@Mergifyio backport dev/10.2 stable/10.1 stable/10.0 stable/9.1 stable/9.0 stable/8.5 stable/8.4

eqvinox avatar Oct 28 '24 12:10 eqvinox

backport dev/10.2 stable/10.1 stable/10.0 stable/9.1 stable/9.0 stable/8.5 stable/8.4

✅ Backports have been created

mergify[bot] avatar Oct 28 '24 12:10 mergify[bot]