frr icon indicating copy to clipboard operation
frr copied to clipboard

zebra: check multipath before using recursively resolved blackhole nh

Open bhinin opened this issue 1 year ago • 13 comments

In case a static route is resolving over multiple nexthops (multipath), if the first nexthop in the list is recursively resolving over a blackhole, zebra does not bother looking at other nexthops and sends a netlink update to the kernel marking the route as a blackhole. This results in complete traffic blackholing even in a multipath setup.

If number of nexthops > 1, do not bother checking for recursively resolved blackhole nexthop. Let zebra's multipath handling take over.

bhinin avatar Oct 17 '23 00:10 bhinin

I really don't understand how this can work. It's not currently possible to have a ecmp path which contains a blackhole as far as my understanding goes. Can I get a example that shows the problem you are solving?

donaldsharp avatar Oct 17 '23 14:10 donaldsharp

Baseline Configuration:

S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m
C>* 192.168.12.0/24 is directly connected, eth3, portion 0.000000, 00:00:24
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 1d17h41m
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 1d19h21m

In the configuration above, if eth3 goes down, 30.0.0.0/28 will resolve recursively over 192.168.0.0/16 (blackhole nexthop) due to longest prefix matching.

S>  30.0.0.0/28 [1/0] via 192.168.12.2 (recursive), weight 1, 01:04:20
  *                     unreachable (blackhole), weight 1, portion 0.333333, 01:04:20
  *                   via 192.168.13.2, eth1, weight 1, portion 0.333333, 01:04:20
  *                   via 192.168.14.2, eth2, weight 1, portion 0.333333, 01:04:20
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 01:05:14
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 01:05:14
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 01:05:14

routes in kernel: (note how 30.0.0.0/28 is marked as a blackhole)

blackhole 30.0.0.0/28 proto 196 metric 20
blackhole 192.168.0.0/16 proto 196 metric 20
192.168.13.0/24 dev eth1 proto kernel scope link src 192.168.13.1
192.168.14.0/24 dev eth2 proto kernel scope link src 192.168.14.1

This is because 192.168.12.2 is first in the list of nexthops.

Compare this to a case where eth2 is brought down instead. Ignore the rejected tag for now. That is due to a bug in the netlink update from zebra to kernel (https://github.com/FRRouting/frr/pull/14613).

S>r 30.0.0.8/28 [251/0] via 192.168.12.2, eth2, weight 1, portion 0.333333, 00:00:02
  r                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:02
  r                      via 192.168.14.2 (recursive), weight 1, 00:00:02
  r                        unreachable (blackhole), weight 1, portion 0.333333, 00:00:02

routes in kernel:

30.0.0.0/28 proto 196 metric 20
        nexthop via 192.168.12.2 dev eth3 weight 240
        nexthop via 192.168.13.2 dev eth1 weight 240
        nexthop via 192.168.14.2 dev eth2 weight 240 dead linkdown

There is a discrepancy in how zebra treats a route with a recursively resolved blackhole nexthop simply based on the order that the nexthops are processed in. (smaller IP is earlier in the list).

bhinin avatar Oct 17 '23 17:10 bhinin

This fix is restoring the nexthop_num == 1 condition for processing blackhole nexthops. This was changed after https://github.com/FRRouting/frr/commit/4be03ff4cae4322f6f507a1d84224603e302f146.

bhinin avatar Oct 17 '23 17:10 bhinin

@ton31337 can I get a review on the revised PR. Thanks!

bhinin avatar Oct 24 '23 15:10 bhinin

IMO zebra should not allow a ecmp path that has a normal nexthop and a blackhole nexthop. This should be stopped higher up in zebra and make the route unresolvable.

donaldsharp avatar Oct 31 '23 16:10 donaldsharp

staticd should also probably be changed to notice that one of it's paths is unreachable and make an appropriate choice about the route. I still need to think about this a bit more too though

donaldsharp avatar Oct 31 '23 16:10 donaldsharp

@bhinin I was thinking about this some more and perhaps I am missing something. Why do you need a ecmp route where one of the nexthops is a blackhole? Perhaps I am missing the use case as that I cannot think of any way that this is useful in routing. Can you take a step back and give us the requirements of what you are trying to do?

donaldsharp avatar Oct 31 '23 19:10 donaldsharp

@donaldsharp, I don't need one of the paths to resolve over blackhole. It's occurring because of Longest Prefix Match over a blackhole route - that is a side effect of the aggregate blackhole route. See config below:

S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m

The configuration of static routes above is done using GW only nexthops:

ip route 30.0.0.0/28 192.168.12.2
ip route 30.0.0.0/28 192.168.13.2
ip route 30.0.0.0/28 192.168.14.2

interface eth1
    ip address 192.168.13.1/31

interface eth2
    ip address 192.168.14.1/31

interface eth3
    ip address 192.168.12.1/31

When processing the paths, zebra checks if the very first nexthop is resolved over a blackhole. If yes, then the route to 30.0.0.0/28 will be marked as a blackhole.

But there is an inconsistency here based on which interface goes down.

  1. If eth3 goes down, since 192.168.12.2 is the first nexthop and is now recursively resolving over blackhole, zebra will mark 30.0.0.0/28 as a blackhole.
  2. If eth2 goes down, zebra checks the first nexthop (192.168.12.2) sees that it is not resolving over blackhole and lets the multipath logic take over.

This code change addresses that inconsistency where if multipath is present, don't simply look at the first nexthop. Look at all.

bhinin avatar Oct 31 '23 20:10 bhinin

That is not my question, I understand this part. It's the simple mechanics of how we are getting into this state. What is the meta problem that is causing you to get into this state? WHY do you want a route that has ecmp where one or more of the nexthops is a blackhole and one or more of the nexthops are normal? What is the problem that is being solved?

donaldsharp avatar Oct 31 '23 20:10 donaldsharp

  1. I need ecmp, hence the static configuration using multiple GWs configured on interfaces.
  2. The device learns a default route from a neighbor.
  3. I need the blackhole route to advertise an aggregate route out to other neighbors over BGP. I could achieve this by NOT configuring it statically, and simply advertising it using router bgp > network 192.168.0.0 and no bgp network import-check. But with this, if one of the links in (1) were to go down, that path will now recursively resolve over the default route learned in (2) which I want to avoid. I just want that one path to be removed.

I fixed the problem by using GW + IF nexthop in the static route configuration. That way when the link goes down, that nexthop is removed.

bhinin avatar Oct 31 '23 21:10 bhinin

@donaldsharp, following up on this. Even with this being fixed higher up in zebra or staticd, I believe the fix here is required as zebra is treating nexthops differently based on where it is in the list.

bhinin avatar Nov 13 '23 23:11 bhinin

with my proposed changes the dplane part of zebra will never see this route. It's a moot point. Like I said we need to fix both staticd and zebra main processing pthread to do the right thing.

donaldsharp avatar Nov 13 '23 23:11 donaldsharp

This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this PR closed.

github-actions[bot] avatar May 12 '24 01:05 github-actions[bot]