frr
frr copied to clipboard
zebra: check multipath before using recursively resolved blackhole nh
In case a static route is resolving over multiple nexthops (multipath), if the first nexthop in the list is recursively resolving over a blackhole, zebra does not bother looking at other nexthops and sends a netlink update to the kernel marking the route as a blackhole. This results in complete traffic blackholing even in a multipath setup.
If number of nexthops > 1, do not bother checking for recursively resolved blackhole nexthop. Let zebra's multipath handling take over.
I really don't understand how this can work. It's not currently possible to have a ecmp path which contains a blackhole as far as my understanding goes. Can I get a example that shows the problem you are solving?
Baseline Configuration:
S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
* via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
* via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m
C>* 192.168.12.0/24 is directly connected, eth3, portion 0.000000, 00:00:24
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 1d17h41m
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 1d19h21m
In the configuration above, if eth3 goes down, 30.0.0.0/28 will resolve recursively over 192.168.0.0/16 (blackhole nexthop) due to longest prefix matching.
S> 30.0.0.0/28 [1/0] via 192.168.12.2 (recursive), weight 1, 01:04:20
* unreachable (blackhole), weight 1, portion 0.333333, 01:04:20
* via 192.168.13.2, eth1, weight 1, portion 0.333333, 01:04:20
* via 192.168.14.2, eth2, weight 1, portion 0.333333, 01:04:20
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 01:05:14
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 01:05:14
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 01:05:14
routes in kernel: (note how 30.0.0.0/28 is marked as a blackhole)
blackhole 30.0.0.0/28 proto 196 metric 20
blackhole 192.168.0.0/16 proto 196 metric 20
192.168.13.0/24 dev eth1 proto kernel scope link src 192.168.13.1
192.168.14.0/24 dev eth2 proto kernel scope link src 192.168.14.1
This is because 192.168.12.2 is first in the list of nexthops.
Compare this to a case where eth2 is brought down instead. Ignore the rejected
tag for now. That is due to a bug in the netlink update from zebra to kernel (https://github.com/FRRouting/frr/pull/14613).
S>r 30.0.0.8/28 [251/0] via 192.168.12.2, eth2, weight 1, portion 0.333333, 00:00:02
r via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:02
r via 192.168.14.2 (recursive), weight 1, 00:00:02
r unreachable (blackhole), weight 1, portion 0.333333, 00:00:02
routes in kernel:
30.0.0.0/28 proto 196 metric 20
nexthop via 192.168.12.2 dev eth3 weight 240
nexthop via 192.168.13.2 dev eth1 weight 240
nexthop via 192.168.14.2 dev eth2 weight 240 dead linkdown
There is a discrepancy in how zebra treats a route with a recursively resolved blackhole nexthop simply based on the order that the nexthops are processed in. (smaller IP is earlier in the list).
This fix is restoring the nexthop_num == 1
condition for processing blackhole nexthops. This was changed after https://github.com/FRRouting/frr/commit/4be03ff4cae4322f6f507a1d84224603e302f146.
@ton31337 can I get a review on the revised PR. Thanks!
IMO zebra should not allow a ecmp path that has a normal
nexthop and a blackhole
nexthop. This should be stopped higher up in zebra and make the route unresolvable.
staticd should also probably be changed to notice that one of it's paths is unreachable and make an appropriate choice about the route. I still need to think about this a bit more too though
@bhinin I was thinking about this some more and perhaps I am missing something. Why do you need a ecmp route where one of the nexthops is a blackhole? Perhaps I am missing the use case as that I cannot think of any way that this is useful in routing. Can you take a step back and give us the requirements of what you are trying to do?
@donaldsharp, I don't need one of the paths to resolve over blackhole. It's occurring because of Longest Prefix Match over a blackhole route - that is a side effect of the aggregate blackhole route. See config below:
S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
* via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
* via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m
The configuration of static routes above is done using GW only nexthops:
ip route 30.0.0.0/28 192.168.12.2
ip route 30.0.0.0/28 192.168.13.2
ip route 30.0.0.0/28 192.168.14.2
interface eth1
ip address 192.168.13.1/31
interface eth2
ip address 192.168.14.1/31
interface eth3
ip address 192.168.12.1/31
When processing the paths, zebra checks if the very first nexthop is resolved over a blackhole. If yes, then the route to 30.0.0.0/28 will be marked as a blackhole.
But there is an inconsistency here based on which interface goes down.
- If eth3 goes down, since 192.168.12.2 is the first nexthop and is now recursively resolving over blackhole, zebra will mark 30.0.0.0/28 as a blackhole.
- If eth2 goes down, zebra checks the first nexthop (192.168.12.2) sees that it is not resolving over blackhole and lets the multipath logic take over.
This code change addresses that inconsistency where if multipath is present, don't simply look at the first nexthop. Look at all.
That is not my question, I understand this part. It's the simple mechanics of how we are getting into this state. What is the meta problem that is causing you to get into this state? WHY do you want a route that has ecmp where one or more of the nexthops is a blackhole and one or more of the nexthops are normal
? What is the problem that is being solved?
- I need ecmp, hence the static configuration using multiple GWs configured on interfaces.
- The device learns a default route from a neighbor.
- I need the blackhole route to advertise an aggregate route out to other neighbors over BGP. I could achieve this by NOT configuring it statically, and simply advertising it using
router bgp > network 192.168.0.0
andno bgp network import-check
. But with this, if one of the links in (1) were to go down, that path will now recursively resolve over the default route learned in (2) which I want to avoid. I just want that one path to be removed.
I fixed the problem by using GW + IF nexthop
in the static route configuration. That way when the link goes down, that nexthop is removed.
@donaldsharp, following up on this. Even with this being fixed higher up in zebra or staticd, I believe the fix here is required as zebra is treating nexthops differently based on where it is in the list.
with my proposed changes the dplane part of zebra will never see this route. It's a moot point. Like I said we need to fix both staticd and zebra main processing pthread to do the right thing.
This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose
label in order to avoid having this PR closed.