frr
frr copied to clipboard
NHG : proto NHG modification or Zebra NHG modification issues with protocol NHG routes
Describe the bug
This is a three-step scenario, where multiple problems are identified. step1 : I create a nexthop-group interface with its route in sharpd, The expectation is that the created protocol NHG is created with the route. step 2 : if I add a route-map in zebra to change the src attribute, the route is not refreshed. The expectation is that a new NHG is duped from the protocol NHG with a nexthop that contains the src attribute This step fails today. step 3 : If I modify the original nexthop group, if a new NHG had been allocated in the previous step, then the modification fails to update the changes from the protocol level.
To Reproduce step 1:
interface loop1
ip address 192.168.0.1/24
exit
interface loop2
ip address 192.168.2.1/26
exit
ip route 1.1.1.1/32 loop1
debu zebra events
debu zebra kernel msgdump send
log stdout debugging
nexthop-group test
nexthop 1.1.1.1 loop1 onlink
nexthop 1.1.1.2 loop2 onlink
exit
exit
sharp install routes 5.5.5.5 nexthop-group test 1
ubuntu2204# show ip route 5.5.5.5 nexthop-group
Routing entry for 5.5.5.5/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:21 ago
Nexthop Group ID: 181818168
* 1.1.1.1, via loop1 onlink, weight 1
* 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20
nexthop via 1.1.1.1 dev loop1 weight 1 onlink
nexthop via 1.1.1.2 dev loop2 weight 1 onlink
step 2:
route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC
ubuntu2204# show ip route 5.5.5.5 nexthop-group
Routing entry for 5.5.5.5/32
Known via "sharp", distance 150, metric 0, best
Last update 00:01:44 ago
Nexthop Group ID: 181818168
* 1.1.1.1, via loop1 onlink, weight 1
* 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20
nexthop via 1.1.1.1 dev loop1 weight 1 onlink
nexthop via 1.1.1.2 dev loop2 weight 1 onlink
step 3:
nexthop-group test
no nexthop 1.1.1.1 loop1 onlink
Expected behavior A new NHG id is allocated in step 2
route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC
ubuntu2204# show ip route 5.5.5.5 nexthop-group
Routing entry for 5.5.5.5/32
Known via "sharp", distance 150, metric 0, best
Last update 00:01:44 ago
Nexthop Group ID: 532
* 1.1.1.1, via loop1 onlink, weight 1
* 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 532 proto 194 src 192.168.0.1 metric 20
nexthop via 1.1.1.1 dev loop1 weight 1 onlink
nexthop via 1.1.1.2 dev loop2 weight 1 onlink
The change is propagated in the new NHG in step 2. A new NHG is allocated for step 3:
ubuntu2204# show ip route 5.5.5.5 nexthop-group
Routing entry for 5.5.5.5/32
Known via "sharp", distance 150, metric 0, best
Last update 00:01:44 ago
Nexthop Group ID: 533
* 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 533 proto 194 src 192.168.0.1 metric 20
nexthop via 1.1.1.2 dev loop2 weight 1 onlink
Screenshots
Versions recent version ubuntu-22-04. Additional context
The issue is a limitation related protocol NHGs. Meaning that when bgp PIC will be made available, it is highly likely that the 'ip protocol bgp route-map' will have some severe flaws in relation with bgp nexthops updates.
An attempt has been tried to gather all NHGs that depend from the same NHG protocol in that branch:
https://github.com/pguibert6WIND/frr/tree/nhg_dependency_issue_15169
Unfortunately, there are some side effect with all_protocol_startup... So set it to on hold.
with pull request, the route-map test fails:
when applying follow patch over https://github.com/FRRouting/frr/pull/14973
@@ -1453,10 +1453,16 @@ def test_nexthop_groups_with_route_maps():
net["r1"].cmd(
'vtysh -c "c t" -c "route-map NH-SRC permit 111" -c "set src %s"' % src_str
)
- net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
-
net["r1"].cmd('vtysh -c "sharp install routes %s nexthop-group test 1"' % route_str)
+ nhg_id = route_get_nhg_id("%s/32" % route_str)
+ print(f"XXXXXXXXXX NH ID is {nhg_id} before route-map")
+ verify_route_nexthop_group("%s/32" % route_str)
+
+ net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
+
+ nhg_id = route_get_nhg_id("%s/32" % route_str)
+ print(f"XXXXXXXXXX NH ID is {nhg_id} after route-map")
verify_route_nexthop_group("%s/32" % route_str)
# Only a valid test on linux using nexthop objects
--
2.34.1
I obtain the same NHID before and after applying the route-map in recursive mode, whereas the NHID is different when not in recursive mode. non recursive mode:
XXXXXXXXXX NH ID is 103 before route-map
XXXXXXXXXX NH ID is 280 after route-map
OK
recursive mode:
XXXXXXXXXX NH ID is 181818177 before route-map
XXXXXXXXXX NH ID is 181818177 after route-map
> assert match is not None, "Route %s/32 not installed with src %s" % (
route_str,
src_str,
)
E AssertionError: Route 2.2.2.1/32 not installed with src 192.168.0.1
E assert None is not None
=> without recursive mode, how come a protocol NHG can refresh the two IDs 103 and 280 => with recursive mode, how come can we allocate a new NHID which is dependent from the original one.
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
This issue will be automatically closed in the specified period unless there is further activity.