frr icon indicating copy to clipboard operation
frr copied to clipboard

NHG : proto NHG modification or Zebra NHG modification issues with protocol NHG routes

Open pguibert6WIND opened this issue 1 year ago • 2 comments

Describe the bug

This is a three-step scenario, where multiple problems are identified. step1 : I create a nexthop-group interface with its route in sharpd, The expectation is that the created protocol NHG is created with the route. step 2 : if I add a route-map in zebra to change the src attribute, the route is not refreshed. The expectation is that a new NHG is duped from the protocol NHG with a nexthop that contains the src attribute This step fails today. step 3 : If I modify the original nexthop group, if a new NHG had been allocated in the previous step, then the modification fails to update the changes from the protocol level.

To Reproduce step 1:

interface loop1
ip address 192.168.0.1/24
exit
interface loop2
ip address 192.168.2.1/26
exit
ip route 1.1.1.1/32 loop1
debu zebra events
debu zebra kernel msgdump send
log stdout debugging 
nexthop-group test
nexthop 1.1.1.1 loop1 onlink
nexthop 1.1.1.2 loop2 onlink
exit
exit
sharp install routes 5.5.5.5 nexthop-group test 1

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:21 ago
  Nexthop Group ID: 181818168
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20 
	nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
	nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

step 2:

route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 181818168
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20 
	nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
	nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

step 3:

nexthop-group test
no nexthop 1.1.1.1 loop1 onlink

Expected behavior A new NHG id is allocated in step 2

route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 532
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 532 proto 194 src 192.168.0.1 metric 20 
	nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
	nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

The change is propagated in the new NHG in step 2. A new NHG is allocated for step 3:

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 533
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 533 proto 194 src 192.168.0.1 metric 20 
	nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

Screenshots

Versions recent version ubuntu-22-04. Additional context

pguibert6WIND avatar Jan 18 '24 10:01 pguibert6WIND

The issue is a limitation related protocol NHGs. Meaning that when bgp PIC will be made available, it is highly likely that the 'ip protocol bgp route-map' will have some severe flaws in relation with bgp nexthops updates.

An attempt has been tried to gather all NHGs that depend from the same NHG protocol in that branch:

https://github.com/pguibert6WIND/frr/tree/nhg_dependency_issue_15169

Unfortunately, there are some side effect with all_protocol_startup... So set it to on hold.

pguibert6WIND avatar Jan 21 '24 20:01 pguibert6WIND

with pull request, the route-map test fails:

when applying follow patch over https://github.com/FRRouting/frr/pull/14973

@@ -1453,10 +1453,16 @@ def test_nexthop_groups_with_route_maps():
     net["r1"].cmd(
         'vtysh -c "c t" -c "route-map NH-SRC permit 111" -c "set src %s"' % src_str
     )
-    net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
-
     net["r1"].cmd('vtysh -c "sharp install routes %s nexthop-group test 1"' % route_str)
 
+    nhg_id = route_get_nhg_id("%s/32" % route_str)
+    print(f"XXXXXXXXXX NH ID is {nhg_id} before route-map")
+    verify_route_nexthop_group("%s/32" % route_str)
+
+    net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
+
+    nhg_id = route_get_nhg_id("%s/32" % route_str)
+    print(f"XXXXXXXXXX NH ID is {nhg_id} after route-map")
     verify_route_nexthop_group("%s/32" % route_str)
 
     # Only a valid test on linux using nexthop objects
-- 
2.34.1

I obtain the same NHID before and after applying the route-map in recursive mode, whereas the NHID is different when not in recursive mode. non recursive mode:

XXXXXXXXXX NH ID is 103 before route-map
XXXXXXXXXX NH ID is 280 after route-map
OK

recursive mode:

XXXXXXXXXX NH ID is 181818177 before route-map
XXXXXXXXXX NH ID is 181818177 after route-map
>           assert match is not None, "Route %s/32 not installed with src %s" % (
                route_str,
                src_str,
            )
E           AssertionError: Route 2.2.2.1/32 not installed with src 192.168.0.1
E           assert None is not None

=> without recursive mode, how come a protocol NHG can refresh the two IDs 103 and 280 => with recursive mode, how come can we allocate a new NHID which is dependent from the original one.

pguibert6WIND avatar Jan 23 '24 20:01 pguibert6WIND

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

github-actions[bot] avatar Jul 22 '24 01:07 github-actions[bot]

This issue will be automatically closed in the specified period unless there is further activity.

frrbot[bot] avatar Jul 22 '24 01:07 frrbot[bot]