zebra: uninstall nhg when no best re point to it
Currently when upper layer protocols like ospf and static route sends the same route with different nexthops sequentially, zebra creates two different route_entry (re) and two distinct nhgs. First the static route and corresponding selected and installed, subsequently, the ospf route is installed Once zebra rib selects the ospf as best the static as non best, then the static route is not uninstalled from dplane. When networking restart happens at that point only the best NHG is installed. If user does query of nexthop-group nhe state compare prior to and after networking restart, the behavior is not symmetric. To make consistent, when zebra rib selects the best nhg to installed, adding a bit of logic to uninstall old route_entry's nhg if it is not selected as best by any of the route_entries then uninstall the nhg from kernel. This requires to keep best installed re's count in nhg struct.
Initial route points to static, route and NHG 34 is installed:
DUT# show ip route nexthop-group
S>* 20.8.8.0/30 [115/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:26:31
OSFPv2 come up installs the route, route and new NHG 48 is installed:
DUT# show ip route nexthop-group
B>* 20.6.6.0/30 [20/0] (37) via 20.3.3.2, swp1s2, weight 1, 00:29:24
S 20.6.6.0/30 [75/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:29:30
O>* 20.8.8.0/30 [110/8] (48) via 20.1.1.1, swp1s0, weight 1, 00:01:48
S 20.8.8.0/30 [115/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:29:30
Old NHG 34 is not removed kernel:
DUT# show nexthop-group rib 34
ID: 34 (zebra)
RefCnt: 2
Uptime: 00:33:27
VRF: default
Valid, Installed <<<<< installed
Interface Index: 37
via 20.7.7.2, swp31s0 (vrf default), weight
After fix: NHG 34 is retained in zebra from unistalled from dplane DUT# show nexthop-group rib 34
ID: 34 (zebra)
RefCnt: 2
Uptime: 00:27:06
VRF: default
Valid
Interface Index: 37
via 20.7.7.2, swp31s0 (vrf default), weight 1
Signed-off-by: Chirag Shah [email protected]
hmm - I'd like to hear more about this: "Once zebra rib selects the ospf as best the static as non best, then the static route is not uninstalled from dplane."
that seems ... unexpected, I'd expect that the route "update" code was uninstalling the non-best route
hmm - I'd like to hear more about this: "Once zebra rib selects the ospf as best the static as non best, then the static route is not uninstalled from dplane."
that seems ... unexpected, I'd expect that the route "update" code was uninstalling the non-best route
@mjstapp When an existing route's re (lets say static in the example captured in commit message) becomes non-best, its associated re becomes non best but nhe refcount is greater than 0 so it won't explicitly uninstall. zebra_nhg_uninstall_kernel -> dplane_nexthop_delete (which uninstalls) . This is the code path followed but given refcount is non zero, the dplane_nexthop_delete won't be called. I have captured detail explanation in commit/PR with example.
ah, ok - it's not the "static route" that isn't uninstalled, it's the nhg(s) used by the static route.
@mjstapp When an existing route's re (lets say static in the example captured in commit message) becomes non-best, its associated re becomes non best but nhe refcount is greater than 0 so it won't explicitly uninstall. zebra_nhg_uninstall_kernel -> dplane_nexthop_delete (which uninstalls) . This is the code path followed but given refcount is non zero, the dplane_nexthop_delete won't be called. I have captured detail explanation in commit/PR with example.
ah, ok - it's not the "static route" that isn't uninstalled, it's the nhg(s) used by the static route.
@mjstapp When an existing route's re (lets say static in the example captured in commit message) becomes non-best, its associated re becomes non best but nhe refcount is greater than 0 so it won't explicitly uninstall. zebra_nhg_uninstall_kernel -> dplane_nexthop_delete (which uninstalls) . This is the code path followed but given refcount is non zero, the dplane_nexthop_delete won't be called. I have captured detail explanation in commit/PR with example.
That's Correct! Any of the upper layer protocol's re (route_entry) pointing to a particular NHG. If all the re become non-selected then uninstall NHG from kernel but retain in zebra as NHG is being reference by non selected res.
@mjstapp / @riw777 kindly please help review and let me know if further query. I have addressed all review comments and concerns.
I'm not convinced that this is something we should do. Adding yet-another refcount in this area just seems awkward. Is there some functional issue that the existing behavior leads to that needs to be fixed? Is there just some cosmetic concern?
@mjstapp there isn't any functional issue, I have tried to explained the rational using example in the commit message. There is a NHG left in dplane/kernel which is not used by any route. That is the problem we are trying to solve. The existing code does not handle it. Though one can say its a corner case but it is a bug why NHG should remained installed in kernel when Zebra decided to switch its best re to another NHG. Please take a look at the scenario outlined should be distinctly identify the issue. What other way you are thinking of solving this corner case? Kindly suggest.
Not needed.