lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp
With W-ECMP, when an interface goes down, we mark the singleton nexthop as INACTIVE. We then process its dependents (the NHGs containing this singleton nexthop) and attempt to mark them as INACTIVE as well.
During this process, zebra_nhg_set_valid() uses nexthop_same() to compare every singleton nexthop in the nexthop group with the singleton nexthop that went down.
However, there's a weight mismatch issue:
- The standalone singleton nexthop has weight = 1
- The same singleton nexthop, when part of an NHG, has weight = 255
This weight mismatch causes nexthop_same() to return FALSE, so the group member is never matched against the nexthop that went down.
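To make the mismatch concrete, here is a minimal sketch of a strict comparison next to one that ignores weight. The `demo_nexthop` struct and both `demo_same_*` helpers are hypothetical simplifications written for illustration; they are not FRR's `struct nexthop` or its `nexthop_same()`, and this is not necessarily how the patch itself resolves the issue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop (NOT FRR's struct nexthop). */
struct demo_nexthop {
	uint32_t gateway; /* IPv4 gateway, host byte order */
	int ifindex;      /* outgoing interface index */
	uint8_t weight;   /* W-ECMP weight */
};

/* Strict comparison: every field, including weight, must match. */
static bool demo_same_strict(const struct demo_nexthop *a,
			     const struct demo_nexthop *b)
{
	return a->gateway == b->gateway && a->ifindex == b->ifindex &&
	       a->weight == b->weight;
}

/* Identity-only comparison: gateway and interface must match,
 * weight is deliberately ignored.
 */
static bool demo_same_ignore_weight(const struct demo_nexthop *a,
				    const struct demo_nexthop *b)
{
	return a->gateway == b->gateway && a->ifindex == b->ifindex;
}
```

With a standalone entry for 22.64.0.18 on ifindex 46 at weight 1 and a group member for the same gateway and interface at weight 255, the strict helper reports a mismatch while the weight-agnostic helper reports a match.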
Testing:
Before fix:
ID: 76 (zebra)
RefCnt: 20
Uptime: 00:00:59
VRF: default
Valid, Installed
Interface Index: 46
via 22.64.0.18, swp43 (vrf default), weight 1 <<<< weight 1 for swp43
Dependents: (74)
ID: 74 (zebra)
RefCnt: 19
Uptime: 00:00:59
VRF: default
Valid, Installed
Depends: (75) (76) (77) (78)
via 22.64.0.16, swp42 (vrf default), weight 255
via 22.64.0.18, swp43 (vrf default), weight 255 <<<<< weight 255 for swp43
via 22.64.0.20, swp44 (vrf default), weight 255
via 22.64.0.22, swp45 (vrf default), weight 255
root@leaf:mgmt:~# ip link set swp43 down <<<<< trigger bring down swp43
root@leaf:mgmt:~#
root@leaf:mgmt:~# vtysh -c "show nexthop-group rib 74"
ID: 74 (zebra)
RefCnt: 1 Time to Deletion: 00:02:54 <<<< marked for deletion
Uptime: 00:02:53
VRF: default
Valid, Installed
Depends: (75) (76) (77) (78)
via 22.64.0.16, swp42 (vrf default), weight 255
via 22.64.0.18, swp43 (vrf default), weight 255 <<< swp43 not marked inactive (nexthop_same fails due to wt check)
via 22.64.0.20, swp44 (vrf default), weight 255
via 22.64.0.22, swp45 (vrf default), weight 255
After fix:
root@leaf:mgmt:/var/log/frr# vtysh -c "show nexthop-group rib 69"
ID: 69 (zebra)
RefCnt: 19
Uptime: 00:01:11
VRF: default
Valid, Installed
Depends: (70) (71) (72)
via 22.64.0.16, swp42 (vrf default), weight 255
via 22.64.0.20, swp44 (vrf default), weight 255
via 22.64.0.22, swp45 (vrf default), weight 255
root@leaf:mgmt:/var/log/frr# ip link set swp44 down <<< trigger bring swp44 down
root@leaf:mgmt:/var/log/frr#
root@leaf:mgmt:/var/log/frr# vtysh -c "show nexthop-group rib 69" <<< NHG 69 not marked for deletion
ID: 69 (zebra)
RefCnt: 19
Uptime: 00:02:41
VRF: default
Valid, Installed
Depends: (70) (71) (72)
via 22.64.0.16, swp42 (vrf default), weight 255
via 22.64.0.20, swp44 (vrf default) inactive, weight 255 <<< swp44 marked as inactive
via 22.64.0.22, swp45 (vrf default), weight 255
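The after-fix behaviour can be sketched as a walk over the group's members that flags the one on the downed interface and counts how many remain active. This is an illustrative model only: `demo_member`, `demo_link_down()`, and the interface indexes used with it are hypothetical, and the rule that a group stays usable while at least one member is active is an assumption read off the output above, not taken from zebra's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical group member (NOT a zebra data structure). */
struct demo_member {
	int ifindex;          /* outgoing interface index (made up here) */
	unsigned char weight; /* W-ECMP weight inside the group */
	bool active;          /* cleared when the member's link goes down */
};

/* Mark every member on the downed interface inactive, matching on
 * interface only so the member's weight plays no part.
 * Returns the number of members that are still active.
 */
static size_t demo_link_down(struct demo_member *members, size_t count,
			     int down_ifindex)
{
	size_t still_active = 0;

	for (size_t i = 0; i < count; i++) {
		if (members[i].ifindex == down_ifindex)
			members[i].active = false;
		if (members[i].active)
			still_active++;
	}
	return still_active;
}
```

For a three-member group, taking one member's interface down leaves two active members, which mirrors NHG 69 remaining "Valid, Installed" with swp44 shown as inactive.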
Ticket: #
@Mergifyio backport stable/10.3 stable/10.2 stable/10.1 stable/10.0
✅ Backports have been created
- #19254 lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport #18947) has been created for branch stable/10.3
- #19255 lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport #18947) has been created for branch stable/10.2
- #19256 lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport #18947) has been created for branch stable/10.1 but encountered conflicts
- #19257 lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport #18947) has been created for branch stable/10.0 but encountered conflicts
ci:rerun
My comments have been resolved, thanks.
ci:rerun
@mergifyio backport stable/10.4
✅ Backports have been created
- #19258 lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport #18947) has been created for branch stable/10.4