
lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp

Open karthikeyav opened this issue 6 months ago • 6 comments

With W-ECMP, when an interface goes down, we mark the singleton nexthop as INACTIVE. We then process its dependents (the nexthop groups, or NHGs, that contain this singleton nexthop) and attempt to mark them as INACTIVE as well.

During this process, zebra_nhg_set_valid() uses nexthop_same() to compare every singleton nexthop in the nexthop group against the singleton nexthop that went down.

However, there's a weight mismatch issue:

  • The standalone singleton nexthop has weight = 1
  • The same singleton nexthop, when part of an NHG, has weight = 255

This weight mismatch causes nexthop_same() to return FALSE, preventing proper matching.
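The mismatch can be illustrated with a minimal, self-contained sketch. Everything named demo_* below is hypothetical and deliberately simplified: it is not FRR's struct nexthop, not the real nexthop_same(), and not necessarily how this change resolves the problem. It only shows why a comparison that includes weight misses the group member, while one that ignores weight finds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified nexthop. This is NOT FRR's struct nexthop. */
struct demo_nexthop {
	uint32_t ifindex; /* outgoing interface index */
	char gateway[16]; /* IPv4 gateway as dotted-quad text */
	uint8_t weight;   /* W-ECMP weight */
	bool active;
};

/* Compare identity only (interface and gateway), ignoring weight. */
static bool demo_nexthop_same_no_weight(const struct demo_nexthop *a,
					const struct demo_nexthop *b)
{
	assert(a != NULL && b != NULL);
	return a->ifindex == b->ifindex &&
	       strcmp(a->gateway, b->gateway) == 0;
}

/* Strict comparison: identity plus weight (the behavior that mismatches). */
static bool demo_nexthop_same(const struct demo_nexthop *a,
			      const struct demo_nexthop *b)
{
	return demo_nexthop_same_no_weight(a, b) && a->weight == b->weight;
}

/*
 * Mark every group member that matches the downed singleton as inactive.
 * Returns the number of members marked.
 */
static int demo_group_mark_inactive(struct demo_nexthop *group, size_t count,
				    const struct demo_nexthop *down,
				    bool ignore_weight)
{
	int marked = 0;

	for (size_t i = 0; i < count; i++) {
		bool match = ignore_weight
				     ? demo_nexthop_same_no_weight(&group[i], down)
				     : demo_nexthop_same(&group[i], down);
		if (match) {
			group[i].active = false;
			marked++;
		}
	}
	return marked;
}
```

With a downed singleton of weight 1 and a group member of weight 255 on the same interface and gateway, the strict variant marks nothing, while the weight-agnostic variant marks exactly that one member.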

Testing:

Before fix:

ID: 76 (zebra)
     RefCnt: 20
     Uptime: 00:00:59
     VRF: default
     Valid, Installed
     Interface Index: 46
           via 22.64.0.18, swp43 (vrf default), weight 1          <<<< weight 1 for swp43
     Dependents: (74)
ID: 74 (zebra)
     RefCnt: 19
     Uptime: 00:00:59
     VRF: default
     Valid, Installed
     Depends: (75) (76) (77) (78)
           via 22.64.0.16, swp42 (vrf default), weight 255
           via 22.64.0.18, swp43 (vrf default), weight 255       <<<<< weight 255 for swp43
           via 22.64.0.20, swp44 (vrf default), weight 255
           via 22.64.0.22, swp45 (vrf default), weight 255

root@leaf:mgmt:~# ip link set swp43 down                  <<<<< trigger bring down swp43
root@leaf:mgmt:~#
root@leaf:mgmt:~# vtysh -c "show nexthop-group rib 74"
ID: 74 (zebra)
     RefCnt: 1 Time to Deletion: 00:02:54                        <<<< marked for deletion
     Uptime: 00:02:53
     VRF: default
     Valid, Installed
     Depends: (75) (76) (77) (78)
           via 22.64.0.16, swp42 (vrf default), weight 255
           via 22.64.0.18, swp43 (vrf default), weight 255        <<< swp43 not marked inactive (nexthop_same fails due to wt check)
           via 22.64.0.20, swp44 (vrf default), weight 255
           via 22.64.0.22, swp45 (vrf default), weight 255

After fix:

root@leaf:mgmt:/var/log/frr# vtysh -c "show nexthop-group rib 69"
ID: 69 (zebra)
     RefCnt: 19
     Uptime: 00:01:11
     VRF: default
     Valid, Installed
     Depends: (70) (71) (72)
           via 22.64.0.16, swp42 (vrf default), weight 255
           via 22.64.0.20, swp44 (vrf default), weight 255
           via 22.64.0.22, swp45 (vrf default), weight 255

root@leaf:mgmt:/var/log/frr# ip link set swp44 down                    <<< trigger bring swp44 down
root@leaf:mgmt:/var/log/frr#
root@leaf:mgmt:/var/log/frr# vtysh -c "show nexthop-group rib 69"      <<< NHG 69 not marked for deletion
ID: 69 (zebra)
     RefCnt: 19
     Uptime: 00:02:41
     VRF: default
     Valid, Installed
     Depends: (70) (71) (72)
           via 22.64.0.16, swp42 (vrf default), weight 255
           via 22.64.0.20, swp44 (vrf default) inactive, weight 255           <<< swp44 marked as inactive
           via 22.64.0.22, swp45 (vrf default), weight 255

Ticket: #

karthikeyav • Jun 03 '25 15:06

@Mergifyio backport stable/10.3 stable/10.2 stable/10.1 stable/10.0

ton31337 • Jun 03 '25 18:06

backport stable/10.3 stable/10.2 stable/10.1 stable/10.0

✅ Backports have been created

mergify[bot] • Jun 03 '25 18:06

ci:rerun

raja-rajasekar • Jun 03 '25 20:06

ci:rerun

karthikeyav • Jun 05 '25 07:06

ci:rerun

karthikeyav • Jun 09 '25 19:06

My comments have been resolved, thanks.

ashred-lnx • Jun 10 '25 23:06

ci:rerun

karthikeyav • Jul 16 '25 19:07

@mergifyio backport stable/10.4

Jafaral • Jul 24 '25 03:07

backport stable/10.4

✅ Backports have been created

mergify[bot] • Jul 24 '25 03:07