sonic-swss
warm boot test case fails on dualtor active-standby setup
I am running the platform_tests/test_advanced_reboot.py::test_warm_reboot test on a dualtor active-standby topology. After warm boot, incorrect routes are installed, which causes issues in syncd.
Details: one of the standby ports (Ethernet80, with server 192.168.0.10) had a route via the tunnel before warm boot, but after warm boot it is programmed to go via Vlan1000 (active-port behavior). This conflicts with the port's standby state, and it is being programmed by orchagent (OA).
The log below shows that, before warm boot, the 192.168.0.10 route is moved from the VLAN to the tunnel (active to standby):
2023-08-16.04:51:11.988930|c|SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x60000000009b1|SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID=oid:0x3000000000042|SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS=00:AA:BB:CC:DD:EE|SAI_ROUTER_INTERFACE_ATTR_TYPE=SAI_ROUTER_INTERFACE_TYPE_VLAN|SAI_ROUTER_INTERFACE_ATTR_VLAN_ID=oid:0x2600000000097c|SAI_ROUTER_INTERFACE_ATTR_MTU=9100
2023-08-16.04:51:50.140828|c|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000a44|SAI_NEXT_HOP_ATTR_TYPE=SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP|SAI_NEXT_HOP_ATTR_IP=10.1.0.32|SAI_NEXT_HOP_ATTR_TUNNEL_ID=oid:0x2a0000000009e8
2023-08-16.04:58:56.322105|c|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"192.168.0.10","rif":"oid:0x60000000009b1","switch_id":"oid:0x21000000000000"}|SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS=5A:3F:3C:87:19:14
2023-08-16.04:58:56.323497|c|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000ac4|SAI_NEXT_HOP_ATTR_TYPE=SAI_NEXT_HOP_TYPE_IP|SAI_NEXT_HOP_ATTR_IP=192.168.0.10|SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID=oid:0x60000000009b1
2023-08-16.05:00:00.214232|r|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000ac4
2023-08-16.05:00:00.215029|r|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"192.168.0.10","rif":"oid:0x60000000009b1","switch_id":"oid:0x21000000000000"}
2023-08-16.05:00:00.215731|c|SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"192.168.0.10/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000042"}|SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION=SAI_PACKET_ACTION_FORWARD|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000a44
After warm boot, however, a Vlan route for the same server is replayed to the SDK via SAI, and this appears to be what breaks the SDK: we see a crash immediately after this route install is attempted.
2023-08-16.05:11:56.319193|c|SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x6000000000c8f|SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID=oid:0x3000000000042|SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS=00:AA:BB:CC:DD:EE|SAI_ROUTER_INTERFACE_ATTR_TYPE=SAI_ROUTER_INTERFACE_TYPE_VLAN|SAI_ROUTER_INTERFACE_ATTR_VLAN_ID=oid:0x26000000000c5a|SAI_ROUTER_INTERFACE_ATTR_MTU=9100
2023-08-16.05:11:56.325994|c|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"192.168.0.10","rif":"oid:0x6000000000c8f","switch_id":"oid:0x21000000000000"}|SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS=5A:3F:3C:87:19:14
2023-08-16.05:11:56.326342|c|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000c98|SAI_NEXT_HOP_ATTR_TYPE=SAI_NEXT_HOP_TYPE_IP|SAI_NEXT_HOP_ATTR_IP=192.168.0.10|SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID=oid:0x6000000000c8f
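To compare the two excerpts above without reading them by eye, something like the following could be used. It is a minimal sketch, not part of SONiC: `parse_record` and `final_state` are hypothetical helper names, and the parser assumes only the `timestamp|op|OBJECT_TYPE:key|attr=value|...` shape visible in the create (`c`) and remove (`r`) lines quoted in this issue. Other record types in a real sairedis recording are skipped.

```python
import json


def parse_record(line):
    """Split one recording line into (op, object type, key, attrs), or None."""
    fields = line.strip().split("|")
    if len(fields) < 3:
        return None  # not in the shape quoted in this issue
    op, obj = fields[1], fields[2]
    obj_type, _, key = obj.partition(":")
    # Each attribute is "NAME=value"; values may contain ':' (MACs, OIDs).
    attrs = dict(a.partition("=")[::2] for a in fields[3:])
    return op, obj_type, key, attrs


def final_state(lines, server_ip):
    """Replay create/remove records and report what is left for server_ip."""
    next_hops = {}  # next hop OID -> (next hop type, next hop IP)
    state = {"neighbor": False, "ip_next_hop": False, "route_next_hop_type": None}
    for line in lines:
        rec = parse_record(line)
        if rec is None or rec[0] not in ("c", "r"):
            continue
        op, obj_type, key, attrs = rec
        created = op == "c"
        if obj_type == "SAI_OBJECT_TYPE_NEXT_HOP":
            if created:
                next_hops[key] = (attrs.get("SAI_NEXT_HOP_ATTR_TYPE"),
                                  attrs.get("SAI_NEXT_HOP_ATTR_IP"))
            nh_type, nh_ip = next_hops.get(key, (None, None))
            if nh_type == "SAI_NEXT_HOP_TYPE_IP" and nh_ip == server_ip:
                state["ip_next_hop"] = created
            if not created:
                next_hops.pop(key, None)
        elif obj_type == "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY":
            if json.loads(key).get("ip") == server_ip:
                state["neighbor"] = created
        elif obj_type == "SAI_OBJECT_TYPE_ROUTE_ENTRY":
            if json.loads(key).get("dest") == server_ip + "/32":
                nh_oid = attrs.get("SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID")
                # Resolve the route's next hop OID to its type, if it was seen.
                state["route_next_hop_type"] = (
                    next_hops.get(nh_oid, (None, None))[0] if created else None
                )
    return state
```

Calling `final_state(lines, "192.168.0.10")` on the pre-warm-boot excerpt should report a host route resolved to a tunnel-encap next hop with no neighbor left, while the post-warm-boot excerpt should report a neighbor and an IP next hop on the Vlan RIF, which is the active-port programming described here.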
Linkmgrd is aware that this port is in standby mode both before and after warm boot:
Aug 16 05:00:00.233766 m64-tor-1 INFO caclmgrd[367028]: mux cable update : '('Ethernet80', 'SET', (('state', 'standby'),))'
Aug 16 05:00:22.578248 m64-tor-1 NOTICE mux#linkmgrd: MuxManager.cpp:207 updateMuxPortConfig: Ethernet80: Mux port config: manual
Aug 16 05:00:22.578296 m64-tor-1 NOTICE mux#linkmgrd: link_manager/LinkManagerStateMachineActiveStandby.cpp:783 handleMuxConfigNotification: Ethernet80: (P: Standby, M: Standby, L: Up) -> (P: Standby, M: Standby, L: Up)
— warm boot —
Aug 16 05:11:50.039925 m64-tor-1 NOTICE mux#linkmgrd: MuxManager.cpp:207 updateMuxPortConfig: Ethernet80: Mux port config: manual
Aug 16 05:11:50.041823 m64-tor-1 NOTICE mux#linkmgrd: MuxManager.cpp:262 addOrUpdateMuxPortLinkState: Ethernet80: link state: up
Aug 16 05:11:50.042763 m64-tor-1 NOTICE mux#linkmgrd: link_manager/LinkManagerStateMachineActiveStandby.cpp:645 handleProbeMuxStateNotification: Ethernet80: Initializing MUX state 'Standby' to match xcvrd state
Even so, OA programs the active-port route for the server connected to Ethernet80.