sonic-platform-daemons

Changes for port reinitialization in case of syncd/swss/oa crash and NPU SI settings update for CMIS transceivers

Open mihirpat1 opened this issue 1 year ago • 5 comments

Description

Code changes implementing the OA crash handling HLD, which reinitializes ports through xcvrd. This is needed to ensure that xcvrd re-initializes all ports if syncd, swss, or orchagent crashes.

Motivation and Context

In addition to the implementation described in the HLD, this PR makes the following changes:
1. Remove the wait for port config completion from the CMIS thread
2. Add prints for module and DP states
3. Rename "media settings" to "NPU SI settings" to clearly differentiate between optics and NPU SI settings

How Has This Been Tested?

Process and device restart and interface config command handling test plan

| Event | STATE_DB_<asic_n> cleared | Xcvrd restarted | NPU SI settings renotify | CMIS re-init triggered | Link flap |
|---|---|---|---|---|---|
| Xcvrd restart | N | Y | N | N | N |
| Pmon restart | N | Y | N | N | N |
| orchagent restart | Y | Y | Y | Y | N/A |
| Swss restart | Y | Y | Y | Y | N/A |
| Syncd restart | Y | Y | Y | Y | N/A |
| config reload | Y | Y | Y | Y | N/A |
| Cold reboot | Y | Y | Y | Y | N/A |
| config interface shutdown | N | N | N | N | N/A |
| config interface startup | N | N | N | N | N/A |
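
For anyone scripting this test plan, the matrix above could be encoded as data and compared against what is observed on the device. The following is only a sketch: `Outcome`, `EXPECTED`, and `mismatches` are hypothetical names and are not part of xcvrd or this PR. The "N/A" entries in the Link flap column are modelled as `None` and skipped during comparison.

```python
from typing import List, NamedTuple, Optional


class Outcome(NamedTuple):
    """One row of the restart matrix (hypothetical helper type)."""
    state_db_cleared: bool
    xcvrd_restarted: bool
    npu_si_renotify: bool
    cmis_reinit: bool
    link_flap: Optional[bool]  # None stands for "N/A" in the table


# Values copied from the table: "Y" -> True, "N" -> False, "N/A" -> None.
_NO_RESTART = Outcome(False, False, False, False, None)
_XCVRD_ONLY = Outcome(False, True, False, False, False)
_FULL_REINIT = Outcome(True, True, True, True, None)

EXPECTED = {
    "xcvrd restart": _XCVRD_ONLY,
    "pmon restart": _XCVRD_ONLY,
    "orchagent restart": _FULL_REINIT,
    "swss restart": _FULL_REINIT,
    "syncd restart": _FULL_REINIT,
    "config reload": _FULL_REINIT,
    "cold reboot": _FULL_REINIT,
    "config interface shutdown": _NO_RESTART,
    "config interface startup": _NO_RESTART,
}


def mismatches(event: str, observed: Outcome) -> List[str]:
    """Return the column names where the observation differs from the table.

    Columns marked "N/A" (None) in the expected row are not compared.
    """
    expected = EXPECTED[event]
    return [
        name
        for name in Outcome._fields
        if getattr(expected, name) is not None
        and getattr(expected, name) != getattr(observed, name)
    ]
```

An empty list from `mismatches` means the observed behaviour agrees with the corresponding row of the table.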

Transceiver OIR testing

| Event | STATE_DB_<asic_n> cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered |
|---|---|---|---|---|---|
| Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | N/A |
| Transceiver Insertion | N | N | Y | NPU_SI_SETTINGS_DONE | Y |
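
The OIR results above amount to a small mapping from event to the final `NPU_SI_SETTINGS_SYNC_STATUS` value. A minimal sketch of that mapping, applied to an in-memory copy of a port's fields, is shown here. The function name `apply_oir_event` and the event strings are assumptions for illustration, and any intermediate status values that xcvrd may pass through before the event completes are not modelled.

```python
NPU_SI_SETTINGS_DEFAULT = "NPU_SI_SETTINGS_DEFAULT"
NPU_SI_SETTINGS_DONE = "NPU_SI_SETTINGS_DONE"

# Final status per event, taken from the OIR table above.
_STATUS_AFTER_EVENT = {
    "removal": NPU_SI_SETTINGS_DEFAULT,
    "insertion": NPU_SI_SETTINGS_DONE,
}


def apply_oir_event(port_fields: dict, event: str) -> dict:
    """Return a copy of port_fields with the sync status expected after event.

    Hypothetical helper: it only mirrors the table and does not talk to
    STATE_DB or to xcvrd.
    """
    if event not in _STATUS_AFTER_EVENT:
        raise ValueError(f"unknown OIR event: {event!r}")
    updated = dict(port_fields)
    updated["NPU_SI_SETTINGS_SYNC_STATUS"] = _STATUS_AFTER_EVENT[event]
    return updated
```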

Redis-db snippet

#redis-cli -n 6 hgetall "PORT_TABLE|Ethernet8"
 1) "CMIS_REINIT_REQUIRED"
 2) "false"
 3) "NPU_SI_SETTINGS_SYNC_STATUS"
 4) "NPU_SI_SETTINGS_DEFAULT"
 5) "state"
 6) "ok"
 7) "netdev_oper_status"
 8) "down"
 9) "admin_status"
10) "up"
11) "mtu"
12) "9100"
13) "supported_speeds"
14) "40000,100000,200000,400000"
15) "supported_fecs"
16) "rs"
17) "host_tx_ready"
18) "true"
19) "speed"
20) "400000"
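
When checking many ports, the numbered `redis-cli` output above can be turned into a dictionary. The sketch below parses text in that format and reads the two fields this PR adds. `parse_hgetall` and `needs_cmis_reinit` are hypothetical helpers, and treating `"true"` as "re-init pending" is an assumption based on the field name rather than something the PR text states.

```python
import re

# Matches lines such as:  3) "NPU_SI_SETTINGS_SYNC_STATUS"
_LINE_RE = re.compile(r'^[ \t]*\d+\)[ \t]+"(.*)"[ \t]*$', re.MULTILINE)


def parse_hgetall(text: str) -> dict:
    """Convert numbered `redis-cli hgetall` output into a field/value dict."""
    items = _LINE_RE.findall(text)
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of field/value lines")
    # Odd-numbered lines are field names, even-numbered lines are values.
    return dict(zip(items[0::2], items[1::2]))


def needs_cmis_reinit(fields: dict) -> bool:
    """Assumption: CMIS_REINIT_REQUIRED == "true" means re-init is pending."""
    return fields.get("CMIS_REINIT_REQUIRED") == "true"
```

For the `Ethernet8` snippet above, `needs_cmis_reinit` would return `False` under this assumption, since the field is `"false"`.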

Testing in progress on multi-asic switch

Additional Information (Optional)

mihirpat1, Nov 02 '23 00:11

@prgeor @shyam77git @jaganbal-a - It will be great if you can help in reviewing this.

mihirpat1, Nov 03 '23 00:11

@tshalvi - It will be great if you can review this PR

mihirpat1, Nov 09 '23 05:11

@mihirpat1 can you check the build failure and coverage

prgeor, Nov 15 '23 02:11

> @mihirpat1 can you check the build failure and coverage

I have resolved it now.

mihirpat1, Nov 15 '23 06:11

@mihirpat1 could you resolve the conflicts?

kevinskwang, Dec 18 '23 12:12