sonic-platform-daemons
Changes for port reinitialization in case of a syncd/swss/orchagent (OA) crash, and NPU SI settings update for CMIS transceivers
Description
Code changes implementing the HLD for orchagent (OA) crash handling, which reinitializes ports through xcvrd. This is needed to ensure that xcvrd reinitializes all ports if syncd, swss, or orchagent crashes.
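As a minimal sketch of the per-port decision this implies (xcvrd itself is written in Python), the helpers below show how a daemon might decide whether a port needs re-initialization. They assume, hypothetically, that an empty STATE_DB `PORT_TABLE` entry means the table was cleared by a syncd/swss/orchagent restart. The helper names `port_needs_reinit` and `npu_si_settings_pending` are illustrative and are not taken from the xcvrd code; only the field names and values come from the STATE_DB snippet in this PR.

```python
from typing import Mapping

# Field names and values as they appear in the STATE_DB PORT_TABLE snippet in this PR.
CMIS_REINIT_REQUIRED = "CMIS_REINIT_REQUIRED"
NPU_SI_SYNC_STATUS = "NPU_SI_SETTINGS_SYNC_STATUS"
NPU_SI_DONE = "NPU_SI_SETTINGS_DONE"


def port_needs_reinit(port_entry: Mapping[str, str]) -> bool:
    """Hypothetical check: should this port be re-initialized?

    Assumption: an empty entry models STATE_DB having been cleared by a
    syncd/swss/orchagent restart, so the port is treated as needing re-init.
    """
    if not port_entry:
        return True
    # STATE_DB stores booleans as the strings "true"/"false".
    return port_entry.get(CMIS_REINIT_REQUIRED, "false") == "true"


def npu_si_settings_pending(port_entry: Mapping[str, str]) -> bool:
    """Hypothetical check: NPU SI settings not yet reported as applied."""
    return port_entry.get(NPU_SI_SYNC_STATUS) != NPU_SI_DONE
```

Treating a missing entry the same as an explicit `"true"` keeps the crash path and the normal path on one code path; whether xcvrd actually does this is an assumption of the sketch.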
Motivation and Context
In addition to the implementation described in the HLD, this PR makes the following changes:
1. Remove the wait for port config completion from the CMIS thread.
2. Print the module and data path (DP) states.
3. Rename "media settings" to "NPU SI settings" to clearly differentiate optics settings from NPU SI settings.
How Has This Been Tested?
Test plan summary: process and device restarts, and interface config command handling
Event | STATE_DB_<asic_n> cleared | Xcvrd restarted | NPU SI settings renotify | CMIS re-init triggered | Link flap |
---|---|---|---|---|---|
Xcvrd restart | N | Y | N | N | N |
Pmon restart | N | Y | N | N | N |
orchagent restart | Y | Y | Y | Y | N/A |
Swss restart | Y | Y | Y | Y | N/A |
Syncd restart | Y | Y | Y | Y | N/A |
config reload | Y | Y | Y | Y | N/A |
Cold reboot | Y | Y | Y | Y | N/A |
config interface shutdown | N | N | N | N | N/A |
config interface startup | N | N | N | N | N/A |
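The table encodes one invariant worth checking mechanically: NPU SI settings are renotified and CMIS re-init is triggered exactly for the events that clear STATE_DB. A minimal sketch, assuming the rows are transcribed by hand (Y as `True`, N as `False`, N/A as `None`); `Outcome`, `RESTART_TEST_PLAN`, and `invariant_violations` are hypothetical names for illustration, not part of the test suite.

```python
from typing import Dict, List, NamedTuple, Optional


class Outcome(NamedTuple):
    """One row of the restart test plan; None stands for "N/A"."""
    state_db_cleared: bool
    xcvrd_restarted: bool
    npu_si_renotify: bool
    cmis_reinit: bool
    link_flap: Optional[bool]


# Rows transcribed from the test plan table above.
RESTART_TEST_PLAN: Dict[str, Outcome] = {
    "Xcvrd restart": Outcome(False, True, False, False, False),
    "Pmon restart": Outcome(False, True, False, False, False),
    "orchagent restart": Outcome(True, True, True, True, None),
    "Swss restart": Outcome(True, True, True, True, None),
    "Syncd restart": Outcome(True, True, True, True, None),
    "config reload": Outcome(True, True, True, True, None),
    "Cold reboot": Outcome(True, True, True, True, None),
    "config interface shutdown": Outcome(False, False, False, False, None),
    "config interface startup": Outcome(False, False, False, False, None),
}


def invariant_violations(plan: Dict[str, Outcome]) -> List[str]:
    """Return events where NPU SI renotify or CMIS re-init does not
    match whether STATE_DB was cleared."""
    return [
        event
        for event, row in plan.items()
        if row.npu_si_renotify != row.state_db_cleared
        or row.cmis_reinit != row.state_db_cleared
    ]
```

An empty result means the plan is internally consistent; a new row that clears STATE_DB without triggering re-init would show up immediately.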
Transceiver OIR testing
Event | STATE_DB_<asic_n> cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered |
---|---|---|---|---|---|
Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | N/A |
Transceiver Insertion | N | N | Y | NPU_SI_SETTINGS_DONE | Y |
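To verify the OIR rows against a live switch, the observed `NPU_SI_SETTINGS_SYNC_STATUS` can be compared with the expected value after each event. A minimal sketch, assuming the port's STATE_DB `PORT_TABLE` entry has already been read into a string-to-string mapping; `EXPECTED_SYNC_STATUS` and `oir_check_error` are hypothetical helper names.

```python
from typing import Dict, Mapping, Optional

# Expected NPU_SI_SETTINGS_SYNC_STATUS after each OIR event, transcribed
# from the table above.
EXPECTED_SYNC_STATUS: Dict[str, str] = {
    "Transceiver Removal": "NPU_SI_SETTINGS_DEFAULT",
    "Transceiver Insertion": "NPU_SI_SETTINGS_DONE",
}


def oir_check_error(event: str, port_entry: Mapping[str, str]) -> Optional[str]:
    """Return None if the PORT_TABLE entry matches the test plan for this
    OIR event, otherwise a human-readable mismatch message."""
    expected = EXPECTED_SYNC_STATUS[event]
    observed = port_entry.get("NPU_SI_SETTINGS_SYNC_STATUS")
    if observed == expected:
        return None
    return f"{event}: expected {expected}, got {observed}"
```

Returning a message rather than a bare boolean makes a failing check self-explanatory in test logs.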
Redis-db snippet
```
#redis-cli -n 6 hgetall "PORT_TABLE|Ethernet8"
1) "CMIS_REINIT_REQUIRED"
2) "false"
3) "NPU_SI_SETTINGS_SYNC_STATUS"
4) "NPU_SI_SETTINGS_DEFAULT"
5) "state"
6) "ok"
7) "netdev_oper_status"
8) "down"
9) "admin_status"
10) "up"
11) "mtu"
12) "9100"
13) "supported_speeds"
14) "40000,100000,200000,400000"
15) "supported_fecs"
16) "rs"
17) "host_tx_ready"
18) "true"
19) "speed"
20) "400000"
```
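`redis-cli hgetall` prints a hash as a flat, numbered list that alternates field and value. When collecting these snippets from test logs, a small parser can turn them back into a mapping. This is a minimal sketch; `parse_hgetall` is a hypothetical helper, and it does not undo redis-cli's backslash escaping of quotes inside values.

```python
import re
from typing import Dict, List

# Matches lines such as ' 1) "CMIS_REINIT_REQUIRED"' or '10) "up"';
# redis-cli right-aligns the index when the reply has 10 or more items.
_ITEM = re.compile(r'^\s*\d+\)\s+"(.*)"\s*$')


def parse_hgetall(output: str) -> Dict[str, str]:
    """Convert redis-cli HGETALL output into a field -> value dict.

    Non-matching lines (for example, the shell prompt) are ignored.
    """
    values: List[str] = []
    for line in output.splitlines():
        match = _ITEM.match(line)
        if match:
            values.append(match.group(1))
    if len(values) % 2:
        raise ValueError("odd number of items; output looks truncated")
    # Even positions are field names, odd positions are their values.
    return dict(zip(values[0::2], values[1::2]))
```

When querying Redis directly instead of parsing text, redis-py's `Redis.hgetall` already returns a dict (with bytes keys and values unless the client is created with `decode_responses=True`).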
Testing is in progress on a multi-ASIC switch.
Additional Information (Optional)
@prgeor @shyam77git @jaganbal-a - It would be great if you could help review this.
@tshalvi - It would be great if you could review this PR.
@mihirpat1, can you check the build failure and coverage?
I have resolved it now.
@mihirpat1 could you resolve the conflicts?