sonic-utilities
Backup STATE_DB PORT_TABLE|Ethernet during warm-reboot
What I did
Currently, the entire PORT_TABLE in STATE_DB is deleted during warm-reboot. As a result, host_tx_ready changes to false after warm-reboot, which causes the link to remain down.
How I did it
The host_tx_ready, NPU_SI_SETTINGS_SYNC_STATUS and CMIS_REINIT_REQUIRED fields of STATE_DB PORT_TABLE are now backed up during warm-reboot.
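The field selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from this PR: the helper name prune_port_entry and the PRESERVED_FIELDS constant are hypothetical, and the sample entry is copied from the Ethernet0 output in the verification section.

```python
# Illustrative sketch only; prune_port_entry and PRESERVED_FIELDS are
# hypothetical names, not identifiers from the warm-reboot script.
PRESERVED_FIELDS = (
    "host_tx_ready",
    "NPU_SI_SETTINGS_SYNC_STATUS",
    "CMIS_REINIT_REQUIRED",
)


def prune_port_entry(fields, preserved=PRESERVED_FIELDS):
    """Return only the fields of one PORT_TABLE|Ethernet* hash that must survive."""
    return {name: value for name, value in fields.items() if name in preserved}


# Sample entry taken from the "Before warm-reboot" output for Ethernet0.
ethernet0_before = {
    "state": "ok",
    "netdev_oper_status": "up",
    "admin_status": "up",
    "mtu": "9100",
    "CMIS_REINIT_REQUIRED": "false",
    "NPU_SI_SETTINGS_SYNC_STATUS": "NPU_SI_SETTINGS_DEFAULT",
    "supported_speeds": "40000,100000",
    "supported_fecs": "none,rs",
    "host_tx_ready": "true",
    "speed": "100000",
    "fec": "N/A",
}

ethernet0_pruned = prune_port_entry(ethernet0_before)
print(ethernet0_pruned)
```

The remaining fields (for example netdev_oper_status) are dropped on purpose so that they are re-populated after the warm-reboot.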
How to verify it
Verified that host_tx_ready in STATE_DB PORT_TABLE is retained after warm-reboot and the link remains up. Also, ensured that the keys CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS are retained after warm-reboot.
Before warm-reboot
root@sonic:/home/admin# redis-cli -n 6 hgetall "PORT_TABLE|Ethernet0"
1) "state"
2) "ok"
3) "netdev_oper_status"
4) "up"
5) "admin_status"
6) "up"
7) "mtu"
8) "9100"
9) "CMIS_REINIT_REQUIRED"
10) "false"
11) "NPU_SI_SETTINGS_SYNC_STATUS"
12) "NPU_SI_SETTINGS_DEFAULT"
13) "supported_speeds"
14) "40000,100000"
15) "supported_fecs"
16) "none,rs"
17) "host_tx_ready"
18) "true"
19) "speed"
20) "100000"
21) "fec"
22) "N/A"
root@sonic:/home/admin#
After the warm-reboot script backs up PORT_TABLE and deletes unwanted fields
root@sonic:/home/admin# redis-cli -n 6 hgetall "PORT_TABLE|Ethernet0"
1) "CMIS_REINIT_REQUIRED"
2) "false"
3) "NPU_SI_SETTINGS_SYNC_STATUS"
4) "NPU_SI_SETTINGS_DEFAULT"
5) "host_tx_ready"
6) "true"
root@sonic:/home/admin#
After switch boot-up post warm-reboot
root@sonic:/home/admin# redis-cli -n 6 hgetall "PORT_TABLE|Ethernet0"
1) "state"
2) "ok"
3) "netdev_oper_status"
4) "up"
5) "admin_status"
6) "up"
7) "mtu"
8) "9100"
9) "supported_speeds"
10) "40000,100000"
11) "supported_fecs"
12) "none,rs"
13) "CMIS_REINIT_REQUIRED"
14) "false"
15) "NPU_SI_SETTINGS_SYNC_STATUS"
16) "NPU_SI_SETTINGS_DEFAULT"
17) "host_tx_ready"
18) "true"
19) "speed"
20) "100000"
21) "fec"
22) "N/A"
root@sonic:/home/admin#
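The manual comparison above could also be scripted. The following is a minimal sketch, assuming the redis-cli hgetall output has been captured as text. The helper names parse_hgetall and lost_or_changed are hypothetical, and the "before" sample is abridged and renumbered from the output shown above.

```python
import re


def parse_hgetall(output):
    """Turn numbered `redis-cli hgetall` output into a {field: value} dict."""
    # Each reply line looks like: 3) "netdev_oper_status"
    items = re.findall(r'^\s*\d+\)\s+"(.*)"\s*$', output, flags=re.MULTILINE)
    # Replies alternate between field name and field value.
    return dict(zip(items[0::2], items[1::2]))


def lost_or_changed(before, after, fields):
    """List the fields that are missing after the reboot or differ from before."""
    return [f for f in fields if f not in after or before.get(f) != after[f]]


# Abridged "before" sample (renumbered) and the full "after the script ran" sample.
BEFORE = '''
1) "state"
2) "ok"
3) "CMIS_REINIT_REQUIRED"
4) "false"
5) "NPU_SI_SETTINGS_SYNC_STATUS"
6) "NPU_SI_SETTINGS_DEFAULT"
7) "host_tx_ready"
8) "true"
'''
AFTER = '''
1) "CMIS_REINIT_REQUIRED"
2) "false"
3) "NPU_SI_SETTINGS_SYNC_STATUS"
4) "NPU_SI_SETTINGS_DEFAULT"
5) "host_tx_ready"
6) "true"
'''

preserved = ["host_tx_ready", "NPU_SI_SETTINGS_SYNC_STATUS", "CMIS_REINIT_REQUIRED"]
problems = lost_or_changed(parse_hgetall(BEFORE), parse_hgetall(AFTER), preserved)
print(problems)
```

An empty list means all three preserved fields kept their values.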
Just curious, what was the reason to not back up the entire table? Is it because some of the fields (e.g. netdev_oper_status) should be re-populated after warm-reboot?
@longhuan-cisco - Yes, you are correct. Hence, we decided to preserve only the selected fields that xcvrd/OA care about and delete the other fields from STATE_DB.
As discussed, I tested the change from this PR: host_tx_ready is retained properly after warm-reboot and the link stays up (especially for CMIS modules).
@mihirpat1 @prgeor Could you please continue with the remaining work on this PR?
root@t0-dut:/home/cisco# show reboot-cause history
Name Cause Time User Comment
------------------- ----------- ------------------------------- ------ ---------
2024_05_22_07_52_46 warm-reboot Wed May 22 07:49:46 UTC 2024 cisco N/A
...
May 22 07:55:26.154419 cmono-t0-dut NOTICE pmon#xcvrd[27]: XCVRD INIT: Wait for port config is done
May 22 07:55:26.156638 cmono-t0-dut NOTICE pmon#xcvrd[27]: XCVRD INIT: After port config is done
May 22 07:55:26.183632 cmono-t0-dut NOTICE pmon#xcvrd[27]: Start daemon main loop with thread count 3
May 22 07:55:26.183632 cmono-t0-dut NOTICE pmon#xcvrd[27]: Started thread CmisManagerTask
May 22 07:55:26.183675 cmono-t0-dut NOTICE pmon#xcvrd[27]: Started thread DomInfoUpdateTask
May 22 07:55:26.183675 cmono-t0-dut NOTICE pmon#xcvrd[27]: Started thread SfpStateUpdateTask
...
May 22 07:55:26.198509 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet32 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198532 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet56 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198554 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet0 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'false', 'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs'}
May 22 07:55:26.198577 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet16 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198601 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet128 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198618 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet72 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198643 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet120 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198661 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet192 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
May 22 07:55:26.198689 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet200 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'false', 'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs'}
May 22 07:55:26.198712 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet176 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}
...
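To summarize host_tx_ready per port from log lines like the ones above, a short script can help. This is a sketch under the assumption that the lines follow exactly the format shown here. The "$$$ ... handle_port_update_event()" lines appear to be debug output from this test run, so the pattern may not match other builds. The name host_tx_ready_by_port is hypothetical.

```python
import ast
import re

# Matches the debug lines shown above; the format is assumed from this log only.
LINE_RE = re.compile(
    r"\$\$\$ (Ethernet\d+) handle_port_update_event\(\) : .*? fvp (\{.*\})\s*$"
)


def host_tx_ready_by_port(log_lines):
    """Map each port name to the host_tx_ready value reported in its log line."""
    result = {}
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            # The fvp payload is printed as a Python dict literal.
            fvp = ast.literal_eval(match.group(2))
            result[match.group(1)] = fvp.get("host_tx_ready")
    return result


# Two lines copied from the log above.
sample = [
    "May 22 07:55:26.198509 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet32 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'true', 'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs', 'speed': '400000'}",
    "May 22 07:55:26.198554 cmono-t0-dut WARNING pmon#xcvrd[27]: $$$ Ethernet0 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'false', 'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'supported_speeds': '200000,400000', 'supported_fecs': 'rs'}",
]

summary = host_tx_ready_by_port(sample)
print(summary)
```

In the log above, the two ports that report host_tx_ready as false (Ethernet0 and Ethernet200) also have admin_status down.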
@StormLiangMS @yxieca @bingwang-ms please cherry-pick this to 202311. It is needed for warm-reboot support on platforms using CMIS optics.
Cherry-pick PR to 202311: https://github.com/sonic-net/sonic-utilities/pull/3352
@bingwang-ms we need this in 202405
@prgeor It seems there is a cherry-pick conflict. Please double-check.
@bingwang-ms I have removed the 202405 tags since this is already part of 202405.