sonic-mgmt
sonic-mgmt copied to clipboard
Add extra logic in test_sfputil to make sure modules are restored
Description of PR
Summary: Fixes # (issue)
Add extra logic (common for both CMIS and non-CMIS) in test_sfputil to make sure modules are restored fully after sfp reset, this is especially helpful for restoring CMIS modules which requires config/CMIS_state_machine to be replayed after sfp reset.
Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] Test case(new/improvement)
Back port request
- [ ] 201911
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
Approach
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
Verified the testcase passed on setup which has 400G modules, and verified syslog the config/CMIS state machine was replayed properly after reset, modules were fully restored.
root@cmono-t0-dut:/home/cisco# show int trans info Ethernet8
Ethernet8: SFP EEPROM detected
...
Application Advertisement: 400GAUI-8 C2M (Annex 120E) - Host Assign (0x1) - 400GBASE-DR4 (Cl 124) - Media Assign (0x1)
100GAUI-2 C2M (Annex 135G) - Host Assign (0x55) - 100GBASE-DR (Cl 140) - Media Assign (0xf)
200GAUI-4 C2M (Annex 120E) - Host Assign (0x11) - Undefined - Media Assign (0x5)
CMIS Rev: 4.0
Connector: MPO 1x12
Encoding: N/A
Extended Identifier: Power Class 4 (8.0W Max)
Extended RateSelect Compliance: N/A
Host Lane Count: 8
Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
...
platform_tests/sfp/test_sfputil.py::test_check_sfputil_reset[cmono-t0-dut-None] PASSED [100%]
====================================================== warnings summary =======================================================
../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
/usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------------------------------- generated xml file: /data/tests/logs/tr_2024-06-03-18-13-32.xml -------------------------------
========================================== 1 passed, 1 warning in 1434.78s (0:23:54) ==========================================
INFO:root:Can not get Allure report URL. Please check logs
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/platform_tests/sfp/test_sfputil.py:208:13: E125 continuation line with same indent as next logical line
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
To run the pre-commit checks locally, you can follow below steps:
- Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
- Ensure that the
pre-commitpackage is installed:
sudo pip install pre-commit
- Go to repository root folder
- Install the pre-commit hooks:
pre-commit install
- Use pre-commit to check staged file:
pre-commit
- Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/platform_tests/sfp/test_sfputil.py:18:1: F811 redefinition of unused 'wait_until' from line 16
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
To run the pre-commit checks locally, you can follow below steps:
- Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
- Ensure that the
pre-commitpackage is installed:
sudo pip install pre-commit
- Go to repository root folder
- Install the pre-commit hooks:
pre-commit install
- Use pre-commit to check staged file:
pre-commit
- Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>
Hi @mihirpat1 , there are some questions left in the PR, can you have a look?
Hi @mihirpat1 , there are some questions left in the PR, can you have a look?
@wsycqyz Can you please share the open questions for me?
The below two pre-commit check failures are due to baseline, because they started to happen right after merging master to local branch https://github.com/sonic-net/sonic-mgmt/pull/13108/commits/b0dd014f6bba79df0a9384e703499b1907b4656b: https://github.com/sonic-net/sonic-mgmt/pull/13108/checks?check_run_id=27298781282 https://github.com/sonic-net/sonic-mgmt/pull/13108/checks?check_run_id=27298780950
@kevinwangsk please help with review / approval. Thanks!
@longhuan-cisco Can you please test the changes on a device with breakout cable and ensure all subports are linked up after one of the subport has gone through reset. Also, please ensure that the DOM config is disabled only for the first subport of a breakout group.
@mihirpat1 Updated code to only disable DOM for first subport of breakout group, and tested it.
The pre-commit sanity check failure Azure.sonic-mgmt (Test onboarding t1 testcases by Elastictest - optional) seems to be some baseline issue, also saw it on other PRs': https://github.com/sonic-net/sonic-mgmt/pull/14474/checks?check_run_id=29863617790 https://github.com/sonic-net/sonic-mgmt/pull/14472/checks?check_run_id=29853135820
@prgeor for viz @longhuan-cisco I am seeing that after executing the
test_check_sfputil_resettestcase individually, a config reload is being triggered due to a failure seen while executingcore_dump_and_config_check. Are you observing a similar behavior too?Based on my understanding, this config reload is being triggered due to config_db_check_pass being set to False. This is because the current running_config has
"dom_polling":"enabled"key-value introduced to PORT table of CONFIG_DB since the testcase disables and enables DOM. By default, transceiver DOM polling is enabled but thedom_pollingfield may not be present in PORT table of CONFIG_DB.
@mihirpat1 Yes, I observed the same: testcase passed, but there's a WARNING on config_check due to that dom_polling=enabled added after restoring the port, and config reload is triggered as a result.
Hence, can we ensure that the config reload is not triggered if
- pre-running config does not have
dom_pollingfield present- AND the
dom_pollingfield is set toenabledin running-config post testcase execution
To avoid the config reload:
Were you suggesting to change the logic in conftest.py:core_dump_and_config_check() to exclude checking dom_polling field, if there's no dom_polling present in port config pre-test AND there's dom_polling=enabled in port config post-test?
yea, this would work.
Alternatively, simpler approach might be (although a little bit "raw"): we may clear this filed by sonic-db-cli CONFIG_DB HDEL 'PORT|Ethernet0' 'dom_polling' in the end for those relevant interfaces.
what do you think?
@prgeor for viz @longhuan-cisco I am seeing that after executing the
test_check_sfputil_resettestcase individually, a config reload is being triggered due to a failure seen while executingcore_dump_and_config_check. Are you observing a similar behavior too? Based on my understanding, this config reload is being triggered due to config_db_check_pass being set to False. This is because the current running_config has"dom_polling":"enabled"key-value introduced to PORT table of CONFIG_DB since the testcase disables and enables DOM. By default, transceiver DOM polling is enabled but thedom_pollingfield may not be present in PORT table of CONFIG_DB.@mihirpat1 Yes, I observed the same: testcase passed, but there's a WARNING on config_check due to that
dom_polling=enabledadded after restoring the port, and config reload is triggered as a result.Hence, can we ensure that the config reload is not triggered if
- pre-running config does not have
dom_pollingfield present- AND the
dom_pollingfield is set toenabledin running-config post testcase executionTo avoid the config reload: Were you suggesting to change the logic in conftest.py:core_dump_and_config_check() to exclude checking
dom_pollingfield, if there's nodom_pollingpresent in port config pre-test AND there'sdom_polling=enabledin port config post-test? yea, this would work.Alternatively, simpler approach might be (although a little bit "raw"): we may clear this filed by
sonic-db-cli CONFIG_DB HDEL 'PORT|Ethernet0' 'dom_polling'in the end for those relevant interfaces. what do you think?
@longhuan-cisco I think fixing in this section will be more gene
@prgeor for viz @longhuan-cisco I am seeing that after executing the
test_check_sfputil_resettestcase individually, a config reload is being triggered due to a failure seen while executingcore_dump_and_config_check. Are you observing a similar behavior too? Based on my understanding, this config reload is being triggered due to config_db_check_pass being set to False. This is because the current running_config has"dom_polling":"enabled"key-value introduced to PORT table of CONFIG_DB since the testcase disables and enables DOM. By default, transceiver DOM polling is enabled but thedom_pollingfield may not be present in PORT table of CONFIG_DB.@mihirpat1 Yes, I observed the same: testcase passed, but there's a WARNING on config_check due to that
dom_polling=enabledadded after restoring the port, and config reload is triggered as a result.Hence, can we ensure that the config reload is not triggered if
- pre-running config does not have
dom_pollingfield present- AND the
dom_pollingfield is set toenabledin running-config post testcase executionTo avoid the config reload: Were you suggesting to change the logic in conftest.py:core_dump_and_config_check() to exclude checking
dom_pollingfield, if there's nodom_pollingpresent in port config pre-test AND there'sdom_polling=enabledin port config post-test? yea, this would work.Alternatively, simpler approach might be (although a little bit "raw"): we may clear this filed by
sonic-db-cli CONFIG_DB HDEL 'PORT|Ethernet0' 'dom_polling'in the end for those relevant interfaces. what do you think?
@longhuan-cisco We can proceed with the alternative aproach (clearing the dom_polling field after enabling DOM)
@mihirpat1 can you review
@mihirpat1 can you review
@longhuan-cisco Can you please help in addressing the below per our discussion? https://github.com/sonic-net/sonic-mgmt/pull/13108#discussion_r1785387333
@longhuan-cisco Can you please help in addressing the below per our discussion? #13108 (comment)
@mihirpat1 I addressed as what we discussed.
And I also added logic to skip config interface shudown/startup for already admin-down ports, because of two reasons: 1) no need to shutdown already admin-down ports; 2) to avoid unexpectedly startup pre-test admin-down port, which can cause config check failed. This skip is especially needed, on certain test topologies/configurations, where some of the ports in admin-down state always:
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
-------------- --------------------------------------- ------- ----- ----- ------- -------------- ------ ------- ----------------------------------------------- ----------
Ethernet0 2304,2305,2306,2307,2308,2309,2310,2311 400G 9100 N/A etp0 routed down down QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet8 2320,2321,2322,2323,2324,2325,2326,2327 400G 9100 N/A etp1 trunk up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet16 2312,2313,2314,2315,2316,2317,2318,2319 400G 9100 N/A etp2 trunk up up QSFP-DD Double Density 8X Pluggable Transceiver off
...
Ethernet184 0,1,2,3,4,5,6,7 400G 9100 N/A etp23 trunk up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet192 256,257,258,259,260,261,262,263 400G 9100 N/A etp24 trunk up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet200 8,9,10,11,12,13,14,15 400G 9100 N/A etp25 routed down down QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet208 1024,1025,1026,1027,1028,1029,1030,1031 400G 9100 N/A etp26 routed down down QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet216 768,769,770,771,772,773,774,775 400G 9100 N/A etp27 routed down down QSFP-DD Double Density 8X Pluggable Transceiver off
...
@prgeor, Mihir has reviewed the PR. Can u pls merge this.
@Junchao-Mellanox could you resolve the conversation if your comments are addressed?
@longhuan-cisco PR conflicts with 202405 branch
@longhuan-cisco pls address the conflicts on 202405.
@kevinskwang I've addressed conflicts and raised cherry-pick PR already https://github.com/sonic-net/sonic-mgmt/pull/15543
@longhuan-cisco Hi, we need this changes also for 202311 branch, can you please also make PR for 202311 ?
@longhuan-cisco Hi, we need this changes also for 202311 branch, can you please also make PR for 202311 ?
@eyakubch Could you please check with them if it's OK to add Request for 202311 branch tag? Normally that tag is needed for raising cherry-pick.
@longhuan-cisco whom I need to ask ?