sonic-mgmt
sonic-mgmt copied to clipboard
Fix qos/test_pfc_pause.py for dualtor.
Description of PR
Summary: Fixed qos/test_pfc_pause.py for dualtor. Fixes https://github.com/aristanetworks/sonic-qual.msft/issues/86
Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)
Back port request
- [ ] 201911
- [ ] 202012
- [ ] 202205
- [x] 202305
- [x] 202311
Approach
What is the motivation for this PR?
qos/test_pfc_pause.py fails for dualtor with the following errors
IndexError: list index out of range
Failed: Dscp: [3], Background Dscp: 4, errors occured: Ethernet8:[0, 20] Ethernet12:[0, 20]
How did you do it?
Issue 1 is addressed by skipping the test for PG 2 and 6 as none of the the DSCP values map to these PGs in case of dualtor downlink ports. ( Addressed by #12229 ) Issue 2 is addressed by -
- Making the randomly selected ToR as active using the fixture
toggle_all_simulator_ports_to_rand_selected_tor_m( addressed by #12229 ) - Use downlink interface to retrieve lossless priorities as well as DSCP_TO_TC_MAP rather than checking a random interface because qos config is different for uplink and downlink ports in case of dualtor and in this test the traffic flow is from server to server. ( addressed by this PR )
How did you verify/test it?
Verified on Arista-7260 platform using dualtor topology.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/qos/test_pfc_pause.py:11:1: F401 'tests.common.dualtor.mux_simulator_control.toggle_all_simulator_ports_to_rand_selected_tor_m' imported but unused
tests/qos/test_pfc_pause.py:40:70: F811 redefinition of unused 'toggle_all_simulator_ports_to_rand_selected_tor_m' from line 11
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
To run the pre-commit checks locally, you can follow below steps:
- Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
- Ensure that the
pre-commitpackage is installed:
sudo pip install pre-commit
- Go to repository root folder
- Install the pre-commit hooks:
pre-commit install
- Use pre-commit to check staged file:
pre-commit
- Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>
hi @vivekverma-arista , could you check if below RP fix same issue? https://github.com/sonic-net/sonic-mgmt/pull/12229/files
Hi @StormLiangMS, #12229 , partly fixes the issue ( which I have described in the PR description ), we still however need the change in this PR because the lossless_prio_dscp_map should retrieve lossless priorities from the downlink ports only. Currently it retrieves randomly from any port and in case it's an uplink port the test doesn't skip for PG2 and 6 and it fails because the ports used by the test are server facing ports ( downlink ): https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L52-L58
hi @bingwang-ms could you help to review this PR?
@vivekverma-arista I'm not understand why downlink ports failed, since it will skip PG2 and 6 if it is not an uplink ports?
Hi @StormLiangMS,
-
The test is written to run on downlink ports only because it is choosing the downlink ports as src/dst interfaces: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L52-L58. Therefore it should run for lossy PGs programmed for downlink ports only.
-
The fixture
lossless_prio_dscp_mapcurrently tries to retrieve lossy priorities randomly from any interface rather than downlink interface under the assumption that lossy PGs as well as DSCP_TO_TC_MAP is same for all interfaces which is not true in case of dualtor. If by chance a downlink port is returned here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/qos_fixtures.py#L16, then later this condition https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L316-L317 ( introduced by #12229 ) evaluates to false and the test continues running withWRONGlossless priorities for downlink ports.
Hi @StormLiangMS,
- The test is written to run on downlink ports only because it is choosing the downlink ports as src/dst interfaces: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L52-L58. Therefore it should run for lossy PGs programmed for downlink ports only.
- The fixture
lossless_prio_dscp_mapcurrently tries to retrieve lossy priorities randomly from any interface rather than downlink interface under the assumption that lossy PGs as well as DSCP_TO_TC_MAP is same for all interfaces which is not true in case of dualtor. If by chance a downlink port is returned here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/qos_fixtures.py#L16, then later this condition https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L316-L317 ( introduced by Improve test_pfc_pause to support dualtor #12229 ) evaluates to false and the test continues running withWRONGlossless priorities for downlink ports and later fails with the signature.Failed: Dscp: [3], Background Dscp: 4, errors occured: Ethernet8:[0, 20] Ethernet12:[0, 20]
Hi @vivekverma-arista , can you please help me understand why test failed if the selected priority is 3, and background dscp is 4, which is another lossless priority? As far as I understand, the test should also pass in this combinaiton.
Hi @StormLiangMS,
- The test is written to run on downlink ports only because it is choosing the downlink ports as src/dst interfaces: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L52-L58. Therefore it should run for lossy PGs programmed for downlink ports only.
- The fixture
lossless_prio_dscp_mapcurrently tries to retrieve lossy priorities randomly from any interface rather than downlink interface under the assumption that lossy PGs as well as DSCP_TO_TC_MAP is same for all interfaces which is not true in case of dualtor. If by chance a downlink port is returned here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/qos_fixtures.py#L16, then later this condition https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L316-L317 ( introduced by Improve test_pfc_pause to support dualtor #12229 ) evaluates to false and the test continues running withWRONGlossless priorities for downlink ports and later fails with the signature.Failed: Dscp: [3], Background Dscp: 4, errors occured: Ethernet8:[0, 20] Ethernet12:[0, 20]Hi @vivekverma-arista , can you please help me understand why test failed if the selected priority is 3, and background dscp is 4, which is another lossless priority? As far as I understand, the test should also pass in this combinaiton.
Hi @bingwang-ms, apologies for pasting the wrong error message containing PG3 and 4, that failure is specific to dualtor-aa and I got confused, and for which I will raise a separate pull request.
This PR was about test continuing to run for PG2 and 6 rather than getting skipped.
Why will the test continue to run for PG2 and 6 in certain cases ( very rare )?
There is a chance that the value of intf here will be an uplink port ( because the order in which the keys is returned is not fixed ). But the test is not written to run with uplink ports.
(Pdb) port_qos_map.keys()
dict_keys(['Ethernet4', 'Ethernet8', 'Ethernet12', 'Ethernet16', 'Ethernet20', 'Ethernet24', 'Ethernet28', 'Ethernet32', 'Ethernet36', 'Ethernet40', 'Ethernet44', 'Ethernet48', 'Ethernet52', 'Ethernet56', 'Ethernet60', 'Ethernet64', 'Ethernet68', 'Ethernet72', 'Ethernet76', 'Ethernet80', 'Ethernet84', 'Ethernet88', 'Ethernet92', 'Ethernet96', 'Ethernet112', 'Ethernet116', 'Ethernet120', 'Ethernet124', 'global'])
Next there is a chance here that value of profile becomes AZURE_TUNNEL or AZURE_UPLINK ( because again the order in which the keys are returned is not fixed and these maps are not applicable to downlink ports )
(Pdb) prio_to_tc_map.keys()
dict_keys(['AZURE', 'AZURE_TUNNEL', 'AZURE_UPLINK'])
So with the wrong combination of interface and DSCP_TO_TC_MAP the test will continue to run for PG 2 and 6 for dualtor. as this condition doesn't evaluate to true in that case: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L316-L317 because wrong map was used to find DSCP for the PG.
The fix in this PR retrieves a downlink port and then later retrieves the list exact DSCP_TO_TC_MAP applied on that interface because the src and destination interfaces in this test are downlink ports and getting selected here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/test_pfc_pause.py#L70-L74
However, the test still passes if it continues to to run for PG2 and 6 for dualtor/dualtor-aa which is not correct as the lossless PGs are only 2 and 6 for dualtor downlink ports.
@StormLiangMS can we backport this to 202311 as well?