sonic-mgmt

[Snappi]: PFC - Mixed Speed testcases

Open • amitpawar12 opened this issue 1 year ago • 1 comment

Description of PR

As part of the new test cases being added for PFC-ECN, this PR adds the mixed-speed ingress and egress test cases.

Summary: Fixes #13655, #13215

Type of change

  • [ ] Bug fix
  • [ ] Testbed and Framework(new/improvement)
  • [X] Test case(new/improvement)

Back port request

  • [ ] 202012
  • [ ] 202205
  • [ ] 202305
  • [ ] 202311
  • [X] 202405

Approach

What is the motivation for this PR?

This script covers the mixed-speed test cases. The topology has a single 400Gbps ingress port and a single 100Gbps egress port. Congestion is caused by three factors:

  • Oversubscription of the egress.
  • PAUSE frames received on the 100Gbps egress link.
  • Both: oversubscription of the egress and PAUSE frames received on the egress.

The idea is to test the behavior of the DUT under these conditions.
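To put the oversubscription in numbers: if the full 400Gbps ingress line rate is offered to the single 100Gbps egress, the egress is oversubscribed 4:1 and 300Gbps must be either paused (lossless) or dropped (lossy). A minimal sketch of that arithmetic, assuming full line rate on the ingress; the helper names are illustrative only and are not part of sonic-mgmt.

```python
def oversubscription_ratio(offered_gbps: float, egress_gbps: float) -> float:
    """How many times the offered load exceeds the egress capacity."""
    if egress_gbps <= 0:
        raise ValueError("egress capacity must be positive")
    return offered_gbps / egress_gbps


def excess_gbps(offered_gbps: float, egress_gbps: float) -> float:
    """Traffic (Gbps) that cannot fit in the egress and must be paused or dropped."""
    return max(0.0, float(offered_gbps - egress_gbps))


# Single 400Gbps ingress at full line rate into a single 100Gbps egress.
print(oversubscription_ratio(400, 100))  # 4.0
print(excess_gbps(400, 100))  # 300.0
```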

How did you do it?

The port_map selects a single 400Gbps ingress port and a single 100Gbps egress port.
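The traceback later in this thread reads port_map[0] as the TX (ingress) port count and port_map[2] as the RX (egress) port count. A minimal sketch of a port_map consistent with that indexing, assuming the remaining positions hold the port speeds in Gbps; the PortMap type and field names are hypothetical, and the real layout in the PR may differ.

```python
from typing import NamedTuple


class PortMap(NamedTuple):
    """Hypothetical layout: (tx_port_count, tx_speed_gbps, rx_port_count, rx_speed_gbps)."""
    tx_port_count: int
    tx_speed_gbps: int
    rx_port_count: int
    rx_speed_gbps: int


# Single 400Gbps ingress (IXIA TX side) and single 100Gbps egress (IXIA RX side).
port_map = PortMap(1, 400, 1, 100)

# Positional indexing as in the test code: [0] is TX count, [2] is RX count.
tx_port_count = port_map[0]
rx_port_count = port_map[2]
print(tx_port_count + rx_port_count)  # 2
```

A NamedTuple keeps plain positional indexing working while giving each slot a readable name.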

The following test functions are used:

  1. test_mixed_speed_diff_dist_dist_over: Lossless and lossy traffic are sent at 88% and 12% of the line rate (400Gbps), respectively, causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas the lossy priorities are 0, 1, and 2. The expectation is that the lossless priorities cause the DUT to send PAUSE frames to the IXIA transmitter, so lossless traffic is rate-limited and sees no drops. Lossy priority traffic sees no drops at all. Egress throughput is expected to be around 100Gbps, and lossy ingress and egress throughput do not change.

  2. test_mixed_speed_uni_dist_dist_over: Lossless and lossy traffic are each sent at 20% of the line rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas the lossy priorities are 0, 1, and 2. The expectation is that the lossless priorities cause the DUT to send PAUSE frames to the IXIA transmitter, so lossless traffic is rate-limited and sees no drops. Lossy priority traffic, however, sees partial drops. Egress throughput is expected to be around 100Gbps, with lossless and lossy traffic in an equal (or nearly equal) ratio.

  3. test_mixed_speed_pfcwd_enable: Lossless and lossy traffic are each sent at 20% of the line rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas the lossy priorities are 0, 1, and 2. Additionally, the IXIA receiver sends PAUSE frames to the DUT for the lossless priorities, which causes additional congestion on the DUT. The expectation is that the DUT sends PFC frames to the IXIA transmitter for the lossless priorities in response to the natural congestion caused by oversubscription of the egress, and IXIA rate-limits the lossless traffic in response. Lossy traffic is partially dropped on the DUT. However, since PFCWD is enabled and the DUT keeps receiving PFCs on the egress, the rate-limited lossless traffic is eventually dropped on the egress. The IXIA receiver receives ONLY 60Gbps of lossy traffic.

  4. test_mixed_speed_pfcwd_disable: Lossless and lossy traffic are each sent at 20% of the line rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas the lossy priorities are 0, 1, and 2. Additionally, the IXIA receiver sends PAUSE frames to the DUT for the lossless priorities, which causes additional congestion on the DUT. Since PFCWD is disabled in this scenario, the DUT forwards both lossless and lossy traffic to the IXIA receiver. The DUT sends PFCs in response to both the natural congestion and the PFCs received on the egress. Egress throughput is 100Gbps, with lossy traffic partially dropped. Lossy and lossless traffic are in an equal (or nearly equal) ratio.

  5. test_mixed_speed_no_congestion: The purpose of this test case is to verify that the DUT does not experience congestion when the 400Gbps ingress receives only 100Gbps of traffic, which the DUT forwards to the egress without any drops or congestion.
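The rates in test 1 and test 5 can be checked by hand. A sketch, assuming rates are percentages of the 400Gbps ingress line rate and that, in test 1, lossy traffic passes untouched while PFC throttles lossless traffic to the remaining egress bandwidth. The resulting ~52Gbps lossless figure is derived from the stated expectations (about 100Gbps total egress, unchanged lossy throughput), not taken from the attached results; the function names are hypothetical.

```python
INGRESS_GBPS = 400.0  # line rate of the single ingress port
EGRESS_GBPS = 100.0   # line rate of the single egress port


def offered_gbps(percent_of_line_rate: float) -> float:
    """Convert a percentage of the 400Gbps ingress line rate into Gbps."""
    return INGRESS_GBPS * percent_of_line_rate / 100.0


def expected_egress(lossless_pct: float, lossy_pct: float) -> dict:
    """Rough egress split for test 1 (lossy traffic fits in the egress).

    Assumption: lossy traffic is forwarded untouched and PFC throttles the
    lossless traffic down to whatever egress bandwidth is left over.
    """
    lossless_in = offered_gbps(lossless_pct)
    lossy_in = offered_gbps(lossy_pct)
    if lossy_in > EGRESS_GBPS:
        raise ValueError("model only covers lossy traffic that fits in the egress")
    lossless_out = min(lossless_in, EGRESS_GBPS - lossy_in)
    return {
        "lossless_in": lossless_in,
        "lossy_in": lossy_in,
        "lossless_out": lossless_out,
        "lossy_out": lossy_in,
        "total_out": lossless_out + lossy_in,
    }


# Test 1 (diff_dist): 88% lossless + 12% lossy of 400Gbps.
rates = expected_egress(88, 12)
print(rates["lossless_in"], rates["lossy_in"])    # 352.0 48.0
print(rates["lossless_out"], rates["total_out"])  # 52.0 100.0

# Test 5 (no congestion): 100Gbps (25% of 400Gbps) fits the 100Gbps egress.
print(offered_gbps(25) <= EGRESS_GBPS)  # True
```

The model deliberately refuses the 20% cases (tests 2 to 4), where lossy traffic is partially dropped and the split depends on DUT scheduling that this text does not specify.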

For all of the above test cases, an additional check of the fabric counters is added. The tests clear the fabric counters on the line cards and on the supervisor card (if it is part of the test). At the end of the test, the counters are checked again for CRC and uncorrectable FEC errors, and the test asserts if any of these counts is non-zero. These checks are added in a separate PR that needs to be merged first; likewise, the underlying infra needs to be added before these test cases.
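A minimal sketch of the end-of-test assertion, assuming the fabric counters have already been read from the DUTs into a per-card, per-port dictionary. The dictionary layout, the card and port names, and the counter keys crc and fec_uncorrectable are all hypothetical; the actual check in the infra PR reads the counters from the DUT and may be structured differently.

```python
from typing import Dict, List

# Hypothetical layout: {card_name: {fabric_port: {counter_name: value}}}
FabricCounters = Dict[str, Dict[str, Dict[str, int]]]

# Counters that must stay at zero after the test.
CHECKED_COUNTERS = ("crc", "fec_uncorrectable")


def find_fabric_errors(counters: FabricCounters) -> List[str]:
    """Return a description of every checked counter that is non-zero."""
    errors = []
    for card, ports in counters.items():
        for port, values in ports.items():
            for name in CHECKED_COUNTERS:
                count = values.get(name, 0)
                if count != 0:
                    errors.append(f"{card} {port}: {name}={count}")
    return errors


def assert_no_fabric_errors(counters: FabricCounters) -> None:
    """Fail if any line-card or supervisor port saw CRC or uncorrectable FEC errors."""
    errors = find_fabric_errors(counters)
    assert not errors, "Fabric errors found: " + "; ".join(errors)


# Example counters collected after a test run (illustrative values only).
after_test = {
    "linecard1": {"port0": {"crc": 0, "fec_uncorrectable": 0}},
    "supervisor": {"port0": {"crc": 0, "fec_uncorrectable": 0}},
}
assert_no_fabric_errors(after_test)  # passes: all checked counts are zero
```

Collecting every non-zero counter before asserting keeps the failure message useful when several cards report errors at once.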

How did you verify/test it?

Tested on a local platform.

16:05:25 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_diff_dist__multiple-dut-mixed-speed_1024B-2024-10-09-16-05.csv
PASSED [ 20%]
16:13:48 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_uni_dist__multiple-dut-mixed-speed_1024B-2024-10-09-16-13.csv
PASSED [ 40%]
16:22:13 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_pause_pfcwd_enable__multiple-dut-mixed-speed_1024B-2024-10-09-16-22.csv
PASSED [ 60%]
16:30:33 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_pause_pfcwd_disable__multiple-dut-mixed-speed_1024B-2024-10-09-16-30.csv
PASSED [ 80%]
16:38:56 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_no_cong__multiple-dut-mixed-speed_1024B-2024-10-09-16-38.csv
PASSED [100%]

Any platform specific information?

The test is specifically meant for Broadcom-DNX multi-ASIC platforms ONLY.

Supported testbed topology if it's a new test case?

Documentation

amitpawar12 • Aug 14 '24

@amitpawar12 can you update the test results summary in this PR? Thanks.

sdszhang • Oct 08 '24

Attaching results of the test-case execution.

Thanks,
-A

results-pfc-mixed-speed.txt.txt

amitpawar12 • Jan 06 '25

/azp run

mssonicbld • Jan 10 '25

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] • Jan 10 '25

/azp run

mssonicbld • Jan 10 '25

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] • Jan 10 '25

@amitpawar12 The infra change has been merged. Can you resolve the conflict?

sdszhang • Jan 14 '25

/azp run

mssonicbld • Jan 14 '25

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] • Jan 14 '25

Cherry-pick PR to 202411: https://github.com/sonic-net/sonic-mgmt/pull/16524

mssonicbld • Jan 15 '25

Cherry-pick PR to 202411: https://github.com/sonic-net/sonic-mgmt/pull/16538

mssonicbld • Jan 16 '25

@amitpawar12: Can you please provide the variables.py and sonic_lab_links.csv that you used for this run? Thanks.

I keep running into this problem:

        for testbed_subtype, rdma_ports in MIXED_SPEED_PORT_INFO[MULTIDUT_TESTBED].items():
            tx_port_count = port_map[0]
            rx_port_count = port_map[2]
            snappi_port_list = get_snappi_ports
            pytest_require(MULTIDUT_TESTBED == tbinfo['conf-name'],
                           "The testbed name from testbed file doesn't match with MULTIDUT_TESTBED in variables.py ")
>           pytest_require(len(snappi_port_list) >= tx_port_count + rx_port_count,
                           "Need Minimum of 2 ports defined in ansible/files/*links.csv file")
E           TypeError: object of type 'NoneType' has no len()

The snappi_port_list is None since I have different port speeds in sonic_lab_links.csv.
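The TypeError comes from calling len() on None. A minimal sketch of a guard that turns that case into a readable skip reason instead of a crash, assuming the port list can legitimately be None. The function name is hypothetical, and it returns a reason string rather than calling sonic-mgmt's pytest_require so that the example stays self-contained.

```python
from typing import List, Optional


def port_shortfall_reason(snappi_port_list: Optional[List[dict]],
                          tx_port_count: int,
                          rx_port_count: int) -> Optional[str]:
    """Return why the test cannot run, or None if enough ports are available."""
    needed = tx_port_count + rx_port_count
    if snappi_port_list is None:
        # Observed in this thread when the links file mixes port speeds.
        return "No snappi ports found; check ansible/files/*links.csv"
    if len(snappi_port_list) < needed:
        return f"Need minimum of {needed} ports defined in ansible/files/*links.csv file"
    return None
```

In a test, a non-None result would be passed to the skip/require helper, so a None port list is reported with a clear message rather than a TypeError.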

rraghav-cisco • Apr 21 '25

Please find the info below:

Snapshot of variables.py:

# MULTIDUT_TESTBED is defined earlier in variables.py
MIXED_SPEED_PORT_INFO = {MULTIDUT_TESTBED: (
    ({
        'multiple-dut-any-asic': {
            # Egress (IXIA RX) ports: 100Gbps interfaces on board72
            'rx_ports': [
                {'port_name': 'Ethernet0', 'hostname': "board72"},
                {'port_name': 'Ethernet8', 'hostname': "board72"},
                {'port_name': 'Ethernet16', 'hostname': "board72"}
            ],
            # Ingress (IXIA TX) ports: 400Gbps interfaces on board73
            'tx_ports': [
                {'port_name': 'Ethernet0', 'hostname': "board73"},
                {'port_name': 'Ethernet8', 'hostname': "board73"},
                {'port_name': 'Ethernet16', 'hostname': "board73"}
            ]
        }
    })
)}

Snapshot of links.csv file:

StartDevice,StartPort,EndDevice,EndPort,BandWidth,VlanID,VlanMode
board72,Ethernet0,ixia-sonic3,Port10.1,100000,,Access
board72,Ethernet8,ixia-sonic3,Port10.2,100000,,Access
board72,Ethernet16,ixia-sonic3,Port10.3,100000,,Access
board72,Ethernet24,ixia-sonic3,Port10.4,100000,,Access
board73,Ethernet0,ixia-sonic3,Port1,400000,,Access
board73,Ethernet8,ixia-sonic3,Port2,400000,,Access
board73,Ethernet16,ixia-sonic3,Port5,400000,,Access
board73,Ethernet144,ixia-sonic3,Port3,400000,,Access

In the above case, board72 has 100Gbps interfaces and board73 has 400Gbps interfaces.
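The links file above mixes two speeds, which is the situation the earlier traceback ran into. A sketch that groups the snapshot's ports by the BandWidth column (values in Mbps, so 100000 is 100Gbps) using only the standard csv module; the helper name is hypothetical. More than one group means a lookup that expects a single line speed cannot cover both the 400Gbps TX ports and the 100Gbps RX ports.

```python
import csv
import io
from collections import defaultdict

# The links.csv snapshot from this thread.
LINKS_CSV = """\
StartDevice,StartPort,EndDevice,EndPort,BandWidth,VlanID,VlanMode
board72,Ethernet0,ixia-sonic3,Port10.1,100000,,Access
board72,Ethernet8,ixia-sonic3,Port10.2,100000,,Access
board72,Ethernet16,ixia-sonic3,Port10.3,100000,,Access
board72,Ethernet24,ixia-sonic3,Port10.4,100000,,Access
board73,Ethernet0,ixia-sonic3,Port1,400000,,Access
board73,Ethernet8,ixia-sonic3,Port2,400000,,Access
board73,Ethernet16,ixia-sonic3,Port5,400000,,Access
board73,Ethernet144,ixia-sonic3,Port3,400000,,Access
"""


def ports_by_speed(csv_text: str) -> dict:
    """Group (device, port) pairs by the BandWidth column (Mbps), keeping file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["BandWidth"])].append((row["StartDevice"], row["StartPort"]))
    return dict(groups)


speeds = ports_by_speed(LINKS_CSV)
for speed_mbps, ports in sorted(speeds.items()):
    print(speed_mbps // 1000, "Gbps:", sorted({device for device, _ in ports}))
# 100 Gbps: ['board72']
# 400 Gbps: ['board73']
```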

There are two things here:

  • All existing Snappi test cases are designed for a single line speed ONLY.
  • The mixed-speed test cases require multiple speeds.

Hence, I have defined a separate dictionary for the mixed-speed test cases, so that I don't step on the existing test case definitions in variables.py.

amitpawar12 • Apr 23 '25