sonic-mgmt icon indicating copy to clipboard operation
sonic-mgmt copied to clipboard

Leverage the fanout's ASIC to send PFC storm

Open nhe-NV opened this issue 1 year ago • 6 comments
trafficstars

Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

  • There is strict timing requirement, which can not be guaranteed
  • The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

  • Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

After the change, the pfc related test case which is using onxy as fanout will send the pfc frame with ASIC instead of using the CPU. New docker storm docker need to be build with the Docker file provided in this PR.

Description of PR

Summary: Fixes # (issue)

Type of change

  • [ ] Bug fix
  • [ ] Testbed and Framework(new/improvement)
  • [x] Test case(new/improvement)

Back port request

  • [ ] 201911
  • [ ] 202012
  • [ ] 202205
  • [x] 202305
  • [x] 202311
  • [x] 202405

Approach

What is the motivation for this PR?

To make the pfc related test case much more stable.

How did you do it?

Change sending pfc frame with CPU to ASIC on the onxy fanout.

How did you verify/test it?

Run the pfc related test case on the setup which has onxy fanout, test case could pass stablly.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

nhe-NV avatar Jun 12 '24 02:06 nhe-NV

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results: trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:13:1: F403 'from python_sdk_api.sx_api import *' used; unable to detect undefined names
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:18:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:21:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:32:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:33:10: F405 'sx_api_port_pfc_enable_set' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:34:14: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:38:17: F405 'new_sx_cos_priority_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:39:15: F405 'new_sx_cos_ieee_prio_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:40:18: F405 'new_sx_cos_port_prio_buff_t_p' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:42:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:43:5: F405 'sx_cos_priority_t_arr_setitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

mssonicbld avatar Jun 12 '24 02:06 mssonicbld

The pre-commit check is mandatory. Can you help fix it?

wsycqyz avatar Jun 13 '24 03:06 wsycqyz

@bingwang-ms This seems to be a big PR. FYI.

wsycqyz avatar Jun 13 '24 03:06 wsycqyz

/azpw run

nhe-NV avatar Jun 15 '24 09:06 nhe-NV

/AzurePipelines run

mssonicbld avatar Jun 15 '24 09:06 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jun 15 '24 09:06 azure-pipelines[bot]

@kperumalbfn Can you please help review this PR?

bingwang-ms avatar Jul 18 '24 21:07 bingwang-ms

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

bingwang-ms avatar Jul 22 '24 21:07 bingwang-ms

The new PFCWD logic requires queue not empty. https://github.com/sonic-net/sonic-swss/pull/3036 How is that condition satisfied?

bingwang-ms avatar Jul 22 '24 22:07 bingwang-ms

The new PFCWD logic requires queue not empty. sonic-net/sonic-swss#3036 How is that condition satisfied?

Hi @bingwang-ms the logic is implemented in https://github.com/sonic-net/sonic-mgmt/pull/12733. If run the test case with the PR #12733 only, and not include this PR, you will have the "queue not empty" requirement met, but due to the pfc frame is sent by the CPU, the traffic is not stable, and espicially for test_pfcwd_timer_accuracy, the test could pass with very low possiblity.

nhe-NV avatar Jul 23 '24 01:07 nhe-NV

The new PFCWD logic requires queue not empty. sonic-net/sonic-swss#3036 How is that condition satisfied?

Hi @bingwang-ms the logic is implemented in #12733. If run the test case with the PR #12733 only, and not include this PR, you will have the "queue not empty" requirement met, but due to the pfc frame is sent by the CPU, the traffic is not stable, and espicially for test_pfcwd_timer_accuracy, the test could pass with very low possiblity.

I see. Thanks for the clarification. Then it makes sense to me. I missed #12733 as there is no cherry-pick request. I just added the cherry-pick label for it.

bingwang-ms avatar Jul 23 '24 17:07 bingwang-ms

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

bingwang-ms avatar Jul 23 '24 17:07 bingwang-ms

Related PR for Non-Onyx leaf-fanout https://github.com/sonic-net/sonic-mgmt/pull/13768

bingwang-ms avatar Jul 23 '24 17:07 bingwang-ms

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout https://github.com/sonic-net/sonic-mgmt/pull/13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout https://github.com/sonic-net/sonic-mgmt/pull/13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

nhe-NV avatar Jul 23 '24 23:07 nhe-NV

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container. The container has the same name as prev one. so nonthing changed when using it.

nhe-NV avatar Jul 23 '24 23:07 nhe-NV

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

bingwang-ms avatar Jul 24 '24 01:07 bingwang-ms

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

nhe-NV avatar Jul 24 '24 01:07 nhe-NV

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

Yes, I saw the makefile. But can you point me the code for building the new docker image? I didn't see that code in this PR.

bingwang-ms avatar Jul 24 '24 02:07 bingwang-ms

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well? I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

Yes, I saw the makefile. But can you point me the code for building the new docker image? I didn't see that code in this PR.

Is this that you are looking for? image

nhe-NV avatar Jul 24 '24 02:07 nhe-NV

Hi @nhe-NV , I understand we have Makefile and Dockerfile. I'm looking for the code to call the make command or the docker build command as I didn't see that code in this change. Can you point it to me?

bingwang-ms avatar Jul 26 '24 20:07 bingwang-ms

Hi @nhe-NV , I understand we have Makefile and Dockerfile. I'm looking for the code to call the make command or the docker build command as I didn't see that code in this change. Can you point it to me?

Hi @bingwang-ms , same as prev pfc_wd docker, it dose not have the build command in the script, user will build their own docker and save it locally on demand. In the relevent test case, it will use it directly

nhe-NV avatar Jul 29 '24 01:07 nhe-NV

I see. Thanks for the clarification. Please address the conflict.

bingwang-ms avatar Jul 29 '24 16:07 bingwang-ms

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results: trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:119:18: F405 'new_uint32_t_p' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:120:5: F405 'uint32_t_p_assign' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:121:28: F405 'new_sx_port_attributes_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:122:10: F405 'sx_api_port_device_get' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:123:14: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:126:16: F405 'uint32_t_p_value' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:129:28: F405 'new_sx_port_attributes_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:130:10: F405 'sx_api_port_device_get' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:131:15: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:141:27: F405 'sx_port_attributes_t_arr_getitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:156:27: F405 'sx_port_attributes_t_arr_getitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

mssonicbld avatar Jul 30 '24 00:07 mssonicbld

Cherry-pick PR to 202405: https://github.com/sonic-net/sonic-mgmt/pull/13885

mssonicbld avatar Jul 30 '24 16:07 mssonicbld