sonic-swss icon indicating copy to clipboard operation
sonic-swss copied to clipboard

[PFCWD + Asym PFC]: Allow PFCWD to detect PFC storm on all priorities when Asym PFC is enabled

Open volodymyrsamotiy opened this issue 4 years ago • 41 comments

Signed-off-by: Volodymyr Samotiy [email protected]

What I did Add possibility for PFCWD to detect PFC storm on all priorities when asymmetric PFC is enabled.

According to requirements PFCWD should detect PFC storm on the lossy priorities for the port on which asymmetric PFC is enabled. (when asymmetric PFC is enabled on some port that means such port is enabled for receiving PFC frames on all priorities)

Why I did it To fix issue https://github.com/Azure/sonic-swss/issues/1128.

How I verified it

  1. Enable asymmetric PFC on test port.
pfc config asymmetric on <port>
  1. Enable PFCWD on the test port.
pfcwd start --action drop --restoration-time 3000 <port> 1000
pfcwd interval 500
  1. Run pause frames to the test port for all priorities.
  2. Verify that PFCWD detected storm for all priorities.
pfcwd show stats

Details if related N/A

volodymyrsamotiy avatar Jul 23 '20 08:07 volodymyrsamotiy

retest vs please

liat-grozovik avatar Jul 27 '20 12:07 liat-grozovik

retest this please

liat-grozovik avatar Jul 28 '20 17:07 liat-grozovik

retest this please

volodymyrsamotiy avatar Jul 29 '20 07:07 volodymyrsamotiy

retest this please

liat-grozovik avatar Aug 10 '20 07:08 liat-grozovik

retest this please

volodymyrsamotiy avatar Aug 12 '20 08:08 volodymyrsamotiy

retest this please

volodymyrsamotiy avatar Aug 12 '20 09:08 volodymyrsamotiy

retest this please

liat-grozovik avatar Sep 30 '20 07:09 liat-grozovik

@volodymyrsamotiy please handle conflicts

liat-grozovik avatar Dec 29 '20 18:12 liat-grozovik

@volodymyrsamotiy please handle conflicts

handled

volodymyrsamotiy avatar Feb 23 '21 11:02 volodymyrsamotiy

@neethajohn kindly reminder. this PR should go to 202012 as well.

liat-grozovik avatar Feb 28 '21 16:02 liat-grozovik

@neethajohn Kindly Reminder, we are waiting for the merge of this PR. Thanks.

dprital avatar Mar 05 '21 00:03 dprital

@neethajohn - Who can merge this PR ? Thanks.

dprital avatar Mar 10 '21 08:03 dprital

This pull request fixes 1 alert when merging 53cf04ee65bfee395c876fca3459a6f066a961ce into c37cc1c56336d242fdcc3e728f8e86be992dd382 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Jun 25 '21 16:06 lgtm-com[bot]

Please add unit tests

@neethajohn done. Please review

liat-grozovik avatar Jun 28 '21 13:06 liat-grozovik

This pull request fixes 1 alert when merging a946c21d481797b23683acfc84e969cef6e57841 into 15a014bf666916a253053f9419522e6b8e8e20f3 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Aug 18 '21 07:08 lgtm-com[bot]

@neethajohn kindly reminder to approve this PR. once approved we will issue a new one against 202012 as it is not cleanly cherry picked.

liat-grozovik avatar Aug 22 '21 12:08 liat-grozovik

@volodymyrsamotiy , I think these unit tests are not sufficient to ensure that pfcwd feature is working correctly for asym mode and in combined mode. I am concerned that there might be scenarios we miss and end up causing regression. Can we add unit tests for the following flow in both combined and asym mode?

  1. start pfcwd
  2. verify the correct set of counters are added in flex counter for polling
  3. in asic db, verify that the pfc mask is getting disabled for the right queue/queues in storm
  4. stop pfcwd, in asic db, verify that pfc mask is enabled for the right queue

Also are the current pfcwd and asym pfc testcases in sonic-mgmt passing with this change?

neethajohn avatar Sep 03 '21 17:09 neethajohn

@neethajohn, I agree that unit test checks only simple logic but that is because we are pretty limited on virtual switch. Currently we don't have any VS test for PFCWD and it was never checked on VS so maybe even some additional VS infra will required. A while ago I already tried it on virtual switch and PFCWD logic is not fully functional on it, so test checks everything we currently can check. In any case I will think about it more and check what else we can do. Regarding "sonic-mgmt" test, I will first fix latest comments for the "tx mask", then verify and let you know.

@volodymyrsamotiy, No vs test for pfcwd is making it difficult to ensure that there are no regressions for any changes/features that are getting added in this area. As far as I can think of, to verify the flow that I mentioned earlier, we need to set the 'pfc_enable' in PORT_QOS_MAP, enable flex counter for pfcwd, create pfcwd table for the port under test with the detection/restoration intervals. For mocking the storm, there is 'DEBUG_STORM' attr which can be set for the queue that needs to be in storm in COUNTERS_DB.

volodymyrsamotiy avatar Sep 09 '21 07:09 volodymyrsamotiy

This pull request fixes 1 alert when merging 48375340849e8fb97bdb0389f0965cc47a2a8d8c into 8cf355db09fc2ffdadcea936c349301193a760da - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Sep 14 '21 10:09 lgtm-com[bot]

This pull request fixes 1 alert when merging 9586b6f8395ae1391d05362db9830b67b2f6525e into 8cf355db09fc2ffdadcea936c349301193a760da - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Sep 14 '21 14:09 lgtm-com[bot]

@neethajohn do you have additional comments or you can sign off?

liat-grozovik avatar Sep 29 '21 08:09 liat-grozovik

@neethajohn do you have additional comments or you can sign off?

@liat-grozovik , I don't see my comments regarding vs tests addressed. In the last review I did, there were issues in the logic that could have been caught by the vs test flow I had suggested. Also I didn't hear back on whether the current pfcwd and asym pfc testcases in sonic-mgmt are passing with this change.

neethajohn avatar Oct 01 '21 17:10 neethajohn

@neethajohn, please find summary below, just to have it in one place since now it is hard to find all the details:

  1. Current tests in sonic-mgmt are passing with these changes.
  2. As I mentioned previously PFCWD is not fully functional on the VS, so what you suggested currently is not working. I am aware of a possibility to start DEBUG storm via Redis DB but it doesn't help. And I tried the same approach as was suggested long time ago and double checked recently but storm cannot be simulated on VS like that. The reason for that is that even if you set DEBUG storm to true via Redis DB it is still handled by LUA script which then notifies orhcagent about the storm: e.g. https://github.com/Azure/sonic-swss/blob/master/orchagent/pfc_detect_mellanox.lua#L58. And as we know all the LUA scripts are vendor specific and started by orchagent based on the platform type. For the VS platform we don't have neither script nor handler in orchagent, so LUA is not started and no one handles DEBUG storm. I also had an idea to use "fake_platform" option in a test in order to run VS as faked real platform so script can be started, but it didn't help as well. I believe it is not simple to do, having PFCWD tests on VS will require more investigation, designing test infra and then implementing some additional changes. Which I think should be taken as a separate task. Please let me know what you think about it.

@volodymyrsamotiy , vs test infra support added in #2077. I believe that should make it possible to add unit tests in this area

volodymyrsamotiy avatar Oct 07 '21 15:10 volodymyrsamotiy

@neethajohn any further comments or we can close this one?

liat-grozovik avatar Nov 08 '21 13:11 liat-grozovik

@neethajohn following an offline discussion regarding tests, we agree that what we have the tests based on the existing infra and if extending the infra is required for pfc wd it should be discussed regardless of this PR. There is a sonic-mgmt test to cover this issue. As of that, could you please review and let us know if there are any additional outstanding issues that this PR should address?

liat-grozovik avatar Dec 14 '21 08:12 liat-grozovik

This pull request fixes 1 alert when merging 1e001f57eaf31c9630095c3d17d5efc35fbec3a1 into a4c80c3d8713d32b73d9428683b8670b46d65fcd - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Dec 20 '21 16:12 lgtm-com[bot]

This pull request fixes 1 alert when merging d710235a3a8f51291bc4c1726f3b421ed0ac3a86 into a4c80c3d8713d32b73d9428683b8670b46d65fcd - view on LGTM.com

fixed alerts:

  • 1 for Unused import

lgtm-com[bot] avatar Dec 22 '21 12:12 lgtm-com[bot]

/azpw run

volodymyrsamotiy avatar Dec 24 '21 09:12 volodymyrsamotiy

/AzurePipelines run

mssonicbld avatar Dec 24 '21 09:12 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Dec 24 '21 09:12 azure-pipelines[bot]