sonic-swss icon indicating copy to clipboard operation
sonic-swss copied to clipboard

[CoPP] Fix the bug that copp configuration will cause failed logs during warm restart

Open chaoskao opened this issue 3 years ago • 18 comments

What I did Let CoppMgr skip configuration which are already in APPL_DB after warm restart.

Why I did it After warm start, the CoppMgr will set configuration into APPL_DB which already exist, but this behavior is not necessary and it will cause CoPPorch to do a modification event.

How I verified it With this enhancement, the syslog will not appear the error messages about Copp during warm restart.

Details if related

chaoskao avatar Dec 20 '21 01:12 chaoskao

Can you please paste the error messages seen after warmboot here?

dgsudharsan avatar Dec 22 '21 17:12 dgsudharsan

Dec 9 12:38:52.857966 as7726-32x-4 ERR swss#orchagent: :- meta_generic_validation_set: SAI_POLICER_ATTR_METER_TYPE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified Dec 9 12:38:52.857966 as7726-32x-4 ERR swss#orchagent: :- trapGroupUpdatePolicer: Failed to apply attribute[2].id=0 to policer for trap group:default, error:-5 Dec 9 12:38:52.857966 as7726-32x-4 ERR swss#orchagent: :- doTask: Processing copp task item failed, exiting.

chaoskao avatar Dec 23 '21 00:12 chaoskao

@dgsudharsan , could you please check the latest comment?

prsunny avatar Jan 24 '22 19:01 prsunny

According to existing logic we delete the APP_DB will be cleared during an upgrade which includes warm-upgrade scenario. Without the APP_DB being set the feature wouldn't work during warm upgrades. Please revisit the logic to make sure it works for both warm restart and warm upgrade.

Please check here

https://github.com/Azure/sonic-utilities/blob/d9f3afe5b34ef0f03f4137f607bc73bc625631ed/scripts/db_migrator.py#L166

dgsudharsan avatar Jan 25 '22 03:01 dgsudharsan

Dec 9 12:38:52.857966 as7726-32x-4 ERR swss#orchagent: :- doTask: Processing copp task item failed, exiting.

@chaoskao Can you please explain the entire scenario. I tried with default copp configurations and I don't see any issues. Are you adding additional Copp configurations?

dgsudharsan avatar Jan 25 '22 03:01 dgsudharsan

@dgsudharsan I following the test steps in https://github.com/Azure/sonic-mgmt/blob/master/tests/vrf/test_vrf.py#L1020, enable the swss warm reboot and restart the swss container by CLI command.

The CoPP configuration shows as following { "COPP_GROUP": { "default": { "queue": "0", "meter_type":"packets", "mode":"sr_tcm", "cir":"600", "cbs":"600", "red_action":"drop" } } }

chaoskao avatar Jan 25 '22 04:01 chaoskao

@dgsudharsan I following the test steps in https://github.com/Azure/sonic-mgmt/blob/master/tests/vrf/test_vrf.py#L1020, enable the swss warm reboot and restart the swss container by CLI command.

The CoPP configuration shows as following { "COPP_GROUP": { "default": { "queue": "0", "meter_type":"packets", "mode":"sr_tcm", "cir":"600", "cbs":"600", "red_action":"drop" } } }

I believe the usecase is swss warm-restart. Can you please check if warm-reboot itself is impacted? If swss warm-restart we may have to deal it differently.

dgsudharsan avatar Jan 25 '22 05:01 dgsudharsan

@dgsudharsan warm-reboot will not show the error message, the error log only shows in syslog when service swss restart

chaoskao avatar Jan 25 '22 05:01 chaoskao

@dgsudharsan warm-reboot will not show the error message, the error log only shows in syslog when service swss restart

With your changes warmboot and warmupgrade will be broken. A suggestion would be to check if app_db is populated already and ignore setting app db. Another suggestion is to remove app_db entries as in warm reboot @prsunny Can you please suggest your opinion?

dgsudharsan avatar Jan 25 '22 16:01 dgsudharsan

@dgsudharsan warm-reboot will not show the error message, the error log only shows in syslog when service swss restart

With your changes warmboot and warmupgrade will be broken. A suggestion would be to check if app_db is populated already and ignore setting app db. Another suggestion is to remove app_db entries as in warm reboot @prsunny Can you please suggest your opinion?

Do you means check the each attribute of CoPP rules from App_DB, if attribute already exist just ignore this attribute or just check the key of rules?

chaoskao avatar Jan 26 '22 00:01 chaoskao

Do you means check the each attribute of CoPP rules from App_DB, if attribute already exist just ignore this attribute or just check the key of rules?

You can check the key alone within warm restart check that you have introduced. You can use m_coppTable for this purpose.

dgsudharsan avatar Jan 27 '22 00:01 dgsudharsan

@dgsudharsan warm-reboot will not show the error message, the error log only shows in syslog when service swss restart

With your changes warmboot and warmupgrade will be broken. A suggestion would be to check if app_db is populated already and ignore setting app db. Another suggestion is to remove app_db entries as in warm reboot @prsunny Can you please suggest your opinion?

Looks like the change is impacting regular warmboot. As such, we cannot have this change merged. Can you please check on the options provided by @dgsudharsan.

prsunny avatar Jan 27 '22 01:01 prsunny

The modification is just for the warm-restart case which will set a "swss warm restart" flag in STATUS_DB. I want to avoid the case in which the coppmgr will try to set the same configuration from config_DB into APPL_DB which is already existed during warm restart. When user enable "warm_restart" on swss and APPL_DB does not be cleared, this code will skipped the CoPP configuration which from config_DB to APPL_DB

In the db_migratror case mentioned by @dgsudharsan, the enhancement will still apply the configuration from copp_cfg.json into APPL_DB because there is no entry in APPL_DB which is cleared in db_migrator.

chaoskao avatar Jan 28 '22 01:01 chaoskao

The modification is just for the warm-restart case which will set a "swss warm restart" flag in STATUS_DB. I want to avoid the case in which the coppmgr will try to set the same configuration from config_DB into APPL_DB which is already existed during warm restart. When user enable "warm_restart" on swss and APPL_DB does not be cleared, this code will skipped the CoPP configuration which from config_DB to APPL_DB

In the db_migratror case mentioned by @dgsudharsan, the enhancement will still apply the configuration from copp_cfg.json into APPL_DB because there is no entry in APPL_DB which is cleared in db_migrator.

Have you tested this scenario? Can you please perform a warm boot with your fix and check if entries are populated after regular warm boot?

dgsudharsan avatar Feb 26 '22 03:02 dgsudharsan

I try the warm start scenario and the test steps as following: step 1. sudo config warm_restart enable swss step 2. db_migrator.py -o migrate step 3. sudo service swss restart

on step 2, I check APPL_DB has been cleared, and after the step3 the entries of APPL_DB has been populated again

Another warm reboot scenario has same result, the test steps as following:

step 1. sudo config warm_restart enable swss step 2. db_migrator.py -o migrate step 3. sudo warm-reboot

chaoskao avatar Mar 07 '22 02:03 chaoskao

I try the warm start scenario and the test steps as following: step 1. sudo config warm_restart enable swss step 2. db_migrator.py -o migrate step 3. sudo service swss restart

on step 2, I check APPL_DB has been cleared, and after the step3 the entries of APPL_DB has been populated again

Another warm reboot scenario has same result, the test steps as following:

step 1. sudo config warm_restart enable swss step 2. db_migrator.py -o migrate step 3. sudo warm-reboot

If using db_migrator helps should we change sonic-mgmt test script rather than changing code here?

dgsudharsan avatar Apr 02 '22 03:04 dgsudharsan

Hi, @dgsudharsan
Sorry, I don't know what are you mean, I just simulate the scenario that you say the entries will not be populated due to this code change, so I use db_migrator.py to clear CoPP setting from Appl_DB to check does this issue exist or not.

chaoskao avatar Apr 07 '22 07:04 chaoskao

Hi, @dgsudharsan Sorry, I don't know what are you mean, I just simulate the scenario that you say the entries will not be populated due to this code change, so I use db_migrator.py to clear CoPP setting from Appl_DB to check does this issue exist or not.

I believe after using DB migrator you don't see the issue correct? If that's the case we need to use DB migrator logic in the test scripts to avoid the CoPP error logs. @prsunny what is your opinion on this?

dgsudharsan avatar Apr 08 '22 18:04 dgsudharsan