
[pfcwd] Remove APPL_DB queue in-storm status at pfcwd config removal and big red switch enable

Open wendani opened this issue 3 years ago • 2 comments

What I did

APPL_DB tracks in-storm queues in case of a warm reboot. In the following two scenarios, the in-storm queues should be removed from APPL_DB.

  1. At pfcwd config removal from a port. When pfcwd config is removed from a port, the pfcwd state machine stops running on each {port, queue} of that port, so the port's in-storm queues should be removed from APPL_DB.

  2. At big red switch enable. When big red switch mode is later disabled, the pfcwd state machine on {port, queue} resumes from operational status, so the port's in-storm queues should be removed from APPL_DB when big red switch mode is enabled. Meanwhile, because big red switch mode itself is tracked in CONFIG_DB, no run-time state needs to be tracked elsewhere if the system warm-reboots with big red switch mode enabled.

This PR fixes and verifies the two scenarios described above.
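The two cleanup triggers can be illustrated with a small toy model. This is only a sketch in Python, not the orchagent code: the class `InStormTracker` and all of its method names are hypothetical, and a plain dict stands in for the per-port in-storm table in APPL_DB.

```python
class InStormTracker:
    """Toy stand-in for the per-port in-storm table kept in APPL_DB (hypothetical)."""

    def __init__(self):
        self.instorm = {}            # port name -> set of queue indices recorded as in storm
        self.watched_ports = set()   # ports with pfcwd config applied
        self.big_red_switch = False

    def start_pfcwd(self, port):
        self.watched_ports.add(port)

    def storm_detected(self, port, queue):
        # Record a storm only while the state machine is running on the port.
        if port in self.watched_ports and not self.big_red_switch:
            self.instorm.setdefault(port, set()).add(queue)

    def stop_pfcwd(self, port):
        # Scenario 1: removing pfcwd config drops the port's in-storm entry.
        self.watched_ports.discard(port)
        self.instorm.pop(port, None)

    def enable_big_red_switch(self):
        # Scenario 2: enabling big red switch mode drops all in-storm entries,
        # because queues restart from operational status once it is disabled.
        self.big_red_switch = True
        self.instorm.clear()

    def disable_big_red_switch(self):
        self.big_red_switch = False


tracker = InStormTracker()
tracker.start_pfcwd("Ethernet64")
tracker.storm_detected("Ethernet64", 3)
tracker.enable_big_red_switch()
print("Ethernet64" in tracker.instorm)  # False
```

In this model, a stale entry can only survive if one of the two removal paths is skipped, which is the situation the failing tests below detect.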

Why I did it

How I verified it

vs tests:

Scenario 1: Piggybacks on test_pfc_en_bits_user_wd_cfg_sep, developed in https://github.com/Azure/sonic-swss/pull/1612

Without the change, the extension to test_pfc_en_bits_user_wd_cfg_sep fails:

========================================================================= FAILURES =========================================================================
________________________________________________________ TestPfcWd.test_pfc_en_bits_user_wd_cfg_sep ________________________________________________________

self = <test_pfcwd.TestPfcWd object at 0x7f626c1c4400>, dvs = <conftest.DockerVirtualSwitch object at 0x7f626c1c4cf8>
testlog = <function testlog at 0x7f626c30f0d0>

    def test_pfc_en_bits_user_wd_cfg_sep(self, dvs, testlog):
        self.connect_dbs(dvs)
    
        # Enable pfc wd flex counter polling
        self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
        # Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
        fv_dict = {
            FLEX_COUNTER_STATUS: ENABLE,
        }
        self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
    
        # Enable pfc on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits in ASIC_DB
        port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc wd (config) on port
        self.start_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
    
        # Verify pfc enable bits stay unchanged in ASIC_DB
        time.sleep(2)
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc storm on queue 3
        self.start_queue_pfc_storm(q3_oid)
        # Verify queue in storm from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: STORMED,
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
        # Verify queue in storm from APPL_DB
        fv_dict = {
            QUEUE_3: STORM,
        }
        self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
    
        # Verify pfc enable bits change in ASIC_DB
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Re-set pfc enable on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits stay unchanged in ASIC_DB
        time.sleep(2)
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Change pfc enable bits: disable pfc on tc 3, and enable pfc on tc 4
        pfc_tcs = [QUEUE_4]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits change in ASIC_DB
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "16",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Stop pfc wd on port (i.e., remove pfc wd config from port)
        self.stop_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter removed from FLEX_COUNTER_DB
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter removed from FLEX_COUNTER_DB
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
        q4_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_4)
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q4_oid))
        # Verify pfc wd fields removed from COUNTERS_DB
        fields = [PFC_WD_STATUS]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q4_oid, fields)
        # Verify queue in storm status removed from APPL_DB
>       self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)

test_pfcwd.py:300: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_pfcwd.py:183: in check_db_key_removal
    db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dvslib.dvs_database.DVSDatabase object at 0x7f626c1bca90>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None

    def wait_for_deleted_keys(
        self,
        table_name: str,
        deleted_keys: List[str],
        polling_config: PollingConfig = PollingConfig(),
        failure_message: str = None,
    ) -> List[str]:
        """Wait for the specfied keys to no longer exist in the table.
    
        Args:
            table_name: The name of the table from which to fetch the keys.
            deleted_keys: The keys we expect to be removed from the table.
            polling_config: The parameters to use to poll the db.
            failure_message: The message to print if the call times out. This will only take effect
                if the PollingConfig is set to strict.
    
        Returns:
            The keys stored in the table. If no keys are found, then an empty List is returned.
        """
    
        def access_function():
            keys = self.get_keys(table_name)
            return (all(key not in keys for key in deleted_keys), keys)
    
        status, result = wait_for_result(
            access_function, self._disable_strict_polling(polling_config)
        )
    
        if not status:
            expected = [key for key in result if key not in deleted_keys]
            message = failure_message or (
                f"Unexpected keys found: expected={expected}, received={result}, "
                f'table="{table_name}"'
            )
>           assert not polling_config.strict, message
E           AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"

dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_pfc_en_bits_user_wd_cfg_sep - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="...
=============================================================== 1 failed in 69.23s (0:01:09) ===============================================================
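
The SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL values checked in the test above ("8", "0", "16") are consistent with a per-priority bit vector: bit i is set when PFC is enabled on priority i. A minimal sketch of that arithmetic follows; the helper name `pfc_enable_bits` is hypothetical and is not part of the test suite.

```python
def pfc_enable_bits(priorities):
    """Return the bitmask with bit i set for each PFC-enabled priority i (0..7)."""
    mask = 0
    for prio in priorities:
        if not 0 <= prio <= 7:
            raise ValueError(f"priority out of range: {prio}")
        mask |= 1 << prio
    return mask


print(pfc_enable_bits([3]))  # 8
print(pfc_enable_bits([4]))  # 16
print(pfc_enable_bits([]))   # 0
```

The value "0" observed while queue 3 is in storm matches the case where no priority has PFC enabled on the port.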

Scenario 2: test_appl_db_storm_status_removal_brs
  1. Set PFC enable on {port, TC 3}
  2. Set PFC WD config on port to start PFC WD state machine on {port, TC 3}
  3. Mimic PFC storm on {port, queue 3} using DEBUG_STORM
  4. Enable big red switch mode
  5. Dismiss PFC storm on {port, queue 3}
  6. Disable big red switch mode. PFC WD state machine resumes running on {port, queue 3}, and {port, queue 3} starts from and remains in operational status.

Without the change, the {port, queue 3} in-storm status entry remains in APPL_DB after step 6.

========================================================================= FAILURES =========================================================================
_____________________________________________________ TestPfcWd.test_appl_db_storm_status_removal_brs ______________________________________________________

self = <test_pfcwd.TestPfcWd object at 0x7f8810369908>, dvs = <conftest.DockerVirtualSwitch object at 0x7f88103a73c8>
testlog = <function testlog at 0x7f88103990d0>

    def test_appl_db_storm_status_removal_brs(self, dvs, testlog):
        self.connect_dbs(dvs)
    
        # Enable pfc wd flex counter polling
        self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
        # Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
        fv_dict = {
            FLEX_COUNTER_STATUS: ENABLE,
        }
        self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
    
        # Enable pfc on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
        # Verify pfc enable bits in ASIC_DB
        port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc wd (config) on port
        self.start_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
    
        # Start pfc storm on queue 3
        self.start_queue_pfc_storm(q3_oid)
        # Verify queue in storm from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: STORMED,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "1",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "0",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
        # Verify queue in storm from APPL_DB
        fv_dict = {
            QUEUE_3: STORM,
        }
        self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
    
        # Enable big red switch
        self.enable_big_red_switch()
        # Verify queue 3 in brs from COUNTERS_DB
        fv_dict = {
            BIG_RED_SWITCH_MODE: ENABLE,
            PFC_WD_STATUS: STORMED,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "1",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
    
        # Stop pfc storm on queue 3
        self.stop_queue_pfc_storm(q3_oid)
        # Verify DEBUG_STORM field removed from COUNTERS_DB
        fields = [DEBUG_STORM]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
    
        # Disable big red switch
        self.disable_big_red_switch()
        # Verify brs field removed from COUNTERS_DB
        fields = [BIG_RED_SWITCH_MODE]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
        # Verify queue operational from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: OPERATIONAL,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "2",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
    
        # Verify queue in-storm status removed from APPL_DB
>       self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)

test_pfcwd.py:509: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_pfcwd.py:185: in check_db_key_removal
    db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dvslib.dvs_database.DVSDatabase object at 0x7f88102dd5c0>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None

    def wait_for_deleted_keys(
        self,
        table_name: str,
        deleted_keys: List[str],
        polling_config: PollingConfig = PollingConfig(),
        failure_message: str = None,
    ) -> List[str]:
        """Wait for the specfied keys to no longer exist in the table.
    
        Args:
            table_name: The name of the table from which to fetch the keys.
            deleted_keys: The keys we expect to be removed from the table.
            polling_config: The parameters to use to poll the db.
            failure_message: The message to print if the call times out. This will only take effect
                if the PollingConfig is set to strict.
    
        Returns:
            The keys stored in the table. If no keys are found, then an empty List is returned.
        """
    
        def access_function():
            keys = self.get_keys(table_name)
            return (all(key not in keys for key in deleted_keys), keys)
    
        status, result = wait_for_result(
            access_function, self._disable_strict_polling(polling_config)
        )
    
        if not status:
            expected = [key for key in result if key not in deleted_keys]
            message = failure_message or (
                f"Unexpected keys found: expected={expected}, received={result}, "
                f'table="{table_name}"'
            )
>           assert not polling_config.strict, message
E           AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"

dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_appl_db_storm_status_removal_brs - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), ta...
=============================================================== 1 failed in 65.30s (0:01:05) ===============================================================
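
Both failures surface in the same place: the helper polls the table until the key disappears and asserts when the timeout expires with the key still present. A minimal sketch of that poll-until-deleted pattern is shown here, with a plain dict in place of the Redis-backed table. The function `wait_for_deleted_keys_sketch` is hypothetical and is not the dvslib API; its default timeout and interval simply mirror the PollingConfig values printed in the traceback.

```python
import time


def wait_for_deleted_keys_sketch(get_keys, deleted_keys, timeout=5.0, interval=0.01):
    """Poll get_keys() until none of deleted_keys remain, or until timeout.

    Returns (status, keys): status is True if all keys were gone in time,
    and keys is the last snapshot read from the table.
    """
    deadline = time.monotonic() + timeout
    while True:
        keys = list(get_keys())
        if all(key not in keys for key in deleted_keys):
            return True, keys
        if time.monotonic() >= deadline:
            return False, keys
        time.sleep(interval)


# A stale entry, as in the failing runs: the key is never removed.
table = {"Ethernet64": {"3": "storm"}}
status, keys = wait_for_deleted_keys_sketch(table.keys, ["Ethernet64"], timeout=0.05)
print(status, keys)  # False ['Ethernet64']
```

With the fix in place, the entry is deleted before the timeout, so the same call would return a true status and an empty key list.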

Details if related

Contains the changes from, and should therefore be merged after, https://github.com/Azure/sonic-swss/pull/1612

  • [ ] https://github.com/Azure/sonic-swss/pull/1612

wendani, Apr 05 '21 04:04

This pull request fixes 2 alerts when merging 116daf1da0551715f0714f8d61411799f6d63d0d into 872b5cb9a2a398a086f4646fe134c199919b6c92 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

lgtm-com[bot], Apr 05 '21 05:04

This pull request fixes 2 alerts when merging cedb134977a7dd169bacae7e7cc76eee41e4bf62 into 872b5cb9a2a398a086f4646fe134c199919b6c92 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

lgtm-com[bot], Apr 05 '21 21:04