sonic-swss
[pfcwd] Remove APPL_DB queue in-storm status at pfcwd config removal and big red switch enable
What I did
APPL_DB tracks in-storm queues so that pfcwd state can be restored across a warm reboot. In the following two scenarios, in-storm queues should be removed from APPL_DB:
- At pfcwd config removal from a port. When the pfcwd config is removed from a port, the pfcwd state machine stops running on each {port, queue}, so the port's in-storm queues should be removed from APPL_DB.
- At big red switch enable. When big red switch mode is later disabled, the pfcwd state machine on each {port, queue} resumes from operational status, so the port's in-storm queues should be removed from APPL_DB at the time big red switch mode is enabled. Meanwhile, since big red switch mode is tracked in CONFIG_DB, no run-time state needs to be tracked elsewhere if the system warm-reboots with big red switch mode enabled.
This PR amends and verifies the two scenarios described above.
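The intended removal behavior can be sketched with a minimal plain-Python stand-in (class and method names here are hypothetical; the real logic lives in the pfcwd orch in sonic-swss, keyed on the APPL_DB PFC_WD_TABLE_INSTORM table):

```python
class InStormTracker:
    """Stand-in for APPL_DB PFC_WD_TABLE_INSTORM: port -> {queue: "storm"}."""

    def __init__(self):
        self.appl_db = {}

    def mark_storm(self, port, queue):
        # Watchdog detected a storm on {port, queue}: record it so a
        # warm reboot can restore the in-storm state.
        self.appl_db.setdefault(port, {})[queue] = "storm"

    def stop_wd_on_port(self, port):
        # Scenario 1: pfcwd config removed from the port. The state
        # machine stops on every {port, queue}, so the port's in-storm
        # entry must be dropped from APPL_DB.
        self.appl_db.pop(port, None)

    def enable_big_red_switch(self):
        # Scenario 2: at BRS enable, all in-storm entries are dropped,
        # because at a later BRS disable every {port, queue} resumes
        # from operational status. BRS mode itself is tracked in
        # CONFIG_DB, so it survives a warm reboot without APPL_DB help.
        self.appl_db.clear()
```

This only models the bookkeeping the PR verifies, not the detection/restoration logic itself.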
Why I did it
How I verified it
vs tests:
Scenario 1: piggy-backs on test_pfc_en_bits_user_wd_cfg_sep, developed in https://github.com/Azure/sonic-swss/pull/1612. Without the change, the extension to test_pfc_en_bits_user_wd_cfg_sep fails:
========================================================================= FAILURES =========================================================================
________________________________________________________ TestPfcWd.test_pfc_en_bits_user_wd_cfg_sep ________________________________________________________
self = <test_pfcwd.TestPfcWd object at 0x7f626c1c4400>, dvs = <conftest.DockerVirtualSwitch object at 0x7f626c1c4cf8>
testlog = <function testlog at 0x7f626c30f0d0>
def test_pfc_en_bits_user_wd_cfg_sep(self, dvs, testlog):
self.connect_dbs(dvs)
# Enable pfc wd flex counter polling
self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
# Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
fv_dict = {
FLEX_COUNTER_STATUS: ENABLE,
}
self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
# Enable pfc on tc 3
pfc_tcs = [QUEUE_3]
self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
# Verify pfc enable bits in ASIC_DB
port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Start pfc wd (config) on port
self.start_port_pfcwd(PORT_UNDER_TEST)
# Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
# Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
# Verify pfc enable bits stay unchanged in ASIC_DB
time.sleep(2)
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Start pfc storm on queue 3
self.start_queue_pfc_storm(q3_oid)
# Verify queue in storm from COUNTERS_DB
fv_dict = {
PFC_WD_STATUS: STORMED,
}
self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
# Verify queue in storm from APPL_DB
fv_dict = {
QUEUE_3: STORM,
}
self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
# Verify pfc enable bits change in ASIC_DB
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Re-set pfc enable on tc 3
pfc_tcs = [QUEUE_3]
self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
# Verify pfc enable bits stay unchanged in ASIC_DB
time.sleep(2)
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Change pfc enable bits: disable pfc on tc 3, and enable pfc on tc 4
pfc_tcs = [QUEUE_4]
self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
# Verify pfc enable bits change in ASIC_DB
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "16",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Stop pfc wd on port (i.e., remove pfc wd config from port)
self.stop_port_pfcwd(PORT_UNDER_TEST)
# Verify port level counter removed from FLEX_COUNTER_DB
self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
# Verify queue level counter removed from FLEX_COUNTER_DB
self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
q4_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_4)
self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q4_oid))
# Verify pfc wd fields removed from COUNTERS_DB
fields = [PFC_WD_STATUS]
self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q4_oid, fields)
# Verify queue in storm status removed from APPL_DB
> self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)
test_pfcwd.py:300:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_pfcwd.py:183: in check_db_key_removal
db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <dvslib.dvs_database.DVSDatabase object at 0x7f626c1bca90>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None
def wait_for_deleted_keys(
self,
table_name: str,
deleted_keys: List[str],
polling_config: PollingConfig = PollingConfig(),
failure_message: str = None,
) -> List[str]:
"""Wait for the specfied keys to no longer exist in the table.
Args:
table_name: The name of the table from which to fetch the keys.
deleted_keys: The keys we expect to be removed from the table.
polling_config: The parameters to use to poll the db.
failure_message: The message to print if the call times out. This will only take effect
if the PollingConfig is set to strict.
Returns:
The keys stored in the table. If no keys are found, then an empty List is returned.
"""
def access_function():
keys = self.get_keys(table_name)
return (all(key not in keys for key in deleted_keys), keys)
status, result = wait_for_result(
access_function, self._disable_strict_polling(polling_config)
)
if not status:
expected = [key for key in result if key not in deleted_keys]
message = failure_message or (
f"Unexpected keys found: expected={expected}, received={result}, "
f'table="{table_name}"'
)
> assert not polling_config.strict, message
E AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"
dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_pfc_en_bits_user_wd_cfg_sep - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="...
=============================================================== 1 failed in 69.23s (0:01:09) ===============================================================
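For reference, the SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL values asserted in the test above (8 for TC 3, 16 for TC 4) are per-TC bitmaps. A sketch of the encoding (helper name is mine, not from the test):

```python
def pfc_enable_bitmap(tcs):
    # SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL carries one enable bit per
    # traffic class: bit i set means PFC is enabled on TC i.
    bits = 0
    for tc in tcs:
        bits |= 1 << tc
    return bits
```

So enabling PFC on TC 3 yields 8, and moving the enable to TC 4 yields 16, matching the ASIC_DB values checked by the test.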
Scenario 2: test_appl_db_storm_status_removal_brs
1. Set PFC enable on {port, TC 3}.
2. Set the PFC WD config on the port to start the PFC WD state machine on {port, TC 3}.
3. Mimic a PFC storm on {port, queue 3} using DEBUG_STORM.
4. Enable big red switch mode.
5. Dismiss the PFC storm on {port, queue 3}.
6. Disable big red switch mode. The PFC WD state machine resumes running on {port, queue 3}, which starts from and remains in operational status.
Without the change, the {port, queue 3} in-storm status entry remains in APPL_DB after step 6.
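The six steps above can be scripted against a tiny stand-in for the pfcwd state machine (a hypothetical plain-Python model, not orchagent code); the deadlock detected/restored counts mirror the vs-test expectations in the output below:

```python
appl_instorm = {}  # stand-in for APPL_DB PFC_WD_TABLE_INSTORM
counters = {"PFC_WD_STATUS": "operational",
            "DEADLOCK_DETECTED": 0, "DEADLOCK_RESTORED": 0}

def detect_storm(port, queue):
    # Step 3: storm detected on {port, queue 3}.
    counters["PFC_WD_STATUS"] = "stormed"
    counters["DEADLOCK_DETECTED"] += 1
    appl_instorm.setdefault(port, {})[queue] = "storm"

def enable_brs(port, queue):
    # Step 4: BRS restores the queue and re-detects it under BRS,
    # and (the fix under test) drops the APPL_DB in-storm entry;
    # BRS mode itself is tracked in CONFIG_DB.
    counters["DEADLOCK_RESTORED"] += 1
    counters["DEADLOCK_DETECTED"] += 1
    appl_instorm.pop(port, None)

def disable_brs():
    # Step 6: the state machine resumes from operational status.
    counters["DEADLOCK_RESTORED"] += 1
    counters["PFC_WD_STATUS"] = "operational"
```

Running the steps in order leaves the in-storm table empty and the queue operational, which is exactly what the new test asserts.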
========================================================================= FAILURES =========================================================================
_____________________________________________________ TestPfcWd.test_appl_db_storm_status_removal_brs ______________________________________________________
self = <test_pfcwd.TestPfcWd object at 0x7f8810369908>, dvs = <conftest.DockerVirtualSwitch object at 0x7f88103a73c8>
testlog = <function testlog at 0x7f88103990d0>
def test_appl_db_storm_status_removal_brs(self, dvs, testlog):
self.connect_dbs(dvs)
# Enable pfc wd flex counter polling
self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
# Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
fv_dict = {
FLEX_COUNTER_STATUS: ENABLE,
}
self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
# Enable pfc on tc 3
pfc_tcs = [QUEUE_3]
self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
# Verify pfc enable bits in ASIC_DB
port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
fv_dict = {
"SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
}
self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
# Start pfc wd (config) on port
self.start_port_pfcwd(PORT_UNDER_TEST)
# Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
# Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
"{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
# Start pfc storm on queue 3
self.start_queue_pfc_storm(q3_oid)
# Verify queue in storm from COUNTERS_DB
fv_dict = {
PFC_WD_STATUS: STORMED,
PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "1",
PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "0",
}
self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
# Verify queue in storm from APPL_DB
fv_dict = {
QUEUE_3: STORM,
}
self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
# Enable big red switch
self.enable_big_red_switch()
# Verify queue 3 in brs from COUNTERS_DB
fv_dict = {
BIG_RED_SWITCH_MODE: ENABLE,
PFC_WD_STATUS: STORMED,
PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "1",
}
self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
# Stop pfc storm on queue 3
self.stop_queue_pfc_storm(q3_oid)
# Verify DEBUG_STORM field removed from COUNTERS_DB
fields = [DEBUG_STORM]
self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
# Disable big red switch
self.disable_big_red_switch()
# Verify brs field removed from COUNTERS_DB
fields = [BIG_RED_SWITCH_MODE]
self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
# Verify queue operational from COUNTERS_DB
fv_dict = {
PFC_WD_STATUS: OPERATIONAL,
PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "2",
}
self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
# Verify queue in-storm status removed from APPL_DB
> self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)
test_pfcwd.py:509:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_pfcwd.py:185: in check_db_key_removal
db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <dvslib.dvs_database.DVSDatabase object at 0x7f88102dd5c0>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None
def wait_for_deleted_keys(
self,
table_name: str,
deleted_keys: List[str],
polling_config: PollingConfig = PollingConfig(),
failure_message: str = None,
) -> List[str]:
"""Wait for the specfied keys to no longer exist in the table.
Args:
table_name: The name of the table from which to fetch the keys.
deleted_keys: The keys we expect to be removed from the table.
polling_config: The parameters to use to poll the db.
failure_message: The message to print if the call times out. This will only take effect
if the PollingConfig is set to strict.
Returns:
The keys stored in the table. If no keys are found, then an empty List is returned.
"""
def access_function():
keys = self.get_keys(table_name)
return (all(key not in keys for key in deleted_keys), keys)
status, result = wait_for_result(
access_function, self._disable_strict_polling(polling_config)
)
if not status:
expected = [key for key in result if key not in deleted_keys]
message = failure_message or (
f"Unexpected keys found: expected={expected}, received={result}, "
f'table="{table_name}"'
)
> assert not polling_config.strict, message
E AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"
dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_appl_db_storm_status_removal_brs - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), ta...
=============================================================== 1 failed in 65.30s (0:01:05) ===============================================================
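Both failures are raised by dvslib's wait_for_deleted_keys, shown in the tracebacks above. A simplified standalone version of the same polling pattern (take a key-listing callable rather than dvslib's table abstraction):

```python
import time

def wait_for_deleted_keys(get_keys, deleted_keys, timeout=5.0, interval=0.01):
    # Poll get_keys() until none of deleted_keys remain, or time out.
    # Returns (success, last_seen_keys), like the dvslib helper's
    # (status, result) pair but without the strict-assert behavior.
    deadline = time.monotonic() + timeout
    while True:
        keys = get_keys()
        if all(k not in keys for k in deleted_keys):
            return True, keys
        if time.monotonic() >= deadline:
            return False, keys
        time.sleep(interval)
```

With the fix applied, the APPL_DB entry disappears within the polling window and the helper returns success instead of timing out.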
Details if related
This PR contains, and therefore comes after:
- [ ] https://github.com/Azure/sonic-swss/pull/1612