sonic-swss
sonic-swss copied to clipboard
[counters] Improve performance by polling only configured ports buffer queue/pg counters
commit 6f1199afc3458e526c26b5d7c498fdaf810fbaca
Author: Shlomi Bitton <[email protected]>
Date: Sun Jan 2 16:55:58 2022 +0000
Filter unconfigured ports buffers queue/pg counters configurations on init.
If no buffer configurations available, no counters will be created.
Allow creating/removing counters on runtime if buffer PG/Queue is created or removed.
New UT added to verify new flow.
Signed-off-by: Shlomi Bitton <[email protected]>
Signed-off-by: Nazarii Hnydyn [email protected]
Propagating https://github.com/Azure/sonic-swss/pull/2143 with resolved merge conflicts Depends on: https://github.com/Azure/sonic-utilities/pull/2239
What I did Currently in SONiC all ports queue and pg counters are created by default with the max possible amount of counters. This feature change this behavior to poll only configured counters provided by the config DB BUFFER_PG and BUFFER_QUEUE tables. If no tables are present in the DB, no counters will be created for ports. Filter the unwanted queues/pgs returned by SAI API calls and skip the creation of these queue/pg counters. Also allow creating/removing counters on runtime if buffer PG/Queue is configured or removed.
Why I did it Improve performance by filtering unconfigured queue/pg counters on init.
How I verified it Check after enabling the counters, if configured counters created in Counters DB according to the configurations. Add/Remove buffer PG/Queue configurations and observe the corresponding counters created/removed accordingly. New UT added to verify this flow.
Details if related
- N/A
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@nazariig can you look into the test failures?
@nazariig can you look into the test failures?
@yxieca sure. Will do.
@yxieca looks like the tests are unstable because each run is totally different story
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
The issue is observed with: p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails
.
Looks like the test has been executed twice:
2022-07-12T13:47:06.9867130Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:20.2662519Z p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:24.5227387Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED [ 98%]
2022-07-12T13:47:28.8312874Z p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED [ 98%]
2022-07-12T13:47:42.0593122Z p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED [ 99%]
2022-07-12T13:48:16.0895897Z p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED [ 99%]
2022-07-12T13:48:28.2509772Z p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED [ 99%]
2022-07-12T13:48:39.3870004Z p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED [ 99%]
2022-07-12T13:48:39.3884297Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED [ 99%]
2022-07-12T13:49:18.4729702Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED [ 99%]
2022-07-12T13:49:18.4743042Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails ERROR [ 99%]
2022-07-12T13:50:00.5598863Z p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED [100%]
2022-07-12T13:50:00.5599324Z
2022-07-12T13:50:00.5599755Z ==================================== ERRORS ====================================
2022-07-12T13:50:00.5600312Z _ ERROR at teardown of TestP4RTAcl.test_AclRuleAddWithoutTableDefinitionFails __
2022-07-12T13:50:00.5600557Z
2022-07-12T13:50:00.5609259Z tp = <class 'ValueError'>, value = None, tb = None
2022-07-12T13:50:00.5609494Z
2022-07-12T13:50:00.5609895Z def reraise(tp, value, tb=None):
2022-07-12T13:50:00.5610103Z try:
2022-07-12T13:50:00.5610252Z if value is None:
2022-07-12T13:50:00.5610401Z value = tp()
2022-07-12T13:50:00.5610606Z if value.__traceback__ is not tb:
2022-07-12T13:50:00.5610798Z raise value.with_traceback(tb)
2022-07-12T13:50:00.5610972Z > raise value
2022-07-12T13:50:00.5611039Z
2022-07-12T13:50:00.5611357Z /usr/lib/python3/dist-packages/six.py:703:
2022-07-12T13:50:00.5611593Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-07-12T13:50:00.5611926Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612122Z raise value
2022-07-12T13:50:00.5612408Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612598Z raise value
2022-07-12T13:50:00.5612759Z conftest.py:1828: in testlog
2022-07-12T13:50:00.5613091Z dvs.runcmd(f"logger -t pytest === finish test {request.node.nodeid} ===")
2022-07-12T13:50:00.5613320Z conftest.py:599: in runcmd
2022-07-12T13:50:00.5613497Z res = self.ctn.exec_run(cmd)
2022-07-12T13:50:00.5613936Z /usr/local/lib/python3.8/dist-packages/docker/models/containers.py:198: in exec_run
2022-07-12T13:50:00.5614556Z exec_output = self.client.api.exec_start(
2022-07-12T13:50:00.5615014Z /usr/local/lib/python3.8/dist-packages/docker/utils/decorators.py:19: in wrapped
2022-07-12T13:50:00.5615265Z return f(self, resource_id, *args, **kwargs)
2022-07-12T13:50:00.5615739Z /usr/local/lib/python3.8/dist-packages/docker/api/exec_api.py:169: in exec_start
2022-07-12T13:50:00.5616203Z return self._read_from_socket(res, stream, tty=tty, demux=demux)
2022-07-12T13:50:00.5616886Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:422: in _read_from_socket
2022-07-12T13:50:00.5617313Z return consume_socket_output(gen, demux=demux)
2022-07-12T13:50:00.5617942Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:137: in consume_socket_output
2022-07-12T13:50:00.5618340Z return six.binary_type().join(frames)
2022-07-12T13:50:00.5618932Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:416: in <genexpr>
2022-07-12T13:50:00.5619382Z gen = (data for (_, data) in gen)
2022-07-12T13:50:00.5620025Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:94: in frames_iter_no_tty
2022-07-12T13:50:00.5620457Z (stream, n) = next_frame_header(socket)
2022-07-12T13:50:00.5621111Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:66: in next_frame_header
2022-07-12T13:50:00.5621524Z data = read_exactly(socket, 8)
2022-07-12T13:50:00.5622094Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:51: in read_exactly
2022-07-12T13:50:00.5622691Z next_data = read(socket, n - len(data))
2022-07-12T13:50:00.5623089Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-07-12T13:50:00.5623301Z
2022-07-12T13:50:00.5623640Z socket = <socket.SocketIO object at 0x7fb093f4a7f0>, n = 8
2022-07-12T13:50:00.5623754Z
2022-07-12T13:50:00.5623905Z def read(socket, n=4096):
2022-07-12T13:50:00.5624084Z """
2022-07-12T13:50:00.5624370Z Reads at most n bytes from socket
2022-07-12T13:50:00.5624656Z """
2022-07-12T13:50:00.5624877Z
2022-07-12T13:50:00.5625166Z recoverable_errors = (errno.EINTR, errno.EDEADLK, errno.EWOULDBLOCK)
2022-07-12T13:50:00.5625354Z
2022-07-12T13:50:00.5625533Z if six.PY3 and not isinstance(socket, NpipeSocket):
2022-07-12T13:50:00.5625736Z > select.select([socket], [], [])
2022-07-12T13:50:00.5625950Z E ValueError: filedescriptor out of range in select()
2022-07-12T13:50:00.5626056Z
2022-07-12T13:50:00.5626382Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:31: ValueError
2022-07-12T13:50:00.5626776Z ---------------------------- Captured stdout setup -----------------------------
2022-07-12T13:50:00.5626990Z remove extra link dummy
2022-07-12T13:50:00.5627324Z ---------------------------- Captured stderr setup -----------------------------
2022-07-12T13:50:00.5627596Z Exception ignored in: <function ApplDbValidator.__del__ at 0x7fb09542b820>
2022-07-12T13:50:00.5627829Z Traceback (most recent call last):
2022-07-12T13:50:00.5628044Z File "/agent/_work/1/s/tests/conftest.py", line 164, in __del__
2022-07-12T13:50:00.5628275Z neighbors = self.get_keys(self.NEIGH_TABLE)
2022-07-12T13:50:00.5628507Z File "/agent/_work/1/s/tests/dvslib/dvs_database.py", line 115, in get_keys
2022-07-12T13:50:00.5628762Z table = swsscommon.Table(self.db_connection, table_name)
2022-07-12T13:50:00.5629400Z File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2335, in __init__
2022-07-12T13:50:00.5629768Z _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
2022-07-12T13:50:00.5630175Z RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
Locally test passes:
root@sonic-vs:~# pytest --imgname=docker-sonic-vs:latest --force-flaky --log-cli-level=info -vvv -k "TestWatermark or TestP4RTL3 or TestP4RTAcl or TestP4RTMirror"
========================================================================= test session starts =========================================================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /root/work/debug/counters/sonic-swss/tests
plugins: flaky-3.7.0
collected 601 items / 587 deselected / 14 selected
test_watermark.py::TestWatermark::test_telemetry_period PASSED [ 7%]
test_watermark.py::TestWatermark::test_lua_plugins SKIPPED (This test is not stable enough) [ 14%]
test_watermark.py::TestWatermark::test_clear SKIPPED (This test is not stable enough) [ 21%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED [ 28%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED [ 35%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED [ 42%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED [ 50%]
p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED [ 57%]
p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED [ 64%]
p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED [ 71%]
p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED [ 78%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED (p4orch is not enabled) [ 85%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED [ 92%]
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED [100%]
========================================================================== warnings summary ===========================================================================
...
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete
/usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.__del__ at 0x7fe15c72d040>
Traceback (most recent call last):
File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in __del__
neighbors = self.get_keys(self.NEIGH_TABLE)
File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys
table = swsscommon.Table(self.db_connection, table_name)
File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in __init__
_swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===Flaky Test Report===
test_telemetry_period passed 1 out of the required 1 times. Success!
test_IPv4RouteWithNexthopAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv6WithWcmpRouteAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv4RouteAddWithInvalidNexthopFail passed 1 out of the required 1 times. Success!
test_IPv6RouteAddWithInvalidWcmpFail passed 1 out of the required 1 times. Success!
test_PruneAndRestoreNextHop passed 1 out of the required 1 times. Success!
test_PruneNextHopOnWarmBoot passed 1 out of the required 1 times. Success!
test_CreateWcmpMemberForOperUpWatchportOnly passed 1 out of the required 1 times. Success!
test_RemovePrunedWcmpGroupMember passed 1 out of the required 1 times. Success!
test_AclRuleAddWithoutTableDefinitionFails passed 1 out of the required 1 times. Success!
test_MirrorSessionAddModifyAndDelete passed 1 out of the required 1 times. Success!
===End Flaky Test Report===
=============================================== 11 passed, 3 skipped, 587 deselected, 47 warnings in 604.22s (0:10:04) ================================================
@qiluo-msft / @yxieca i can't reproduce this issue in my environment. It is definitely not related to the current code changes. Can you please assist here?
@prsunny do you have insight on vstest failures?
@prsunny do you have insight on vstest failures?
It appears to be the same as #2373 / #2350
/usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.del at 0x7fe15c72d040>
Traceback (most recent call last): File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in del neighbors = self.get_keys(self.NEIGH_TABLE) File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys table = swsscommon.Table(self.db_connection, table_name) File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in init _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args)) RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
@prabhataravind could be triggered by ApplDbValidator.__del__
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@neethajohn , @prsunny could you please help to review and signoff?
@yxieca can you please help to merge this?
@bingwang-ms can you review this change?
@bingwang-ms just a kind reminder