sonic-swss icon indicating copy to clipboard operation
sonic-swss copied to clipboard

[counters] Improve performance by polling only configured ports buffer queue/pg counters

Open nazariig opened this issue 2 years ago • 33 comments

commit 6f1199afc3458e526c26b5d7c498fdaf810fbaca
Author: Shlomi Bitton <[email protected]>
Date:   Sun Jan 2 16:55:58 2022 +0000

Filter unconfigured ports buffers queue/pg counters configurations on init.
If no buffer configurations available, no counters will be created.
Allow creating/removing counters on runtime if buffer PG/Queue is created or removed.
New UT added to verify new flow.

Signed-off-by: Shlomi Bitton <[email protected]>

Signed-off-by: Nazarii Hnydyn [email protected]

Propagating https://github.com/Azure/sonic-swss/pull/2143 with resolved merge conflicts Depends on: https://github.com/Azure/sonic-utilities/pull/2239

What I did Currently in SONiC all ports queue and pg counters are created by default with the max possible amount of counters. This feature change this behavior to poll only configured counters provided by the config DB BUFFER_PG and BUFFER_QUEUE tables. If no tables are present in the DB, no counters will be created for ports. Filter the unwanted queues/pgs returned by SAI API calls and skip the creation of these queue/pg counters. Also allow creating/removing counters on runtime if buffer PG/Queue is configured or removed.

Why I did it Improve performance by filtering unconfigured queue/pg counters on init.

How I verified it Check after enabling the counters, if configured counters created in Counters DB according to the configurations. Add/Remove buffer PG/Queue configurations and observe the corresponding counters created/removed accordingly. New UT added to verify this flow.

Details if related

  • N/A

nazariig avatar Jun 28 '22 17:06 nazariig

/azpw run

nazariig avatar Jun 29 '22 16:06 nazariig

/AzurePipelines run

mssonicbld avatar Jun 29 '22 16:06 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jun 29 '22 16:06 azure-pipelines[bot]

/azpw run

nazariig avatar Jul 04 '22 14:07 nazariig

/AzurePipelines run

mssonicbld avatar Jul 04 '22 14:07 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 04 '22 14:07 azure-pipelines[bot]

/azpw run

nazariig avatar Jul 04 '22 19:07 nazariig

/AzurePipelines run

mssonicbld avatar Jul 04 '22 19:07 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 04 '22 19:07 azure-pipelines[bot]

@nazariig can you look into the test failures?

yxieca avatar Jul 05 '22 16:07 yxieca

@nazariig can you look into the test failures?

@yxieca sure. Will do.

nazariig avatar Jul 08 '22 09:07 nazariig

@yxieca looks like the tests are unstable because each run is totally different story

nazariig avatar Jul 09 '22 17:07 nazariig

/azpw run

nazariig avatar Jul 09 '22 17:07 nazariig

/AzurePipelines run

mssonicbld avatar Jul 09 '22 17:07 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 09 '22 17:07 azure-pipelines[bot]

The issue is observed with: p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails.

Looks like the test has been executed twice:

2022-07-12T13:47:06.9867130Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:20.2662519Z p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:24.5227387Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED [ 98%]
2022-07-12T13:47:28.8312874Z p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED [ 98%]
2022-07-12T13:47:42.0593122Z p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED          [ 99%]
2022-07-12T13:48:16.0895897Z p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED          [ 99%]
2022-07-12T13:48:28.2509772Z p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED [ 99%]
2022-07-12T13:48:39.3870004Z p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED     [ 99%]
2022-07-12T13:48:39.3884297Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED [ 99%]
2022-07-12T13:49:18.4729702Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED [ 99%]
2022-07-12T13:49:18.4743042Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails ERROR [ 99%]
2022-07-12T13:50:00.5598863Z p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED [100%]
2022-07-12T13:50:00.5599324Z 
2022-07-12T13:50:00.5599755Z ==================================== ERRORS ====================================
2022-07-12T13:50:00.5600312Z _ ERROR at teardown of TestP4RTAcl.test_AclRuleAddWithoutTableDefinitionFails __
2022-07-12T13:50:00.5600557Z 
2022-07-12T13:50:00.5609259Z tp = <class 'ValueError'>, value = None, tb = None
2022-07-12T13:50:00.5609494Z 
2022-07-12T13:50:00.5609895Z     def reraise(tp, value, tb=None):
2022-07-12T13:50:00.5610103Z         try:
2022-07-12T13:50:00.5610252Z             if value is None:
2022-07-12T13:50:00.5610401Z                 value = tp()
2022-07-12T13:50:00.5610606Z             if value.__traceback__ is not tb:
2022-07-12T13:50:00.5610798Z                 raise value.with_traceback(tb)
2022-07-12T13:50:00.5610972Z >           raise value
2022-07-12T13:50:00.5611039Z 
2022-07-12T13:50:00.5611357Z /usr/lib/python3/dist-packages/six.py:703: 
2022-07-12T13:50:00.5611593Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-07-12T13:50:00.5611926Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612122Z     raise value
2022-07-12T13:50:00.5612408Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612598Z     raise value
2022-07-12T13:50:00.5612759Z conftest.py:1828: in testlog
2022-07-12T13:50:00.5613091Z     dvs.runcmd(f"logger -t pytest === finish test {request.node.nodeid} ===")
2022-07-12T13:50:00.5613320Z conftest.py:599: in runcmd
2022-07-12T13:50:00.5613497Z     res = self.ctn.exec_run(cmd)
2022-07-12T13:50:00.5613936Z /usr/local/lib/python3.8/dist-packages/docker/models/containers.py:198: in exec_run
2022-07-12T13:50:00.5614556Z     exec_output = self.client.api.exec_start(
2022-07-12T13:50:00.5615014Z /usr/local/lib/python3.8/dist-packages/docker/utils/decorators.py:19: in wrapped
2022-07-12T13:50:00.5615265Z     return f(self, resource_id, *args, **kwargs)
2022-07-12T13:50:00.5615739Z /usr/local/lib/python3.8/dist-packages/docker/api/exec_api.py:169: in exec_start
2022-07-12T13:50:00.5616203Z     return self._read_from_socket(res, stream, tty=tty, demux=demux)
2022-07-12T13:50:00.5616886Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:422: in _read_from_socket
2022-07-12T13:50:00.5617313Z     return consume_socket_output(gen, demux=demux)
2022-07-12T13:50:00.5617942Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:137: in consume_socket_output
2022-07-12T13:50:00.5618340Z     return six.binary_type().join(frames)
2022-07-12T13:50:00.5618932Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:416: in <genexpr>
2022-07-12T13:50:00.5619382Z     gen = (data for (_, data) in gen)
2022-07-12T13:50:00.5620025Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:94: in frames_iter_no_tty
2022-07-12T13:50:00.5620457Z     (stream, n) = next_frame_header(socket)
2022-07-12T13:50:00.5621111Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:66: in next_frame_header
2022-07-12T13:50:00.5621524Z     data = read_exactly(socket, 8)
2022-07-12T13:50:00.5622094Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:51: in read_exactly
2022-07-12T13:50:00.5622691Z     next_data = read(socket, n - len(data))
2022-07-12T13:50:00.5623089Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-07-12T13:50:00.5623301Z 
2022-07-12T13:50:00.5623640Z socket = <socket.SocketIO object at 0x7fb093f4a7f0>, n = 8
2022-07-12T13:50:00.5623754Z 
2022-07-12T13:50:00.5623905Z     def read(socket, n=4096):
2022-07-12T13:50:00.5624084Z         """
2022-07-12T13:50:00.5624370Z         Reads at most n bytes from socket
2022-07-12T13:50:00.5624656Z         """
2022-07-12T13:50:00.5624877Z     
2022-07-12T13:50:00.5625166Z         recoverable_errors = (errno.EINTR, errno.EDEADLK, errno.EWOULDBLOCK)
2022-07-12T13:50:00.5625354Z     
2022-07-12T13:50:00.5625533Z         if six.PY3 and not isinstance(socket, NpipeSocket):
2022-07-12T13:50:00.5625736Z >           select.select([socket], [], [])
2022-07-12T13:50:00.5625950Z E           ValueError: filedescriptor out of range in select()
2022-07-12T13:50:00.5626056Z 
2022-07-12T13:50:00.5626382Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:31: ValueError
2022-07-12T13:50:00.5626776Z ---------------------------- Captured stdout setup -----------------------------
2022-07-12T13:50:00.5626990Z remove extra link dummy
2022-07-12T13:50:00.5627324Z ---------------------------- Captured stderr setup -----------------------------
2022-07-12T13:50:00.5627596Z Exception ignored in: <function ApplDbValidator.__del__ at 0x7fb09542b820>
2022-07-12T13:50:00.5627829Z Traceback (most recent call last):
2022-07-12T13:50:00.5628044Z   File "/agent/_work/1/s/tests/conftest.py", line 164, in __del__
2022-07-12T13:50:00.5628275Z     neighbors = self.get_keys(self.NEIGH_TABLE)
2022-07-12T13:50:00.5628507Z   File "/agent/_work/1/s/tests/dvslib/dvs_database.py", line 115, in get_keys
2022-07-12T13:50:00.5628762Z     table = swsscommon.Table(self.db_connection, table_name)
2022-07-12T13:50:00.5629400Z   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2335, in __init__
2022-07-12T13:50:00.5629768Z     _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
2022-07-12T13:50:00.5630175Z RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

Locally test passes:

root@sonic-vs:~# pytest --imgname=docker-sonic-vs:latest --force-flaky --log-cli-level=info -vvv -k "TestWatermark or TestP4RTL3 or TestP4RTAcl or TestP4RTMirror"
========================================================================= test session starts =========================================================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /root/work/debug/counters/sonic-swss/tests
plugins: flaky-3.7.0
collected 601 items / 587 deselected / 14 selected

test_watermark.py::TestWatermark::test_telemetry_period PASSED                                                                                                  [  7%]
test_watermark.py::TestWatermark::test_lua_plugins SKIPPED (This test is not stable enough)                                                                     [ 14%]
test_watermark.py::TestWatermark::test_clear SKIPPED (This test is not stable enough)                                                                           [ 21%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED                                                                                [ 28%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED                                                                                   [ 35%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED                                                                                     [ 42%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED                                                                                        [ 50%]
p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED                                                                                                 [ 57%]
p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED                                                                                                 [ 64%]
p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED                                                                                 [ 71%]
p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED                                                                                            [ 78%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED (p4orch is not enabled)                                                               [ 85%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED                                                                           [ 92%]
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED                                                                           [100%]

========================================================================== warnings summary ===========================================================================

...

p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete
  /usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.__del__ at 0x7fe15c72d040>

  Traceback (most recent call last):
    File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in __del__
      neighbors = self.get_keys(self.NEIGH_TABLE)
    File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys
      table = swsscommon.Table(self.db_connection, table_name)
    File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in __init__
      _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
  RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===Flaky Test Report===

test_telemetry_period passed 1 out of the required 1 times. Success!
test_IPv4RouteWithNexthopAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv6WithWcmpRouteAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv4RouteAddWithInvalidNexthopFail passed 1 out of the required 1 times. Success!
test_IPv6RouteAddWithInvalidWcmpFail passed 1 out of the required 1 times. Success!
test_PruneAndRestoreNextHop passed 1 out of the required 1 times. Success!
test_PruneNextHopOnWarmBoot passed 1 out of the required 1 times. Success!
test_CreateWcmpMemberForOperUpWatchportOnly passed 1 out of the required 1 times. Success!
test_RemovePrunedWcmpGroupMember passed 1 out of the required 1 times. Success!
test_AclRuleAddWithoutTableDefinitionFails passed 1 out of the required 1 times. Success!
test_MirrorSessionAddModifyAndDelete passed 1 out of the required 1 times. Success!

===End Flaky Test Report===
=============================================== 11 passed, 3 skipped, 587 deselected, 47 warnings in 604.22s (0:10:04) ================================================

nazariig avatar Jul 12 '22 13:07 nazariig

@qiluo-msft / @yxieca i can't reproduce this issue in my environment. It is definitely not related to the current code changes. Can you please assist here?

nazariig avatar Jul 12 '22 16:07 nazariig

@prsunny do you have insight on vstest failures?

qiluo-msft avatar Jul 12 '22 21:07 qiluo-msft

@prsunny do you have insight on vstest failures?

It appears to be the same as #2373 / #2350

prabhataravind avatar Jul 12 '22 22:07 prabhataravind

/usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.del at 0x7fe15c72d040>

Traceback (most recent call last): File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in del neighbors = self.get_keys(self.NEIGH_TABLE) File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys table = swsscommon.Table(self.db_connection, table_name) File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in init _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args)) RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

@prabhataravind could be triggered by ApplDbValidator.__del__

nazariig avatar Jul 12 '22 22:07 nazariig

/azpw run

nazariig avatar Jul 13 '22 07:07 nazariig

/AzurePipelines run

mssonicbld avatar Jul 13 '22 07:07 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 13 '22 07:07 azure-pipelines[bot]

/azpw run

nazariig avatar Jul 13 '22 11:07 nazariig

/AzurePipelines run

mssonicbld avatar Jul 13 '22 11:07 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 13 '22 11:07 azure-pipelines[bot]

@neethajohn , @prsunny could you please help to review and signoff?

liat-grozovik avatar Jul 14 '22 06:07 liat-grozovik

@yxieca can you please help to merge this?

nazariig avatar Jul 28 '22 22:07 nazariig

@bingwang-ms can you review this change?

yxieca avatar Aug 01 '22 15:08 yxieca

@bingwang-ms just a kind reminder

nazariig avatar Aug 03 '22 12:08 nazariig