sonic-swss icon indicating copy to clipboard operation
sonic-swss copied to clipboard

[orchdaemon]: Fixed sairedis record file rotation

Open bacrossland opened this issue 2 years ago • 15 comments

What I did Fix Azure/sonic-buildimage#8162

Moved sairedis record file rotation logic out of flush() to fix issue.

Why I did it Sairedis record file was not releasing the file handle on rotation. This is because the file handle release was inside the flush() which was only being called if a select timeout was triggered. Moved the logic to its own function which is called in the start() loop.

How I verified it Ran a script to fill log and verified that rotation was happening correctly.

Signed-off-by: Bryan Crossland [email protected]

bacrossland avatar May 27 '22 19:05 bacrossland

CLA assistant check
All CLA requirements met.

@kcudnik , can you review this?

lguohan avatar May 28 '22 19:05 lguohan

Pipeline fixes are still broken:

@kcudnik @prsunny The errors that are happing on this PR are not related to the code change it makes. The failure are related to problems in the master branch already or problems in the build pipeline.

Path does not exist: /home/vsts/work/1/a/gcov_output

How do you want me to proceed so this PR can be merged?

bacrossland avatar Jun 03 '22 18:06 bacrossland

Please take a look at this PR too: https://github.com/Azure/sonic-sairedis/pull/1058

kcudnik avatar Jun 06 '22 16:06 kcudnik

/azp run

Blueve avatar Jul 12 '22 13:07 Blueve

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 12 '22 13:07 azure-pipelines[bot]

@bacrossland Build has passed. Could you add UT for your change? https://dev.azure.com/mssonic/build/_build/results?buildId=121405&view=codecoverage-tab

Blueve avatar Jul 13 '22 09:07 Blueve

Thank you @Blueve. I will get that unit test added.

bacrossland avatar Jul 13 '22 12:07 bacrossland

/easycla

bacrossland avatar Aug 08 '22 15:08 bacrossland

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: bacrossland / name: Bryan Crossland (01cb5dfa1387dd14794a13de2c94878f4e51ffb2, 80181c49f644db2c9dd195fe99451610140ff5b6)

/azp run coverage.Azure.sonic-swss.amd64

Blueve avatar Aug 17 '22 06:08 Blueve

No pipelines are associated with this pull request.

azure-pipelines[bot] avatar Aug 17 '22 06:08 azure-pipelines[bot]

/azp run

Blueve avatar Aug 17 '22 06:08 Blueve

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Aug 17 '22 06:08 azure-pipelines[bot]

@kcudnik The UT coverage for this repo might not accurate now and most PR are not having UT. Do you think we can merge this PR wo/ UT?

Blueve avatar Aug 17 '22 06:08 Blueve

no, please add corresponding tests to fix code coverage, we dont want to skip this, threshold must be met

kcudnik avatar Aug 22 '22 14:08 kcudnik

@kcudnik I'm setting aside time next week to fix this PR and get the testing in.

bacrossland avatar Sep 16 '22 19:09 bacrossland

sure, lets fix this this week

kcudnik avatar Sep 18 '22 16:09 kcudnik

I've gotten a test to run locally and checked that in for code coverage. This branch is out of date with master. I will be updating that today.

bacrossland avatar Sep 29 '22 19:09 bacrossland

Merged in master.

bacrossland avatar Sep 29 '22 20:09 bacrossland

/azp run

bacrossland avatar Sep 30 '22 13:09 bacrossland

Commenter does not have sufficient privileges for PR 2299 in repo sonic-net/sonic-swss

azure-pipelines[bot] avatar Sep 30 '22 13:09 azure-pipelines[bot]

@kcudnik @Blueve I've added the ut for code coverage. The PR is now failing on files missing from swss-common. I don't have permission to rerun the test. Can you rerun the pipeline to help clear up these issues? Thanks.

bacrossland avatar Sep 30 '22 13:09 bacrossland

/azp run

Blueve avatar Oct 01 '22 08:10 Blueve

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Oct 01 '22 08:10 azure-pipelines[bot]

The LGTM analysis is failing on this. It's coming from swss-common:

[2022-10-01 09:09:02] [build-stderr] In file included from defaultvalueprovider.cpp:10:
[2022-10-01 09:09:02] [build-stderr] defaultvalueprovider.h:8:10: fatal error: libyang/libyang.h: No such file or directory
[2022-10-01 09:09:02] [build-stderr]     8 | #include <libyang/libyang.h>
[2022-10-01 09:09:02] [build-stderr]       |          ^~~~~~~~~~~~~~~~~~~
[2022-10-01 09:09:02] [build-stderr] compilation terminated.
[2022-10-01 09:09:02] [build-stderr] make[3]: *** [Makefile:787: libswsscommon_la-defaultvalueprovider.lo] Error 1
[2022-10-01 09:09:02] [build-stderr] make[3]: *** Waiting for unfinished jobs....
[2022-10-01 09:09:06] [build-stdout] make[3]: Leaving directory '/opt/src/sonic-swss-common/common'
[2022-10-01 09:09:06] [build-stderr] make[2]: *** [Makefile:440: all-recursive] Error 1
[2022-10-01 09:09:06] [build-stdout] make[2]: Leaving directory '/opt/src/sonic-swss-common'
[2022-10-01 09:09:06] [build-stderr] make[1]: *** [Makefile:372: all] Error 2
[2022-10-01 09:09:06] [build-stdout] make[1]: Leaving directory '/opt/src/sonic-swss-common'
[2022-10-01 09:09:06] [build-stderr] dh_auto_build: error: make -j4 returned exit code 2
[2022-10-01 09:09:06] [build-stderr] make: *** [debian/rules:33: build] Error 25
[2022-10-01 09:09:06] [build-stderr] dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
[2022-10-01 09:09:06] [ERROR] Spawned process exited abnormally (code 2; tried to run: [/opt/work/lgtm-workspace/lgtm/extract.sh])
A fatal error occurred: Exit status 2 from command: [/opt/work/lgtm-workspace/lgtm/extract.sh]

bacrossland avatar Oct 01 '22 13:10 bacrossland

Test vstest is failing on this. Timeout of p2mp_tunnel:

test_p2mp_tunnel failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_p2mp_tunnel failed; it passed 0 out of the required 1 times.
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /usr/local/lib/python3.8/dist-packages/six.py:718>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_vlan_extension failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /usr/local/lib/python3.8/dist-packages/six.py:718>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_vlan_extension failed; it passed 0 out of the required 1 times.
	<class 'AssertionError'>
	Wrong number of created entries.
	[<TracebackEntry /agent/_work/1/s/tests/test_evpn_tunnel_p2mp.py:65>, <TracebackEntry /agent/_work/1/s/tests/evpn_tunnel.py:524>, <TracebackEntry /agent/_work/1/s/tests/evpn_tunnel.py:50>]

Also here. Assertion failed on removing ipv6 link:

test_NeighborAddRemoveIpv6LinkLocal failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	assert 2 == 4
  -2
  +4
	[<TracebackEntry /agent/_work/1/s/tests/test_ipv6_link_local.py:63>]

bacrossland avatar Oct 01 '22 13:10 bacrossland

Pulled in latest commits that have fixes for missing libyang

bacrossland avatar Oct 01 '22 13:10 bacrossland

@kcudnik @Blueve @prsunny Everything is passing. Ready for a review and a merge.

bacrossland avatar Oct 02 '22 01:10 bacrossland

@bacrossland can you raise a separate PR for 202205 branch? This change cannot be cherry-picked to 202205 branch cleanly.

yxieca avatar Oct 03 '22 19:10 yxieca