sonic-swss On teammgrd/teamsyncd exits, return EXIT_FAILURE

What I did When teammgrd/teamsyncd exits, return EXIT_FAILURE so that supervisord catches the unexpected exit and the teamd docker is restarted.
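For illustration only (this is not the actual diff): a minimal sketch of the pattern, assuming a daemon whose event loop ends once a stop flag is set. The names `g_stop_requested`, `onStopSignal` and `runEventLoop` are hypothetical and do not come from the sonic-swss sources.

```cpp
#include <csignal>
#include <cstdlib>
#include <functional>

// Hypothetical stop flag; the real daemons may track this differently.
static volatile std::sig_atomic_t g_stop_requested = 0;

// Signal handler: only records that a stop was requested.
extern "C" void onStopSignal(int)
{
    g_stop_requested = 1;
}

// Runs poll_once (a stand-in for one pass of the select loop) until a stop
// is requested. Leaving the loop is always reported as EXIT_FAILURE, so a
// supervisor configured to expect status 0 treats the exit as unexpected.
int runEventLoop(const std::function<void()>& poll_once)
{
    while (!g_stop_requested)
    {
        poll_once();
    }
    return EXIT_FAILURE;
}
```

A daemon's `main` would register the handler with `std::signal(SIGTERM, onStopSignal)` and end with `return runEventLoop(...)`, so any path out of the loop yields a non-zero status.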

Why I did it Fixes https://github.com/Azure/sonic-buildimage/issues/10534

I have seen this in builds from 201911 to master.

How I verified it Checked by sending SIGTERM to the teamsyncd/teammgrd processes.

-- without fix


Apr 15 21:57:38.156111 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 21:57:38,155 INFO exited: teamsyncd (exit status 0; expected)

Apr 15 22:20:09.530223 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:20:09,529 INFO exited: teammgrd (exit status 0; expected)

-- with fix

Apr 15 22:24:39.752008 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:24:39,751 INFO exited: teamsyncd (exit status 1; not expected)
and the teamd docker restarts.
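The same check can be reproduced in miniature outside the container. The sketch below assumes a POSIX system and uses a forked child as a stand-in for the daemon; `exitStatusAfterSigterm`, `childOnSigterm` and `g_child_stop` are hypothetical names, not part of sonic-swss.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the daemon's stop flag.
static volatile std::sig_atomic_t g_child_stop = 0;

extern "C" void childOnSigterm(int)
{
    g_child_stop = 1;
}

// Forks a child that idles until it receives SIGTERM and then exits with
// exit_code. Returns the exit status observed by the parent, or -1 if the
// child could not be created or did not exit normally.
int exitStatusAfterSigterm(int exit_code)
{
    // Install the handler before fork() so the child inherits it and the
    // signal can never hit the default (terminate) action.
    std::signal(SIGTERM, childOnSigterm);

    pid_t pid = fork();
    if (pid < 0)
    {
        return -1;
    }
    if (pid == 0)
    {
        // Child: poll the flag; usleep() returns early when interrupted.
        while (!g_child_stop)
        {
            usleep(1000);
        }
        _exit(exit_code);
    }

    // Parent: send SIGTERM and collect the exit status, as a supervisor would.
    kill(pid, SIGTERM);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
    {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `EXIT_SUCCESS` corresponds to the "exit status 0" lines in the logs above, and calling it with `EXIT_FAILURE` corresponds to the "exit status 1" line.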

Details if related

Apr 15 '22 22:04 judyjoseph

@judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux, and the return code should be 0 if no errors are observed.

Apr 18 '22 16:04 nazariig

> @judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux, and the return code should be 0 if no errors are observed.

@nazariig this does not pertain to SIGTERM alone - it is just that I used SIGTERM to validate this fix. Whenever teamsyncd/teammgrd comes out of the SELECT loop and exits, for whatever reason, it is good for the teamd container to restart. For example, if teamsyncd exits silently, some of the interface events will be missed.

I see a similar approach of using "exit 1" in other orchagent daemons like portsyncd, fpmsyncd, etc., so that supervisor sees a not-expected exit and restarts the container.

Apr 27 '22 20:04 judyjoseph

/azp run

Apr 27 '22 20:04 judyjoseph

Azure Pipelines successfully started running 1 pipeline(s).

Apr 27 '22 20:04 azure-pipelines[bot]

@prsunny Can you please help review this PR? It is related to an ADO: https://msazure.visualstudio.com/One/_workitems/edit/13799016.

May 25 '22 07:05 yozhao101

> > @judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux, and the return code should be 0 if no errors are observed.

> @nazariig this does not pertain to SIGTERM alone - it is just that I used SIGTERM to validate this fix. Whenever teamsyncd/teammgrd comes out of the SELECT loop and exits, for whatever reason, it is good for the teamd container to restart. For example, if teamsyncd exits silently, some of the interface events will be missed.

> I see a similar approach of using "exit 1" in other orchagent daemons like portsyncd, fpmsyncd, etc., so that supervisor sees a not-expected exit and restarts the container.

@judyjoseph what is considered to be an expected exit here? How are we going to handle graceful shutdown?

Jul 06 '22 09:07 nazariig
