sonic-buildimage

Wait till CHASSIS_APP_DB PING is successful, host_name and asic_name are valid in CONFIG_DB before starting chassis-db-cleanup

Open saksarav-nokia opened this issue 1 year ago • 15 comments

Why I did it

This PR fixes the issue reported in issue #17945. We noticed that the chassis db cleanup is sometimes skipped when the CHASSIS_APP_DB PING fails. Also, if host_name and asic_name have not yet been written to CONFIG_DB, empty strings could be passed to the CHASSIS_APP_DB EVAL commands. The hostname-config.service service is restarted whenever config-reload or load-minigraph is run, and this service renames the file /etc/hosts in order to replace it with the new file. This interferes with [email protected]: when the swss.sh script accesses CHASSIS_APP_DB while the /etc/hosts file is renamed, the error "Unable to connect to redis: Cannot assign requested address" is seen and the CHASSIS_APP_DB EVAL command fails. As a result, the chassis db entries are not cleaned up, which causes an orchagent crash in remote LCs.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Wait till the CHASSIS_APP_DB PING is successful before checking for entries in the CHASSIS_APP_DB table. Also wait till host_name and asic_name are valid in CONFIG_DB. Modified [email protected] to start after hostname-config.service.
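
The waiting logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. The helper names (`wait_until`, `chassis_db_reachable`, `metadata_is_set`), the retry counts, and the `DEVICE_METADATA|localhost` field names are assumptions, as is the idea that `sonic-db-cli ... PING` reports failure through its exit status.

```shell
#!/bin/bash

# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
    local max_tries=$1 delay=$2
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_tries; attempt++ )); do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Hypothetical probe: assumes the CLI exits non-zero when the PING fails.
chassis_db_reachable() {
    sonic-db-cli CHASSIS_APP_DB PING > /dev/null 2>&1
}

# Hypothetical probe: both values must be non-empty before cleanup runs.
# The key and field names here are assumptions for illustration.
metadata_is_set() {
    local host asic
    host=$(sonic-db-cli CONFIG_DB hget 'DEVICE_METADATA|localhost' 'hostname')
    asic=$(sonic-db-cli CONFIG_DB hget 'DEVICE_METADATA|localhost' 'asic_name')
    [[ -n "$host" && -n "$asic" ]]
}

# Possible usage before starting the cleanup (limits are placeholders):
#   wait_until 30 1 chassis_db_reachable || echo "CHASSIS_APP_DB not reachable"
#   wait_until 30 1 metadata_is_set      || echo "hostname/asic_name not set"
```

Bounding the number of attempts keeps the service start from blocking forever if the database never comes up; the right limit would depend on the platform.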

How to verify it

Ran a script that performed config reload and load-minigraph 200 times and verified that the chassis db cleanup is done every time and the orchagent crash is not seen.

Which release branch to backport (provide reason below if selected)

  • [ ] 201811
  • [ ] 201911
  • [ ] 202006
  • [ ] 202012
  • [ ] 202106
  • [ ] 202111
  • [x] 202205
  • [ ] 202211
  • [ ] 202305

Tested branch (Please provide the tested image version)

  • [ ]
  • [ ]

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

saksarav-nokia avatar Jan 31 '24 16:01 saksarav-nokia

@judyjoseph for viz

saksarav-nokia avatar Jan 31 '24 16:01 saksarav-nokia

@judyjoseph @arlakshm @abdosi , please review this PR

saksarav-nokia avatar Feb 08 '24 20:02 saksarav-nokia

We ran the complete oc with this fix and the error "Unable to connect to redis: Cannot assign requested address" is not seen. Also we didn't see the orchagent crash

saksarav-nokia avatar Feb 12 '24 15:02 saksarav-nokia

/AzurePipelines run

judyjoseph avatar Feb 12 '24 17:02 judyjoseph

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

azure-pipelines[bot] avatar Feb 12 '24 17:02 azure-pipelines[bot]

/azp run Azure.sonic-buildimage

judyjoseph avatar Feb 13 '24 23:02 judyjoseph

/azp run Azure.sonic-buildimage

judyjoseph avatar Feb 15 '24 17:02 judyjoseph

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Feb 15 '24 17:02 azure-pipelines[bot]

@arlakshm @abdosi for review as well

judyjoseph avatar Mar 05 '24 04:03 judyjoseph

@saksarav-nokia, Trying to find an alternative solution here as this change to add hostname-config.service dependency will affect all platforms.

I checked this script. Can we add a specific check in it so that it proceeds with changes to the /etc/hosts file only if HOSTNAME changes? https://github.com/sonic-net/sonic-buildimage/blob/a6a8d198b06d784a018c9a539b7428211974ebc3/files/image_config/hostname/hostname-config.sh#L10

That should help our case and we need not add this hostname-config.service dependency
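
A rough sketch of what such a check could look like, under assumptions: the function name `update_hosts_if_changed` and its arguments are hypothetical, and this is not the content of hostname-config.sh. It takes the hosts file path as a parameter so it can be tried on a scratch copy.

```shell
#!/bin/bash

# update_hosts_if_changed HOSTS_FILE CURRENT_NAME NEW_NAME
# Leaves HOSTS_FILE untouched when the hostname has not changed.
# Otherwise drops the entry for CURRENT_NAME and appends one for NEW_NAME.
update_hosts_if_changed() {
    local hosts_file=$1 current=$2 new=$3

    # Nothing to do: the file is never renamed or rewritten in this case.
    if [[ "$current" == "$new" ]]; then
        return 0
    fi

    local tmp="${hosts_file}.new"
    # Keep every line whose last field is not the old hostname.
    # 'localhost' entries are always kept because other applications use them.
    awk -v h="$current" '$NF != h || h == "localhost"' "$hosts_file" > "$tmp"
    echo "127.0.0.1 $new" >> "$tmp"

    # Single rename, so the target path is replaced in one step.
    mv -f "$tmp" "$hosts_file"
}

# Possible usage (paths and names are placeholders):
#   update_hosts_if_changed /etc/hosts "$(hostname)" "$NEW_HOSTNAME"
```

With the early return, a config reload that keeps the same hostname would not touch the file at all, which is the case that matters for the redis connection error discussed in this PR.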

judyjoseph avatar Mar 05 '24 22:03 judyjoseph

@judyjoseph , I think that will also fix the issue. I will test it out and update the PR.

saksarav-nokia avatar Mar 06 '24 02:03 saksarav-nokia

@judyjoseph , Addressed your comments and verified the changes and ensured the issue is not seen with current changes. Please review it.

saksarav-nokia avatar Mar 12 '24 14:03 saksarav-nokia

/azp run Azure.sonic-buildimage

judyjoseph avatar Apr 10 '24 04:04 judyjoseph

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Apr 10 '24 04:04 azure-pipelines[bot]

@rlhui @lguohan please help merge this PR

judyjoseph avatar Apr 10 '24 16:04 judyjoseph

MSFT ADO: 27704026

gechiang avatar Apr 17 '24 22:04 gechiang

Cherry-pick PR to 202305: https://github.com/sonic-net/sonic-buildimage/pull/18756

mssonicbld avatar Apr 23 '24 06:04 mssonicbld

@yxieca , who can review/approve for 202311 for this PR?

gechiang avatar Jun 13 '24 03:06 gechiang