sonic-buildimage
Wait till CHASSIS_APP_DB PING is successful and host_name and asic_name are valid in CONFIG_DB before starting chassis-db-cleanup
Why I did it
This PR fixes the issue reported in issue #17945. We noticed that the chassis DB cleanup is sometimes skipped when the CHASSIS_APP_DB PING fails. Also, if host_name and asic_name have not yet been written to CONFIG_DB, empty strings could be passed to the CHASSIS_APP_DB EVAL commands. The service hostname-config.service is restarted whenever config-reload or load-minigraph is done, and this service renames the file /etc/hosts in order to update it with the new file. This interferes with [email protected]: when the swss.sh script accesses CHASSIS_APP_DB while the /etc/hosts file is being renamed, the error "Unable to connect to redis: Cannot assign requested address" is seen and the CHASSIS_APP_DB EVAL command fails. As a result, the chassis DB entries are not cleaned up, which causes an orchagent crash in remote LCs.
Work item tracking
- Microsoft ADO (number only):
How I did it
Wait till the CHASSIS_APP_DB PING is successful before checking for entries in the CHASSIS_APP_DB table. Also wait till host_name and asic_name are valid in CONFIG_DB. Modified [email protected] to start after hostname-config.service.
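The wait logic described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`wait_until`, `chassis_db_ready`, `names_valid`), the retry counts, the `DEVICE_METADATA|localhost` key with `hostname` and `asic_name` fields, and the assumption that a successful PING prints `True` or `PONG` are all assumptions made for the example.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Assumption: a successful PING prints "True" or "PONG".
chassis_db_ready() {
    "${SONIC_DB_CLI:-sonic-db-cli}" CHASSIS_APP_DB PING 2>/dev/null | grep -qE 'True|PONG'
}

# Assumption: the names live under DEVICE_METADATA|localhost in CONFIG_DB.
# Succeeds only when both values are non-empty.
names_valid() {
    local host asic
    host=$("${SONIC_DB_CLI:-sonic-db-cli}" CONFIG_DB hget 'DEVICE_METADATA|localhost' 'hostname' 2>/dev/null)
    asic=$("${SONIC_DB_CLI:-sonic-db-cli}" CONFIG_DB hget 'DEVICE_METADATA|localhost' 'asic_name' 2>/dev/null)
    [ -n "$host" ] && [ -n "$asic" ]
}
```

A caller might run `wait_until 30 1 chassis_db_ready && wait_until 30 1 names_valid` before issuing any CHASSIS_APP_DB EVAL command, so that empty names are never passed to the cleanup step.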
How to verify it
Ran a script that performed config reload and load-minigraph 200 times and verified that the chassis DB cleanup is done every time and the orchagent crash is not seen.
Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [ ] 202211
- [ ] 202305
Tested branch (Please provide the tested image version)
- [ ]
- [ ]
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)
@judyjoseph for viz
@judyjoseph @arlakshm @abdosi , please review this PR
We ran the complete oc with this fix, and the error "Unable to connect to redis: Cannot assign requested address" was not seen. We also didn't see the orchagent crash.
/AzurePipelines run
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@arlakshm @abdosi for review as well
@saksarav-nokia, I am trying to find an alternative solution here, as this change to add the hostname-config.service dependency will affect all platforms.
I checked this script. Can we add a specific check in it so that it changes the /etc/hosts file only if HOSTNAME changes? https://github.com/sonic-net/sonic-buildimage/blob/a6a8d198b06d784a018c9a539b7428211974ebc3/files/image_config/hostname/hostname-config.sh#L10
That should help our case and we need not add this hostname-config.service dependency
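For illustration, a check of this kind might look like the sketch below. It is not the actual hostname-config.sh code: the function name `update_hosts_if_changed`, its arguments, and the `127.0.0.1` entry format are assumptions made for the example. The point is only that the hosts file is left untouched when the hostname is unchanged.

```shell
# Hypothetical sketch: rewrite a hosts file only when the hostname changed.
# Usage: update_hosts_if_changed <current_hostname> <new_hostname> <hosts_file>
update_hosts_if_changed() {
    local current=$1 new=$2 hosts_file=$3

    # Nothing to do if the new name is empty or identical to the current one,
    # so the hosts file is never renamed in the common config-reload case.
    if [ -z "$new" ] || [ "$current" = "$new" ]; then
        return 0
    fi

    # Drop lines whose last field is the old hostname, then append the new entry.
    awk -v h="$current" '$NF != h' "$hosts_file" > "${hosts_file}.new"
    echo "127.0.0.1 ${new}" >> "${hosts_file}.new"
    mv "${hosts_file}.new" "$hosts_file"
}
```

With a guard like this, a config reload that keeps the same hostname would not replace /etc/hosts at all, which is the window in which the redis connection error was observed.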
@judyjoseph , I think that will also fix the issue. I will test it out and update the PR.
@judyjoseph, I addressed your comments, verified the changes, and ensured the issue is not seen with the current changes. Please review.
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@rlhui @lguohan please help merge this PR
MSFT ADO: 27704026
Cherry-pick PR to 202305: https://github.com/sonic-net/sonic-buildimage/pull/18756
@yxieca , who can review/approve for 202311 for this PR?