sonic-buildimage
[multiasic][supervisor] sonic-db-cli crashes at boot-up when database.sh executes the sonic-db-cli PING command on a multi-ASIC platform
Description
On the supervisor card, sonic-db-cli crashes when database.sh executes the sonic-db-cli PING command. The new implementation of the sonic-db-cli PING command calls initializeGlobalConfig(), which checks the redis#/sonic-db/database_config.json files of all ASICs, and these files are not ready yet. This causes the crash and the error log below. The PING call was used to wait for all databases to become ready, so if sonic-db-cli tries to access the redis#/sonic-db/database_config.json files at this point, it will fail.
Sep 9 23:21:15 sonic sonic-db-cli: :- parseDatabaseConfig: Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep 9 23:21:15 sonic database.sh[4739]: terminate called after throwing an instance of 'std::runtime_error'
Sep 9 23:21:15 sonic database.sh[4739]: what(): Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep 9 23:21:15 sonic sonic-db-cli: :- initializeGlobalConfig: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
There are 16 ASICs on this supervisor card. This issue is similar to https://github.com/sonic-net/sonic-buildimage/issues/10105. If the sonic-db-cli behavior has changed, we may need to change waitForAllInstanceDatabaseConfigJsonFilesReady.
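To illustrate the idea of waiting for every per-ASIC configuration file before probing the database, here is a minimal sketch. It is not the actual waitForAllInstanceDatabaseConfigJsonFilesReady implementation. The function name `wait_for_asic_db_configs`, its parameters, and the `WAIT_INTERVAL` variable are hypothetical, and the `<run_dir>/redis<N>/sonic-db/database_config.json` layout is only inferred from the paths in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SONiC): block until the database_config.json
# of every ASIC instance exists, or give up after max_tries polls per file.
# The redis<N>/sonic-db layout is an assumption taken from the log paths.
wait_for_asic_db_configs() {
    local num_asics="$1"          # number of ASIC instances, e.g. 16
    local run_dir="${2:-/var/run}" # base directory holding redis0, redis1, ...
    local max_tries="${3:-60}"     # polls per file before timing out
    local asic tries cfg
    for ((asic = 0; asic < num_asics; asic++)); do
        cfg="${run_dir}/redis${asic}/sonic-db/database_config.json"
        tries=0
        until [[ -f "$cfg" ]]; do
            if ((++tries > max_tries)); then
                echo "timed out waiting for $cfg" >&2
                return 1
            fi
            sleep "${WAIT_INTERVAL:-1}"
        done
    done
    return 0
}
```

With this sketch, a caller on a 16-ASIC supervisor might run `wait_for_asic_db_configs 16` before issuing the first PING, so that the probe is never started while a file is still missing.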
Steps to reproduce the issue:
- Reboot the system with the new image.
Describe the results you received:
There are core files and the following error logs:
Sep 9 23:21:15 sonic sonic-db-cli: :- parseDatabaseConfig: Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep 9 23:21:15 sonic database.sh[4739]: terminate called after throwing an instance of 'std::runtime_error'
Sep 9 23:21:15 sonic database.sh[4739]: what(): Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep 9 23:21:15 sonic sonic-db-cli: :- initializeGlobalConfig: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Describe the results you expected:
There should be no core files and no error logs from sonic-db-cli.
Output of show version:
(paste your output here)
Output of show techsupport:
(paste your output here or download and attach the file here )
Additional information you deem important (e.g. issue happens only occasionally):
@qiluo-msft can you please help check the sonic-db-cli behavior change and see how to fix it? It looks like a scalability issue. Thanks.
@SuvarnaMeenakshi - could you please check whether the multi-asic VS tests would catch this? Thanks.
@abdosi, this is the same as what we are observing on the 202205-based image.
parseDatabaseConfig
@SuvarnaMeenakshi - could you please check whether the multi-asic VS tests would catch this? Thanks.
As this error is seen during boot-up, the multi-asic VS test suite we have today in the PR checker will not be able to flag it. This is likely the case for any boot-up exception seen in syslog. If there is a reboot test case, a post-reboot exception seen in syslog will be flagged by the log analyzer.
This specific issue is seen only on the supervisor, not on multi-asic VS or multi-asic LC.
Created the following PR to fix this issue: https://github.com/sonic-net/sonic-swss-common/pull/701
According to the database.sh code, the script waits until the database is ready by checking the sonic-db-cli return value; when the database is not ready, sonic-db-cli should return 1:
https://github.com/sonic-net/sonic-buildimage/blob/master/files/build_templates/docker_image_ctl.j2
until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) &&
         ($(docker exec -i database$DEV sonic-db-cli PING | grep -c PONG) -gt 0) ]]; do
    sleep 1;
done
However, because of a code regression in sonic-db-cli, sonic-db-cli crashes instead.
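The readiness loop relies on the probe failing cleanly when the database is not up. A minimal sketch of that polling pattern, with a bounded number of attempts, is shown here. The helper name `wait_for_pong` and the `WAIT_INTERVAL` variable are hypothetical and not part of database.sh. Note that this only keeps the loop going when the probe exits abnormally; it does not stop an aborting binary from leaving a core file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of database.sh): run a probe command until its
# output contains PONG. A non-zero exit status, including a crash of the probe,
# is treated as "not ready yet" rather than as a fatal error.
wait_for_pong() {
    local max_tries="$1"; shift   # remaining arguments form the probe command
    local tries=0 out
    while ((tries < max_tries)); do
        # Discard the output if the probe failed, so a crash never counts as ready.
        out="$("$@" 2>/dev/null)" || out=""
        if [[ "$out" == *PONG* ]]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}
```

For example, `wait_for_pong 60 sonic-db-cli PING` would poll for up to 60 attempts and return 1 if no PONG is ever seen, which lets the caller log a timeout instead of waiting forever.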
A fix is available; please confirm whether this can be closed, @mlok-nokia.
I checked the changes in the 202205 branch. They do not fix all of the issues: although the change avoids the crash and allows the database to load the configuration file, core files are still generated.
admin@supervisor:~$ ls /var/core -al
total 376
drwxr-xr-x 1 root root  4096 Nov 22 22:00 .
drwxr-xr-x 1 root root  4096 Nov 22 20:50 ..
-rw-r--r-- 1 root root 88525 Nov 22 21:42 sonic-db-cli.1669153338.6192.core.gz
-rw-r--r-- 1 root root 93392 Nov 22 21:42 sonic-db-cli.1669153339.6757.core.gz
-rw-r--r-- 1 root root 93413 Nov 22 21:42 sonic-db-cli.1669153339.6886.core.gz
-rw-r--r-- 1 root root 93284 Nov 22 21:42 sonic-db-cli.1669153339.7072.core.gz
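When verifying a fix after a reboot, the same check can be scripted. The sketch below counts core files for one program, assuming the `<program>.<timestamp>.<pid>.core.gz` naming seen in the listing above. The helper name `count_core_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many core files exist for a given program.
# The body runs in a subshell so the nullglob setting does not leak to the caller.
count_core_files() (
    core_dir="$1"   # e.g. /var/core
    prog="$2"       # e.g. sonic-db-cli
    shopt -s nullglob   # an unmatched pattern expands to nothing, not to itself
    files=("$core_dir"/"$prog".*.core.gz)
    echo "${#files[@]}"
)
```

Running `count_core_files /var/core sonic-db-cli` on the supervisor shown above would report the sonic-db-cli cores only, ignoring cores from other programs.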
@mlok-nokia, since PR #13207 has been merged, could you please confirm that we can close this issue and https://github.com/sonic-net/sonic-buildimage/issues/13740?