[master][chassis][multi-asic] db_migrator.py shows an error and backtrace while loading configuration on a linecard

Open mlok-nokia opened this issue 1 year ago • 4 comments

Description

On the master branch, while a multi-asic platform linecard is booting up, db_migrator.py shows the following backtrace. This could be caused by PR https://github.com/sonic-net/sonic-utilities/pull/3100. Each database@#.service has the dependency "Requires=database.service After=database.service", so when db_migrator.py runs on the local database, the ASIC instances database0 and database1 have not yet created /var/run/redis0/sonic-db/database_config.json. It looks like PR https://github.com/sonic-net/sonic-utilities/pull/3100 uses "load_db_config()", which causes this backtrace.

Mar 19 04:20:01.443336 ixre-egl-board40 ERR db_migrator: :- parseDatabaseConfig: Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Mar 19 04:20:01.443474 ixre-egl-board40 ERR db_migrator: :- initializeGlobalConfig: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Mar 19 04:20:01.443585 ixre-egl-board40 ERR db_migrator: Caught exception: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Mar 19 04:20:01.449596 ixre-egl-board40 INFO database.sh[1907]: Traceback (most recent call last):
Mar 19 04:20:01.449806 ixre-egl-board40 INFO database.sh[1907]:   File "/usr/local/bin/db_migrator.py", line 1280, in main
Mar 19 04:20:01.449855 ixre-egl-board40 INFO database.sh[1907]:     load_db_config()
Mar 19 04:20:01.449900 ixre-egl-board40 INFO database.sh[1907]:   File "/usr/local/lib/python3.11/dist-packages/utilities_common/general.py", line 30, in load_db_config
Mar 19 04:20:01.449950 ixre-egl-board40 INFO database.sh[1907]:     swsscommon.SonicDBConfig.load_sonic_global_db_config()
Mar 19 04:20:01.450010 ixre-egl-board40 INFO database.sh[1907]:   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1661, in load_sonic_global_db_config
Mar 19 04:20:01.450056 ixre-egl-board40 INFO database.sh[1907]:     SonicDBConfig.initializeGlobalConfig(global_db_file_path)
Mar 19 04:20:01.450097 ixre-egl-board40 INFO database.sh[1907]:   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1656, in initializeGlobalConfig
Mar 19 04:20:01.450136 ixre-egl-board40 INFO database.sh[1907]:     return _swsscommon.SonicDBConfig_initializeGlobalConfig(*args, **kwargs)
Mar 19 04:20:01.450182 ixre-egl-board40 INFO database.sh[1907]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 19 04:20:01.450226 ixre-egl-board40 INFO database.sh[1907]: RuntimeError: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Mar 19 04:20:01.451439 ixre-egl-board40 INFO database.sh[1907]: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Mar 19 04:20:01.451596 ixre-egl-board40 INFO database.sh[1907]: usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version]
Mar 19 04:20:01.452231 ixre-egl-board40 INFO database.sh[1907]:                       [-s unix socket] [-n asic namespace]
Mar 19 04:20:01.452775 ixre-egl-board40 INFO database.sh[1907]: options:
Mar 19 04:20:01.453413 ixre-egl-board40 INFO database.sh[1907]:   -h, --help            show this help message and exit
Mar 19 04:20:01.453724 ixre-egl-board40 INFO database.sh[1907]:   -o operation (migrate, set_version, get_version)
Mar 19 04:20:01.453773 ixre-egl-board40 INFO database.sh[1907]:                         operation to perform [default: get_version]
Mar 19 04:20:01.453821 ixre-egl-board40 INFO database.sh[1907]:   -s unix socket        the unix socket that the desired database listens on
Mar 19 04:20:01.453864 ixre-egl-board40 INFO database.sh[1907]:   -n asic namespace     The asic namespace whose DB instance we need to
Mar 19 04:20:01.453903 ixre-egl-board40 INFO database.sh[1907]:                         connect
Mar 19 04:20:01.530003 ixre-egl-board40 INFO database.sh[1911]: True
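
The failing path in the log is the host's sonic-db directory joined with a relative include (../../redis0/sonic-db/database_config.json). A minimal Python sketch of a pre-check that lists which included files are still missing is given here. It assumes the global file is JSON with an "INCLUDES" list whose entries carry an "include" path relative to the global file's directory. The function name missing_db_configs and that file layout are assumptions for illustration, not the swsscommon API.

```python
import json
import os


def missing_db_configs(global_config_path):
    """Return the included per-instance config files that do not exist yet.

    Assumed layout (hypothetical, for illustration):
    {"INCLUDES": [{"include": "../../redis0/sonic-db/database_config.json"}]}
    """
    base_dir = os.path.dirname(global_config_path)
    with open(global_config_path) as f:
        global_config = json.load(f)

    missing = []
    for entry in global_config.get("INCLUDES", []):
        # Includes are resolved relative to the directory of the global file.
        path = os.path.normpath(os.path.join(base_dir, entry["include"]))
        if not os.path.exists(path):
            missing.append(path)
    return missing
```

A non-empty result at boot would correspond to the situation in the log above, where the redis0 config file had not been created when db_migrator.py ran.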

Steps to reproduce the issue:

  1. On a chassis, reboot multi-asic linecard.
  2. The error and backtrace shown in the Description above appear in syslog.

Describe the results you received:

The same error and backtrace shown in the Description above appear in syslog.

Describe the results you expected:

Output of show version:

Master branch

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):

mlok-nokia avatar Mar 19 '24 04:03 mlok-nokia

Comparison of the current code and the original code:

Current code:

    if args.namespace is not None:
        SonicDBConfig.initializeGlobalConfig()
    else:
        SonicDBConfig.initialize()

Original code:

    if is_multi_asic():
        if not swsscommon.SonicDBConfig.isGlobalInit():
            SonicDBConfig.initializeGlobalConfig()   # <== crash here
    else:
        if not swsscommon.SonicDBConfig.isInit():
            SonicDBConfig.initialize()

So what happens is this: on a multi-asic device, the namespace parameter is None during bootup. In the old code, db_migrator then initializes with the local config; in the new code, it initializes with the global config.
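
The difference between the two checks can be shown with a small Python sketch that involves no SONiC libraries. The function pick_loaders and the strings "local" and "global" are hypothetical stand-ins for the initialize() and initializeGlobalConfig() calls in the snippets above.

```python
def pick_loaders(namespace, is_multi_asic):
    """Return which config each style of check would load.

    The first element follows the check on the namespace argument,
    the second follows the check on the platform type.
    """
    namespace_check = "global" if namespace is not None else "local"
    multi_asic_check = "global" if is_multi_asic else "local"
    return namespace_check, multi_asic_check


# Boot-time call on a multi-asic linecard: no namespace is passed.
print(pick_loaders(None, True))  # ('local', 'global')
```

Only the boot-time case on a multi-asic device makes the two checks disagree, which matches the scenario described in this comment.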

Will verify fix on hardware.

liuh-80 avatar Mar 29 '24 02:03 liuh-80

Created fix PR: https://github.com/sonic-net/sonic-utilities/pull/3257

liuh-80 avatar Apr 03 '24 07:04 liuh-80

@mlok-nokia, according to Judy's comments in this PR: https://github.com/sonic-net/sonic-utilities/pull/3257

https://github.com/sonic-net/sonic-utilities/pull/3257#discussion_r1550740054: I feel it is more like a timing issue, as you pointed out. Can we call db_migrator after /var/run/redis0/sonic-db/database_config.json is created?

Can you check whether the issue can be fixed by calling this script after the config file is generated?
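
One way to express "call the script after the config file is generated" is a bounded poll, sketched below in Python. The helper name wait_for_files and its timeout and interval defaults are assumptions for illustration. Whether such a wait is viable depends on where in the service ordering it would run.

```python
import os
import time


def wait_for_files(paths, timeout=30.0, interval=0.5):
    """Poll until every path exists or the timeout expires.

    Returns the list of paths still missing (empty on success).
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)
```

A caller could pass the per-ASIC database_config.json paths and skip or defer the migration when the returned list is not empty.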

liuh-80 avatar Apr 07 '24 06:04 liuh-80

@mlok-nokia, according to Judy's comments in this PR: sonic-net/sonic-utilities#3257

sonic-net/sonic-utilities#3257 (comment): I feel it is more like a timing issue, as you pointed out. Can we call db_migrator after /var/run/redis0/sonic-db/database_config.json is created?

Can you check whether the issue can be fixed by calling this script after the config file is generated?

@liuh-80 @judyjoseph all database@#.service instances have the dependency "Requires=database.service After=database.service". The db_migrator script for the localhost database therefore cannot wait for all the other instances to be ready.

There are two options for multi-asic platforms:

  1. Change the database@#.service dependency. Because the migration in database.sh only handles the localhost database, it would also need to be modified to migrate all ASIC instance databases. Or:
  2. Modify the config-setup script to perform the migration in the reboot case, since config-setup.service starts after all database@#.service instances. This is very straightforward. Also, the existing database.sh only migrates the localhost database, not the ASIC instance databases.
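
For either option, the migration would need to cover the host database and each ASIC instance. A minimal Python sketch of building those invocations follows, using the -o and -n options from the usage text in the log. The helper name build_migrate_commands and the "asicN" namespace naming are assumptions for illustration.

```python
def build_migrate_commands(num_asics, script="/usr/local/bin/db_migrator.py"):
    """Build one migrate command for the host plus one per ASIC namespace."""
    # Host (localhost) database: no namespace argument.
    commands = [[script, "-o", "migrate"]]
    for asic_id in range(num_asics):
        # Assumed namespace naming: asic0, asic1, ...
        commands.append([script, "-o", "migrate", "-n", "asic{}".format(asic_id)])
    return commands


for cmd in build_migrate_commands(2):
    print(" ".join(cmd))
```

Each command list could then be run, for example with subprocess.run(cmd, check=True), once all database instances are up.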

mlok-nokia avatar Apr 16 '24 01:04 mlok-nokia

Fixed

mlok-nokia avatar May 31 '24 14:05 mlok-nokia