Zebra process crashes intermittently during 'config reload' on the DUT line cards
Describe the bug
On a T2 chassis line card, running 'sudo config reload -y' intermittently causes the 'zebra' process to crash and generate a core file. The crash occurs roughly once in 30 attempts.
We started seeing the issue with this commit:
sonic-buildimage-msft commit: https://github.com/Azure/sonic-buildimage-msft/commit/6f19e12bb24703095ed035b465a9effd22696b50
The following logs appear in the bgp docker when the crash happens:
2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
2023-07-11 19:39:22,156 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
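To count how often the crash occurs across runs, exit lines like the two above can be filtered mechanically. The following is a minimal sketch in Python, assuming the log lines have exactly the format quoted above; the function name `find_crashes` and the regular expression are illustrative and not part of SONiC or FRR.

```python
import re

# Matches process-exit lines of the form quoted in this report, e.g.:
# 2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO exited: "
    r"(?P<proc>\S+) \(terminated by (?P<sig>SIG[A-Z0-9]+)"
)


def find_crashes(lines, process="zebra"):
    """Return a (timestamp, signal) tuple for each signal-terminated exit of `process`."""
    hits = []
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m and m.group("proc") == process:
            hits.append((m.group("ts"), m.group("sig")))
    return hits
```

The function takes any iterable of lines, so it can be fed an open log file directly, for example `find_crashes(open(path))` with `path` pointing at a saved copy of the container log.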
Crash logs:
The generated zebra core and the FRR logs are attached for reference: zebra.1689104360.44.0.core.gz, frr.zip
Actual Behaviour:
- The zebra process in the bgp docker crashes.
- A core file is generated.
We have already raised an issue under sonic-buildimage for this crash: https://github.com/sonic-net/sonic-buildimage/issues/15803
To Reproduce
On any T2 chassis line card, run 'sudo config reload -y' repeatedly until the crash occurs.
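Because the crash shows up only about once in 30 attempts, it may help to automate the repetition. The following is a rough sketch, not a tested reproduction tool. It assumes core files land in /var/core with names like the one attached to this report, and that a fixed wait is enough for the containers to come back up after each reload. The function name, the default glob, the attempt limit, and the settle time are all hypothetical choices.

```python
import glob
import subprocess
import time


def reload_until_crash(max_attempts=50, settle_seconds=300,
                       core_glob="/var/core/zebra.*.core.gz",
                       run=subprocess.run, sleep=time.sleep, find=glob.glob):
    """Repeat 'sudo config reload -y' until a new zebra core file appears.

    Returns the 1-based attempt number that produced a core, or None if
    no new core was seen within max_attempts.
    """
    seen = set(find(core_glob))  # cores that existed before we started
    for attempt in range(1, max_attempts + 1):
        run(["sudo", "config", "reload", "-y"], check=True)
        sleep(settle_seconds)  # assumed time for services to restart
        new = set(find(core_glob)) - seen
        if new:
            print(f"attempt {attempt}: new core(s): {sorted(new)}")
            return attempt
    return None
```

The `run`, `sleep`, and `find` parameters exist only so the loop can be exercised without touching a real device; on a line card the defaults would be used as-is.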
Expected behavior
- Running 'sudo config reload' on DUT line cards should not cause any issue. The line cards should come up with all BGP neighbors established and without any crashes or core files.
Versions
admin@ixre-egl-board1:~$ show version
SONiC Software Version: SONiC.HEAD.489499-msft-2205-ndk-d963ac161
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: d963ac161
Build date: Fri Jul 7 18:18:51 UTC 2023
Built by: gitlab-runner@sonic-bld2
Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x100G
ASIC: broadcom
ASIC Count: 2
Serial Number: EAG2-04-210
Model Number: N/A
Hardware Revision: 56
Uptime: 15:45:52 up 1 day, 12:15, 3 users, load average: 1.56, 1.54, 1.59
Date: Wed 12 Jul 2023 15:45:52
Additional context
After checking the previous test history, we found that this crash also appeared in a test run in April.
The symbol file and a core file are attached: zebra.gz, zebra.1689971652.43.1.core.gz