
Zebra process crashes intermittently during 'config reload' on the DUT line cards

Open sanjair-git opened this issue 2 years ago • 4 comments

Describe the bug

On a T2 chassis line card, running 'sudo config reload -y' intermittently crashes the 'zebra' process and generates a core file. The crash occurs roughly once in 30 attempts.

We started seeing the issue with this commit:

sonic-buildimage-msft commit: https://github.com/Azure/sonic-buildimage-msft/commit/6f19e12bb24703095ed035b465a9effd22696b50

The following logs are seen in the bgp docker when the crash happens:

2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
2023-07-11 19:39:22,156 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
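To count how often the crash occurs across a test run, the exit lines above can be picked out of the log automatically. The following is a minimal sketch, assuming the log lines keep the format shown above (which looks like supervisord output); `find_crashes` and `EXIT_RE` are hypothetical names chosen for illustration and are not part of SONiC or FRR.

```python
import re

# Matches exit lines in the format shown in this report, for example:
# 2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
EXIT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO exited: "
    r"(?P<process>\S+) \(terminated by (?P<signal>SIG[A-Z0-9]+)"
)


def find_crashes(lines, process="zebra"):
    """Return (timestamp, signal) tuples for signal-terminated exits of `process`."""
    crashes = []
    for line in lines:
        match = EXIT_RE.match(line.strip())
        if match and match.group("process") == process:
            crashes.append((match.group("timestamp"), match.group("signal")))
    return crashes
```

Passing an open log file (or any iterable of lines) to `find_crashes` yields one tuple per crash, which makes it easy to compare crash counts between images.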

Crash logs:

image

Attached are the generated zebra core and the FRR logs for reference: zebra.1689104360.44.0.core.gz, frr.zip

Actual Behaviour:

  • The zebra process in the bgp docker crashes.
  • A core file is generated.

We have already raised an issue under sonic-buildimage regarding this crash; please take a look: https://github.com/sonic-net/sonic-buildimage/issues/15803

To Reproduce

On any T2 chassis line card, run 'sudo config reload -y' repeatedly.
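Since the crash is intermittent, it may help to automate the reload loop. The sketch below repeats the reload and stops at the first new zebra core file. It rests on several assumptions: the core directory is supplied by the caller, the file pattern is inferred from the core names attached to this issue, and the settle time is a guess at how long a line card needs to come back up. `list_cores`, `run_reload`, and `reproduce` are hypothetical helper names.

```python
import glob
import os
import subprocess
import time


def list_cores(core_dir, pattern="zebra.*.core.gz"):
    """Return the set of zebra core files currently present in core_dir.

    The pattern is an assumption based on the core names attached to this issue.
    """
    return set(glob.glob(os.path.join(core_dir, pattern)))


def run_reload():
    """Run the reload command from this report; raises if it exits non-zero."""
    subprocess.run(["sudo", "config", "reload", "-y"], check=True)


def reproduce(attempts, core_dir, reload_fn=run_reload, settle_seconds=300):
    """Reload up to `attempts` times.

    Returns (attempt_number, new_core_files) for the first attempt that leaves
    a new core file behind, or None if no new core appears.
    """
    seen = list_cores(core_dir)
    for attempt in range(1, attempts + 1):
        reload_fn()
        # Assumed wait for the line card to settle; tune for the platform.
        time.sleep(settle_seconds)
        new_cores = list_cores(core_dir) - seen
        if new_cores:
            return attempt, sorted(new_cores)
    return None
```

For example, `reproduce(attempts=60, core_dir="/var/core")` would allow about twice the reported failure interval; `/var/core` is an assumed location and should be replaced with wherever the image actually writes core files.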

Expected behavior

  • 'sudo config reload' on DUT line cards should not cause any issues. The line cards should come up with all BGP neighbors established and without any crashes or core files.

Versions

admin@ixre-egl-board1:~$ show version

SONiC Software Version: SONiC.HEAD.489499-msft-2205-ndk-d963ac161
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: d963ac161
Build date: Fri Jul  7 18:18:51 UTC 2023
Built by: gitlab-runner@sonic-bld2

Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x100G
ASIC: broadcom
ASIC Count: 2
Serial Number: EAG2-04-210
Model Number: N/A
Hardware Revision: 56
Uptime: 15:45:52 up 1 day, 12:15,  3 users,  load average: 1.56, 1.54, 1.59
Date: Wed 12 Jul 2023 15:45:52

sanjair-git avatar Jul 24 '23 17:07 sanjair-git

After checking the previous test history, we found that this crash also appeared in an April test run.

mlok-nokia avatar Jul 24 '23 17:07 mlok-nokia

Attaching the symbol file and core file: zebra.gz, zebra.1689971652.43.1.core.gz

saksarav-nokia avatar Jul 24 '23 19:07 saksarav-nokia

route.zip

saksarav-nokia avatar Jul 27 '23 00:07 saksarav-nokia