bgpd received signal SIGABRT in 8.3.1

Open c-po opened this issue 3 years ago • 6 comments

Describe the bug

  • [x] Did you check if this is a duplicate issue?
  • [ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce

The error happened during the VyOS 1.4 integration tests while switching the config from

router bgp 64512
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 !
 address-family ipv6 unicast
  network 2001:db8:100::/48
  network 2001:db8:200::/48
  network 2001:db8:300::/48
  aggregate-address 2001:db8:300::/48 summary-only
  redistribute kernel
  redistribute connected
  redistribute static
  redistribute ripng
  redistribute ospf6
 exit-address-family
exit

to

router bgp 64512
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 !
 address-family ipv4 unicast
  network 10.0.0.0/8
  network 100.64.0.0/10
  network 192.168.0.0/16
  aggregate-address 10.0.0.0/8 as-set
  aggregate-address 100.64.0.0/10 as-set
  aggregate-address 192.168.0.0/16 summary-only
  redistribute kernel
  redistribute connected
  redistribute static
  redistribute rip
  redistribute ospf
  redistribute isis
 exit-address-family
exit

Unfortunately, I have no additional information on how to trigger it explicitly via vtysh.

Expected behavior

Screenshots

(gdb) list
79
80      static inline void mt_count_free(struct memtype *mt, void *ptr)
81      {
82              frrtrace(2, frr_libfrr, memfree, mt, ptr);
83
84              assert(mt->n_alloc);
85              atomic_fetch_sub_explicit(&mt->n_alloc, 1, memory_order_relaxed);
86
87      #ifdef HAVE_MALLOC_USABLE_SIZE
88              size_t mallocsz = malloc_usable_size(ptr);
(gdb) bt
#0  0x00007f183532ece1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f1835318537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f1835746429 in _zlog_assert_failed (xref=xref@entry=0x7f18357cf1a0 <_xref.1>, extra=extra@entry=0x0) at lib/zlog.c:678
#3  0x00007f18356f1be0 in mt_count_free (mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>, ptr=0x556d7d3e5cd0) at lib/memory.c:84
#4  mt_count_free (ptr=0x556d7d3e5cd0, mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>) at lib/memory.c:80
#5  qfree (mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>, ptr=0x556d7d3e5cd0) at lib/memory.c:140
#6  0x00007f18356db2d3 in hash_clean (hash=0x556d7ccb5df0, free_func=free_func@entry=0x556d7b7e3490 <bgp_tip_hash_free>) at lib/hash.c:303
#7  0x0000556d7b7e47e4 in bgp_tip_hash_destroy (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgp_nexthop.c:190
#8  0x0000556d7b86317f in bgp_free (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgpd.c:3789
#9  0x0000556d7b8660e4 in bgp_unlock (bgp=0x556d7d3e7120) at ./bgpd/bgpd.h:2312
#10 bgp_delete (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgpd.c:3744
#11 0x0000556d7b82a495 in no_router_bgp (self=<optimized out>, vty=0x556d7cc6e240, argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_vty.c:1566
#12 0x00007f18356c0c2e in cmd_execute_command_real (vline=vline@entry=0x556d7d552210, vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x0, up_level=up_level@entry=0, filter=FILTER_RELAXED) at lib/command.c:990
#13 0x00007f18356c0fbd in cmd_execute_command (vline=vline@entry=0x556d7d552210, vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x0, vtysh=vtysh@entry=0) at lib/command.c:1049
#14 0x00007f18356c1210 in cmd_execute (vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x556d7ccb2330 "no router bgp 64512", matched=matched@entry=0x0, vtysh=vtysh@entry=0) at lib/command.c:1217
#15 0x00007f1835731626 in vty_command (vty=vty@entry=0x556d7cc6e240, buf=0x556d7ccb2330 "no router bgp 64512") at lib/vty.c:483
#16 0x00007f1835731d61 in vty_execute (vty=vty@entry=0x556d7cc6e240) at lib/vty.c:1246
#17 0x00007f1835734d40 in vtysh_read (thread=<optimized out>) at lib/vty.c:2145
#18 0x00007f183572c43d in thread_call (thread=thread@entry=0x7fff91424b10) at lib/thread.c:2002
#19 0x00007f18356e6088 in frr_run (master=0x556d7c541190) at lib/libfrr.c:1198
#20 0x0000556d7b792336 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:519
#7  0x0000556d7b7e47e4 in bgp_tip_hash_destroy (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgp_nexthop.c:190
warning: Source file is more recent than executable.
190             hash_clean(bgp->tip_hash, bgp_tip_hash_free);
(gdb) list
185
186     void bgp_tip_hash_destroy(struct bgp *bgp)
187     {
188             if (bgp->tip_hash == NULL)
189                     return;
190             hash_clean(bgp->tip_hash, bgp_tip_hash_free);
191             hash_free(bgp->tip_hash);
192             bgp->tip_hash = NULL;
193     }
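
The assertion at lib/memory.c:84 fires when a free is counted against a memory type whose allocation counter is already zero. The following is a minimal Python sketch of that bookkeeping, not FRR's code: `MemType`, `counted_alloc`, and `counted_free` are hypothetical names, and the double free is only one illustrative way to drive the counter to zero. The thread does not establish what actually caused it here.

```python
class MemType:
    """Toy stand-in for a per-type allocation counter (hypothetical, not FRR's struct memtype)."""

    def __init__(self, name):
        self.name = name
        self.n_alloc = 0  # number of outstanding allocations of this type


def counted_alloc(mt, value):
    """Record one allocation against the memory type and return the object."""
    mt.n_alloc += 1
    return {"mtype": mt.name, "value": value}


def counted_free(mt, obj):
    """Record one free; mirrors the assert(mt->n_alloc) check in the gdb listing."""
    if mt.n_alloc == 0:
        raise AssertionError(f"{mt.name}: free with no outstanding allocations")
    mt.n_alloc -= 1


tip = MemType("TIP_ADDR")
addr = counted_alloc(tip, "192.0.2.1")
counted_free(tip, addr)  # balanced free, counter returns to zero

try:
    counted_free(tip, addr)  # unbalanced free: the counter is already zero
    tripped = False
except AssertionError as exc:
    tripped = True
    print("assertion tripped:", exc)
```

In this model the second `counted_free` call is what corresponds to frame #3 of the backtrace; in bgpd the same check aborts the daemon, which is why the crash surfaces as SIGABRT.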

Versions

  • OS Version: Debian 11 / VyOS 1.4
  • Kernel: 5.15.67
  • FRR Version: 8.3.1

Additional context

c-po, Sep 14 '22 19:09

What exactly is issuing the no router bgp ... command?

donaldsharp, Sep 14 '22 20:09

This is issued when the test case is complete; we always wipe the BGP config and start with a new one. I can reproduce the issue by randomly loading our artificial configurations, but it crashes with a different configuration each time.

But what I always see is this line: Sep 19 21:06:07 BGP[818]: in thread bgp_conditional_adv_timer scheduled from bgpd/bgp_conditional_adv.c:186 bgp_conditional_adv_timer()

c-po, Sep 19 '22 19:09

I can't trivially make it crash, but let's try this patch: https://github.com/FRRouting/frr/pull/11979 (wait for the packages to be built, or pull the branch).

By the way, does it also happen with previous versions?

ton31337, Sep 20 '22 20:09

I will test that patch and report back.

c-po, Sep 20 '22 21:09

You can find packages here: https://ci1.netdef.org/browse/FRR-PULLREQ2-7492/artifact

ton31337, Sep 21 '22 06:09

I have backported the change to stable/8.3, but I still get a crash:

Sep 21 20:59:20 BGP[925]: Received signal 11 at 1663786760 (si_addr 0x0, PC 0x55ca12be3de2); aborting...
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7f2439f50b4d]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f2439f50d45]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xcd6a1) [0x7f2439f7d6a1]
Sep 21 20:59:20 BGP[925]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f2439d41140]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(+0x1fade2) [0x55ca12be3de2]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f2439f8f43d]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f2439f49088]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(main+0x356) [0x55ca12acb336]
Sep 21 20:59:20 BGP[925]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f2439b7cd0a]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(_start+0x2a) [0x55ca12acd09a]
Sep 21 20:59:20 BGP[925]: in thread bgp_conditional_adv_timer scheduled from bgpd/bgp_conditional_adv.c:186 bgp_conditional_adv_timer()

c-po, Sep 21 '22 19:09

Could you work out the exact sequence so I can replicate this on my local machine? That would make it much easier to fix. I tried copying the config, restarting, applying your new config, and restarting again, but I can't see any crashes. Do you use frr-reload?

ton31337, Sep 22 '22 19:09

I will try to find an easy way to replicate it. Another option would be to use the VyOS ISO itself. And yes, we are using frr-reload.
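
A replication attempt along the lines described in this thread (randomly load a configuration, then wipe the BGP instance) could be scripted roughly as below. This is a sketch under assumptions, not the VyOS test suite: the `/usr/lib/frr/frr-reload.py` path, the config file names, and the helpers `build_cycle`, `bgpd_pid`, and `run_cycle` are all hypothetical. Only `frr-reload.py --reload`, `vtysh -c`, and the `no router bgp 64512` command come from FRR and the backtrace above.

```python
import random
import subprocess

# Assumed install path of frr-reload.py; adjust for the system under test.
FRR_RELOAD = "/usr/lib/frr/frr-reload.py"


def build_cycle(config_paths, asn=64512, rounds=10, seed=None):
    """Return a list of commands: load a random config, then delete the BGP instance."""
    rng = random.Random(seed)
    commands = []
    for _ in range(rounds):
        path = rng.choice(config_paths)
        commands.append([FRR_RELOAD, "--reload", path])
        commands.append(
            ["vtysh", "-c", "configure terminal", "-c", f"no router bgp {asn}"]
        )
    return commands


def bgpd_pid():
    """Return the current bgpd PID(s) as a string, or '' if bgpd is not running."""
    result = subprocess.run(["pidof", "bgpd"], capture_output=True, text=True)
    return result.stdout.strip()


def run_cycle(commands):
    """Run the commands; return the first one after which the bgpd PID changed."""
    before = bgpd_pid()
    for cmd in commands:
        subprocess.run(cmd, check=False)
        after = bgpd_pid()
        # A changed or missing PID suggests bgpd crashed (and was possibly restarted).
        if after != before:
            return cmd
    return None


# Build (but do not execute) a short plan from two hypothetical config files.
plan = build_cycle(["bgp-ipv6.conf", "bgp-ipv4.conf"], rounds=3, seed=1)
for step in plan:
    print(" ".join(step))
```

Calling `run_cycle(plan)` as root on a test box would execute the plan. Comparing PIDs rather than only checking that bgpd is alive is a deliberate choice, since a supervisor may restart the daemon quickly after a crash.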

c-po, Sep 22 '22 19:09

@c-po, did you have a chance to look at how to replicate it?

ton31337, Nov 25 '22 14:11

Hi @ton31337,

Using stable/8.4, I am no longer able to reproduce the issue.

c-po, Nov 27 '22 16:11