bgpd received signal SIGABRT in 8.3.1
Describe the bug
- [x] Did you check if this is a duplicate issue?
- [ ] Did you test it on the latest FRRouting/frr master branch?
To Reproduce
The error happened during the VyOS 1.4 integration tests while switching the config from
router bgp 64512
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
no bgp network import-check
!
address-family ipv6 unicast
network 2001:db8:100::/48
network 2001:db8:200::/48
network 2001:db8:300::/48
aggregate-address 2001:db8:300::/48 summary-only
redistribute kernel
redistribute connected
redistribute static
redistribute ripng
redistribute ospf6
exit-address-family
exit
to
router bgp 64512
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
no bgp network import-check
!
address-family ipv4 unicast
network 10.0.0.0/8
network 100.64.0.0/10
network 192.168.0.0/16
aggregate-address 10.0.0.0/8 as-set
aggregate-address 100.64.0.0/10 as-set
aggregate-address 192.168.0.0/16 summary-only
redistribute kernel
redistribute connected
redistribute static
redistribute rip
redistribute ospf
redistribute isis
exit-address-family
exit
Unfortunately, I have no additional information on how to trigger it explicitly via vtysh.
Expected behavior
Screenshots
(gdb) list
79
80 static inline void mt_count_free(struct memtype *mt, void *ptr)
81 {
82 frrtrace(2, frr_libfrr, memfree, mt, ptr);
83
84 assert(mt->n_alloc);
85 atomic_fetch_sub_explicit(&mt->n_alloc, 1, memory_order_relaxed);
86
87 #ifdef HAVE_MALLOC_USABLE_SIZE
88 size_t mallocsz = malloc_usable_size(ptr);
(gdb) bt
#0 0x00007f183532ece1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f1835318537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1835746429 in _zlog_assert_failed (xref=xref@entry=0x7f18357cf1a0 <_xref.1>, extra=extra@entry=0x0) at lib/zlog.c:678
#3 0x00007f18356f1be0 in mt_count_free (mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>, ptr=0x556d7d3e5cd0) at lib/memory.c:84
#4 mt_count_free (ptr=0x556d7d3e5cd0, mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>) at lib/memory.c:80
#5 qfree (mt=0x556d7b9f1e00 <MTYPE_TIP_ADDR>, ptr=0x556d7d3e5cd0) at lib/memory.c:140
#6 0x00007f18356db2d3 in hash_clean (hash=0x556d7ccb5df0, free_func=free_func@entry=0x556d7b7e3490 <bgp_tip_hash_free>) at lib/hash.c:303
#7 0x0000556d7b7e47e4 in bgp_tip_hash_destroy (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgp_nexthop.c:190
#8 0x0000556d7b86317f in bgp_free (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgpd.c:3789
#9 0x0000556d7b8660e4 in bgp_unlock (bgp=0x556d7d3e7120) at ./bgpd/bgpd.h:2312
#10 bgp_delete (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgpd.c:3744
#11 0x0000556d7b82a495 in no_router_bgp (self=<optimized out>, vty=0x556d7cc6e240, argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_vty.c:1566
#12 0x00007f18356c0c2e in cmd_execute_command_real (vline=vline@entry=0x556d7d552210, vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x0, up_level=up_level@entry=0, filter=FILTER_RELAXED) at lib/command.c:990
#13 0x00007f18356c0fbd in cmd_execute_command (vline=vline@entry=0x556d7d552210, vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x0, vtysh=vtysh@entry=0) at lib/command.c:1049
#14 0x00007f18356c1210 in cmd_execute (vty=vty@entry=0x556d7cc6e240, cmd=cmd@entry=0x556d7ccb2330 "no router bgp 64512", matched=matched@entry=0x0, vtysh=vtysh@entry=0) at lib/command.c:1217
#15 0x00007f1835731626 in vty_command (vty=vty@entry=0x556d7cc6e240, buf=0x556d7ccb2330 "no router bgp 64512") at lib/vty.c:483
#16 0x00007f1835731d61 in vty_execute (vty=vty@entry=0x556d7cc6e240) at lib/vty.c:1246
#17 0x00007f1835734d40 in vtysh_read (thread=<optimized out>) at lib/vty.c:2145
#18 0x00007f183572c43d in thread_call (thread=thread@entry=0x7fff91424b10) at lib/thread.c:2002
#19 0x00007f18356e6088 in frr_run (master=0x556d7c541190) at lib/libfrr.c:1198
#20 0x0000556d7b792336 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:519
#7 0x0000556d7b7e47e4 in bgp_tip_hash_destroy (bgp=bgp@entry=0x556d7d3e7120) at bgpd/bgp_nexthop.c:190
warning: Source file is more recent than executable.
190 hash_clean(bgp->tip_hash, bgp_tip_hash_free);
(gdb) list
185
186 void bgp_tip_hash_destroy(struct bgp *bgp)
187 {
188 if (bgp->tip_hash == NULL)
189 return;
190 hash_clean(bgp->tip_hash, bgp_tip_hash_free);
191 hash_free(bgp->tip_hash);
192 bgp->tip_hash = NULL;
193 }
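The backtrace aborts in `assert(mt->n_alloc)` at lib/memory.c:84, reached through `qfree()` with `MTYPE_TIP_ADDR`. That check fires when an object is freed while the counter for its memory type is already zero, which means more frees than allocations were charged to that type. The following is a minimal sketch of that bookkeeping, not FRR code: `demo_memtype`, `demo_alloc`, `demo_free` and `demo_wrong_type_is_caught` are hypothetical names, and the checked free returns `false` where the real code would abort. The issue does not establish whether the cause here was a double free, a type mismatch, or something else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for FRR's struct memtype. */
struct demo_memtype {
	const char *name;
	size_t n_alloc; /* live allocations charged to this type */
};

/* Allocate and charge the allocation to mt. */
static void *demo_alloc(struct demo_memtype *mt, size_t size)
{
	void *ptr = malloc(size);

	if (ptr)
		mt->n_alloc++;
	return ptr;
}

/*
 * Checked free: returns false in the situation where the real code
 * would hit assert(mt->n_alloc), i.e. the counter is already zero.
 */
static bool demo_free(struct demo_memtype *mt, void *ptr)
{
	if (mt->n_alloc == 0)
		return false;
	mt->n_alloc--;
	free(ptr);
	return true;
}

/* Frees an object under the wrong type; true if the check caught it. */
static bool demo_wrong_type_is_caught(void)
{
	struct demo_memtype tip = { "TIP_ADDR", 0 };
	struct demo_memtype other = { "OTHER", 0 };
	void *ptr = demo_alloc(&other, 16);
	bool caught = !demo_free(&tip, ptr);

	demo_free(&other, ptr); /* release it under the correct type */
	return caught;
}
```

In this model, a second `demo_free()` on the same type after its counter reached zero is rejected in the same way, which is the situation the assertion guards against.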
Versions
- OS Version: Debian 11 / VyOS 1.4
- Kernel: 5.15.67
- FRR Version: 8.3.1
Additional context
What exactly is issuing the no router bgp ... command?
It is issued when a test case completes: we always wipe the BGP config and start with a new one. I can reproduce the issue by randomly loading our artificial configurations, but it crashes with a different configuration each time.
But what I always see is this line: Sep 19 21:06:07 BGP[818]: in thread bgp_conditional_adv_timer scheduled from bgpd/bgp_conditional_adv.c:186 bgp_conditional_adv_timer()
I can't trivially make it crash, but let's try this patch (wait for the packages to be built, or pull the branch): https://github.com/FRRouting/frr/pull/11979
Btw, does it happen with previous versions too or not?
I will test that patch and report back.
You can find packages here: https://ci1.netdef.org/browse/FRR-PULLREQ2-7492/artifact
I have backported the change to stable/8.3 but I still get the crash.
Sep 21 20:59:20 BGP[925]: Received signal 11 at 1663786760 (si_addr 0x0, PC 0x55ca12be3de2); aborting...
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7f2439f50b4d]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f2439f50d45]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xcd6a1) [0x7f2439f7d6a1]
Sep 21 20:59:20 BGP[925]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f2439d41140]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(+0x1fade2) [0x55ca12be3de2]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f2439f8f43d]
Sep 21 20:59:20 BGP[925]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f2439f49088]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(main+0x356) [0x55ca12acb336]
Sep 21 20:59:20 BGP[925]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f2439b7cd0a]
Sep 21 20:59:20 BGP[925]: /usr/lib/frr/bgpd(_start+0x2a) [0x55ca12acd09a]
Sep 21 20:59:20 BGP[925]: in thread bgp_conditional_adv_timer scheduled from bgpd/bgp_conditional_adv.c:186 bgp_conditional_adv_timer()
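The log above reports signal 11 with si_addr 0x0 while running the bgp_conditional_adv_timer thread. One pattern that would be consistent with such a report, offered purely as an assumption and not as a diagnosis of this bug, is a timer that stays armed after its owning instance has been torn down. The sketch below models that pattern without touching freed memory. Every name in it (`demo_instance`, `demo_timer`, `demo_arm`, `demo_fire` and the two delete helpers) is hypothetical and unrelated to FRR's actual thread API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner of the timer, e.g. a routing instance. */
struct demo_instance {
	int ticks; /* number of times the timer callback ran */
};

struct demo_timer {
	bool armed;
	struct demo_instance *owner;
};

enum demo_fire_result { DEMO_IDLE, DEMO_RAN, DEMO_NULL_OWNER };

static void demo_arm(struct demo_timer *t, struct demo_instance *owner)
{
	t->armed = true;
	t->owner = owner;
}

/* Teardown that forgets the timer: the owner goes away, the timer stays armed. */
static void demo_delete_without_cancel(struct demo_timer *t)
{
	t->owner = NULL;
}

/* Teardown that cancels the timer before dropping the owner. */
static void demo_delete_with_cancel(struct demo_timer *t)
{
	t->armed = false;
	t->owner = NULL;
}

/* Run the timer once, reporting instead of dereferencing a NULL owner. */
static enum demo_fire_result demo_fire(struct demo_timer *t)
{
	if (!t->armed)
		return DEMO_IDLE;
	t->armed = false;
	if (t->owner == NULL)
		return DEMO_NULL_OWNER; /* unguarded code would fault at address 0 here */
	t->owner->ticks++;
	return DEMO_RAN;
}
```

In this model, only the teardown that cancels first leaves the timer idle. The other path reaches the callback with a NULL owner.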
Could you pin down the sequence so I can replicate this on my local machine? That would make it much easier to fix. I tried copying the config, restarting, applying your new config, and restarting again, but I can't see any crashes. Do you use frr-reload or not?
I will try to find an easy way to replicate it. Another option would be to use the VyOS ISO itself. We are using frr-reload indeed.
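Since the crash shows up when configurations are loaded in random order, one way to make a failing run repeatable is to derive the order from a recorded seed. The following is a minimal sketch of that idea, assuming the test configurations can be addressed by index. `demo_next` and `demo_shuffle_order` are hypothetical helpers and not part of the VyOS test suite or FRR.

```c
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32): same seed, same sequence. */
static uint32_t demo_next(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Fill order[0..n-1] with a permutation of 0..n-1 that depends only
 * on seed (Fisher-Yates shuffle). Logging the seed next to a crash
 * lets the same load order be replayed later.
 */
static void demo_shuffle_order(size_t *order, size_t n, uint32_t seed)
{
	uint32_t state = seed ? seed : 1; /* xorshift state must be non-zero */

	for (size_t i = 0; i < n; i++)
		order[i] = i;
	for (size_t i = n; i > 1; i--) {
		size_t j = demo_next(&state) % i;
		size_t tmp = order[i - 1];

		order[i - 1] = order[j];
		order[j] = tmp;
	}
}
```

A test driver would log the seed at startup and load configuration `order[k]` at step `k`. Rerunning with the logged seed then reproduces the exact sequence that led to the crash.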
@c-po did you have a chance to look at how to replicate it?
Hi @ton31337,
using stable/8.4 I am no longer able to reproduce the issue.