zebra: zebra core with v6 RA
The following core/backtrace was seen in internal code:
Program terminated with signal SIGSEGV, Segmentation fault.
[Current thread is 1 (Thread 0x7fcd750c9540 (LWP 30999))]
(gdb) bt
0 0x00007fcd7596feec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007fcd75920fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007fcd75d008dc in core_handler (signo=11, siginfo=0x7ffd92dcb4f0, context=
Path to crash (from a different occurrence):
1) Interface uplink_2 is added to the wheel timer for the first time, at the end of rtadv_start_interface_events():
2025-06-07T05:01:23.802459+00:00 mlx-5600-33 zebra[229165]: [SEY8W-2M6VH] debug rtadv_start_interface_events, loc 2>>>>>::ifp::0x55a3281b1990::uplink_2
About once per second, the wheel timer processes interface uplink_2. Log from process_rtadv():
2025-06-07T05:01:29.870749+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:30.870767+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:31.870783+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:32.870794+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:33.870809+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:34.870836+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2) The same interface uplink_2 is added to the wheel timer a second time, in rtadv_start_interface_events():
    if (adv_if != NULL) {
        rtadv_send_packet(zvrf->rtadv.sock, zif->ifp, RA_ENABLE);
        wheel_add_item(zrouter.ra_wheel, zif->ifp); <<< duplicate gets added
        return; /* Already added */
    }
2025-06-07T05:03:44.642871+00:00 mlx-5600-33 zebra[229165]: [G63V5-AKC5D] debug in rtadv_start_interface_events, loc 1 >>>>>::ifp::0x55a3281b1990::uplink_2
Now, about once per second, the wheel timer processes interface uplink_2 twice back to back, which shows that there are indeed duplicate entries for uplink_2 in the wheel timer. Log from process_rtadv():
2025-06-07T05:03:44.878999+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:44.879076+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879096+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879169+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879187+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879240+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
3) Now suppose interface uplink_2 is shut down or removed. This removes one instance of the interface from the wheel timer; the other instance stays there.
4) The memory for interface uplink_2 is freed.
5) The wheel timer then tries to process uplink_2 through the stale entry, and zebra crashes.
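The add/remove imbalance in steps 1 to 3 can be illustrated with a small stand-alone model. This is only a sketch under assumptions: toy_wheel, toy_add, toy_remove, toy_refs and toy_replay are hypothetical names, and the array-backed list is not FRR's wheel implementation. It only assumes, as described above, that an add appends unconditionally and that a remove drops a single instance.

```c
#include <stddef.h>

/* Toy stand-in for one wheel slot. NOT FRR's timer wheel; all names
 * here are hypothetical and exist only for illustration. */
#define TOY_MAX 8

struct toy_wheel {
	void *items[TOY_MAX];
	size_t count;
};

/* Appends unconditionally, so the same pointer can be stored twice. */
static int toy_add(struct toy_wheel *w, void *item)
{
	if (w->count == TOY_MAX)
		return -1;
	w->items[w->count++] = item;
	return 0;
}

/* Removes only the first matching entry (assumed "one instance" removal). */
static int toy_remove(struct toy_wheel *w, void *item)
{
	for (size_t i = 0; i < w->count; i++) {
		if (w->items[i] == item) {
			/* Fill the hole with the last entry and shrink. */
			w->items[i] = w->items[--w->count];
			return 0;
		}
	}
	return -1;
}

/* Counts how many entries still reference the item. */
static size_t toy_refs(const struct toy_wheel *w, const void *item)
{
	size_t n = 0;

	for (size_t i = 0; i < w->count; i++)
		if (w->items[i] == item)
			n++;
	return n;
}

/* Replays steps 1-3: two adds, one remove. Returns leftover references. */
static size_t toy_replay(void)
{
	struct toy_wheel w = { .count = 0 };
	int ifp = 0; /* placeholder for the interface object */

	toy_add(&w, &ifp);    /* step 1: first add */
	toy_add(&w, &ifp);    /* step 2: duplicate add */
	toy_remove(&w, &ifp); /* step 3: interface shutdown/removal */
	return toy_refs(&w, &ifp);
}
```

In this model toy_replay() leaves one reference behind. In the real scenario that leftover entry is what the wheel dereferences after the interface memory has been freed (steps 4 and 5).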
ci:rerun
are we sure this is right now? we did have the question about double-adds in an earlier round of this work
True, there was a concern about double adds, but there was no practical way to prove it before. Even the current trigger behaves slightly differently in upstream FRR: in particular, on a network manager restart we do not get calls to if_up()/if_down() in upstream FRR, but we do in internal code. I could not reproduce the exact same signature in upstream FRR, because add/delete gets balanced out by other triggers. But if there is any path that calls rtadv_start_interface_events() with adv_if != NULL, it should cause the issue in FRR too. The current fix should eliminate any such trigger, known or unknown, that could cause this crash in the future.
Also, I was earlier modifying the wheel library to provide an option to check whether an item already exists before adding it. We decided not to add that code because of the performance cost of a linear list walk in the wheel timer.
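For comparison, one way to avoid a list walk on add is to track membership on the item itself. The following is a minimal sketch of that idea, not the change made in this PR and not FRR's API: demo_if, demo_wheel, the on_wheel flag and the demo_* functions are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX 8

/* Hypothetical interface record carrying its own membership flag. */
struct demo_if {
	const char *name;
	bool on_wheel; /* true while the item is stored in the wheel */
};

struct demo_wheel {
	struct demo_if *items[DEMO_MAX];
	size_t count;
};

/* O(1) duplicate check: consult the flag instead of walking the list. */
static bool demo_add_once(struct demo_wheel *w, struct demo_if *ifp)
{
	if (ifp->on_wheel || w->count == DEMO_MAX)
		return false;
	w->items[w->count++] = ifp;
	ifp->on_wheel = true;
	return true;
}

/* Removal still searches this toy list, then clears the flag. */
static bool demo_remove(struct demo_wheel *w, struct demo_if *ifp)
{
	if (!ifp->on_wheel)
		return false;
	for (size_t i = 0; i < w->count; i++) {
		if (w->items[i] == ifp) {
			w->items[i] = w->items[--w->count];
			ifp->on_wheel = false;
			return true;
		}
	}
	return false;
}

/* Two add attempts, one remove: returns the number of entries left. */
static size_t demo_balanced_replay(void)
{
	struct demo_wheel w = { .count = 0 };
	struct demo_if uplink = { .name = "uplink_2", .on_wheel = false };
	bool first = demo_add_once(&w, &uplink);
	bool second = demo_add_once(&w, &uplink); /* rejected as duplicate */

	assert(first && !second);
	(void)first;
	(void)second;
	demo_remove(&w, &uplink);
	return w.count;
}
```

With the flag in place, the second add is rejected, so a single remove leaves the list empty. The trade-off is one extra field per item that must be kept in sync with the wheel's contents.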
@Mergifyio backport dev/10.4
backport dev/10.4
✅ Backports have been created
- #19152 zebra: zebra core with v6 RA (backport #19000) has been created for branch dev/10.4