zebra_apic threads not started after FRR service restart
Discussed in https://github.com/FRRouting/frr/discussions/16638
Originally posted by ToshikiRen, August 23, 2024
The zebra_apic threads are sometimes not started after an FRR service restart. This happens after multiple restarts, but not every time; the occurrence appears to be mostly random.
When it happens, no routes are sent from the routing daemons (e.g., BGP) to the kernel.
Questions:
- From my understanding, the zebra_apic threads are responsible for communication between the FRR daemons and the Linux kernel. Is my assumption correct?
- What could cause the zebra_apic threads not to be started?
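For anyone trying to confirm the same symptom, here is a rough sketch for checking whether any thread with that name exists in the running zebra process. It assumes a Linux /proc filesystem and that the thread name (zebra_apic) is visible in the per-task comm files; the helper names list_thread_names and count_threads_named are made up for this example.

```shell
#!/bin/sh
# Print the name of every thread belonging to the given PID,
# one per line, as exposed in /proc/<pid>/task/<tid>/comm.
list_thread_names() {
    pid=$1
    for f in /proc/"$pid"/task/*/comm; do
        if [ -r "$f" ]; then
            cat "$f"
        fi
    done
}

# Print how many threads of the given PID have exactly the given name.
count_threads_named() {
    pid=$1
    name=$2
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so mask the exit status to keep the function usable under "set -e".
    list_thread_names "$pid" | grep -c -x "$name" || true
}

# Example invocation (assumes pidof is available and zebra is running):
# count_threads_named "$(pidof zebra)" zebra_apic
```

A count of 0 after a restart would match the state described above, while a healthy instance with connected daemons should report at least one such thread.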
I managed to reproduce the issue on a box with the following configuration, without configuring BGP peers:
# default to using syslog. /etc/rsyslog.d/45-frr.conf places the log in
# /var/log/frr/frr.log
#
# Note:
# FRR's configuration shell, vtysh, dynamically edits the live, in-memory
# configuration while FRR is running. When instructed, vtysh will persist the
# live configuration to this file, overwriting its contents. If you want to
# avoid this, you can edit this file manually before starting FRR, or instruct
# vtysh to write configuration to a different file.
log syslog informational
!
debug zebra events
debug zebra packet recv
debug zebra kernel
debug zebra rib
debug zebra nht
debug zebra dplane
debug zebra nexthop
debug zebra neigh
debug bgp nht
debug bgp zebra
The error from the logs when the issue occurs during restart:
bgpd[335825]: [VMFZK-56S5Y] bgp_zebra_label_manager_connect: failed connecting synchronous zclient!
FRR version: 9.1
Show version output:
FRRouting 9.1 (come-as4581) on Linux(6.6.32-dent).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-linux' '--host=aarch64-dent-linux' '--target=aarch64-dent-linux' '--prefix=/usr' '--exec_prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--libexecdir=/usr/libexec' '--datadir=/usr/share' '--sysconfdir=/etc' '--sharedstatedir=/com' '--localstatedir=/var' '--libdir=/usr/lib' '--includedir=/usr/include' '--oldincludedir=/usr/include' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--disable-silent-rules' '--disable-dependency-tracking' '--with-libtool-sysroot=' '--sbindir=/usr/libexec/frr' '--sysconfdir=/etc/frr' '--localstatedir=/var/run/frr' '--enable-vtysh' '--enable-multipath=64' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' '--disable-doc' '--with-clippy=/usr/lib/clippy' '--disable-capabilities' '--disable-cumulus' '--disable-datacenter' '--disable-fpm' '--disable-grpc' '--disable-ospfapi' '--disable-ospfclient' '--with-libpam' '--disable-protobuf' '--disable-snmp' '--disable-zeromq' 'build_alias=x86_64-linux' 'host_alias=aarch64-dent-linux' 'target_alias=aarch64-dent-linux' 'AR=aarch64-dent-linux-gcc-ar' 'LD=aarch64-dent-linux-ld --sysroot= ' 'OBJCOPY=aarch64-dent-linux-objcopy' 'OBJDUMP=aarch64-dent-linux-objdump' 'RANLIB=aarch64-dent-linux-gcc-ranlib' 'STRIP=aarch64-dent-linux-strip' 'PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/share/pkgconfig' 'PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig' 'CC=aarch64-dent-linux-gcc -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=' 'CPPFLAGS=' 'CPP=aarch64-dent-linux-gcc -E --sysroot= -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security' 'CXX=aarch64-dent-linux-g++ -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=' 'PYTHON=/usr/bin/python3-native/python3'
When the issue occurs, the zebra zserv.api socket is owned by root instead of frr:
# ls -l /var/run/frr/zserv.api
srwx------ 1 root frr 0 Aug 28 06:20 /var/run/frr/zserv.api
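Looking into the code, it seems the only case in which root would own this socket is when a TCP connection is used, but that is not the case for our configuration. With mode srwx------ and owner root, a daemon running as the frr user cannot connect to the socket, which would be consistent with the bgpd error above.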
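The ownership check can be scripted so that it runs after every restart while trying to reproduce the problem. This is only a sketch: it assumes a stat that supports -c (GNU coreutils or BusyBox), and check_sock_owner is a hypothetical helper name. The socket path and expected user are the ones from this report.

```shell
#!/bin/sh
# Compare the owner of a path with the expected user name.
# Returns 0 on match, 1 on mismatch, 2 if the path cannot be stat'ed.
check_sock_owner() {
    path=$1
    expected=$2
    actual=$(stat -c '%U' "$path") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $path is owned by $actual"
        return 0
    fi
    echo "MISMATCH: $path is owned by $actual, expected $expected"
    return 1
}

# Example invocation with the values from this report:
# check_sock_owner /var/run/frr/zserv.api frr
```

Running this in a restart loop and stopping on the first non-zero return should make it easier to capture logs from the exact restart where the socket ends up owned by root.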
I have seen this issue on the latest frr release (10.1) as well.