static routes not removed from kernel
Description
Static routes configured through the vty shell are not removed from the kernel when FRR is stopped or restarted.
Version
9.1.1
How to reproduce
- Configure a static route:
    vtysh
    conf t
    ip route 100.70.1.254/32 Null0
- Check the route in the kernel:
    ip r | grep 100.70.1.254
    blackhole 100.70.1.254 proto 196 metric 20
- Stop FRR:
    sudo docker stop frr
- Check the route in the kernel again (it is still present):
    ip r | grep 100.70.1.254
    blackhole 100.70.1.254 proto 196 metric 20
- Start FRR:
    sudo docker start frr
- Check the route in FRR:
    vtysh
    show ip route 100.70.1.254/32
    Routing entry for 100.70.1.254/32
      Known via "static", distance 1, metric 0, best
      Last update 00:07:13 ago
      * unreachable (blackhole)
- Try to delete the static route from FRR:
    vtysh
    conf t
    no ip route 100.70.1.254/32 Null0
    % Refusing to remove a non-existent route
    ip route 100.70.1.254/32 Null0
    ERROR: SET_CONFIG request failed, Error: Only inactive VRFs can be deleted
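The reproduction steps above can be scripted roughly as follows. This is a sketch under assumptions that are not in the report: the container is named `frr` (as in the `docker stop frr` step), it shares the host's network namespace so `ip route` on the host shows FRR's routes, the caller can run `docker` without `sudo`, and `route_in_kernel` and `repro_static_route_leak` are hypothetical helper names. It relies on the `proto 196` value visible in the `ip r` output above.

```bash
#!/bin/bash
# Hypothetical helpers that script the reproduction steps.
# Assumptions: container named "frr", host network namespace, docker usable without sudo.
PREFIX=100.70.1.254

# FRR's static routes show up as "proto 196" in the `ip r` output of the report.
route_in_kernel() {
    ip route show proto 196 | grep -qwF "$PREFIX"
}

# Returns 0 if the route disappears on stop, 1 if it leaks, 2 if it was never installed.
repro_static_route_leak() {
    docker exec frr vtysh -c 'configure terminal' -c "ip route $PREFIX/32 Null0"
    sleep 2  # give zebra a moment to install the route
    if ! route_in_kernel; then
        echo "route was never installed"
        return 2
    fi
    docker stop frr >/dev/null
    if route_in_kernel; then
        echo "BUG: $PREFIX still in kernel after stop"
        return 1
    fi
    echo "OK: $PREFIX removed on stop"
}
```

Calling `repro_static_route_leak` prints a `BUG:` line and returns 1 when the route survives `docker stop`, which is the behaviour reported for 9.1.1.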
Expected behavior
Static routes should be deleted from the kernel when FRR is stopped.
Actual behavior
Static routes remain in the kernel even after FRR is stopped.
Additional context
Errors in the logs:
2024/08/27 14:04:36 STATIC: [MHYBZ-5A04C][EC 100663334] error processing configuration change: error [validation] event [validate] operation [destroy] xpath [/frr-vrf:lib/vrf[name='vrf-2001606']] message: Only inactive VRFs can be deleted
2024/08/27 14:04:36 STATIC: [KFEJ3-7JXVF] BE-CLIENT: mgmt_be_txn_cfg_prepare: ERROR: Failed to validate configs txn-id: 1 1 batches, err: 'Only inactive VRFs can be deleted'
2024/08/27 14:04:36 MGMTD: [G7XEF-QM9RV] mgmt_txn_notify_be_cfgdata_reply: ERROR: CFGDATA_CREATE_REQ sent to 'staticd' failed txn-id: 1 batch-id 1 err: Only inactive VRFs can be deleted
2024/08/27 14:04:36 MGMTD: [GGJTQ-VTT01] SET_CONFIG request for client 0xd failed, Error: 'Only inactive VRFs can be deleted'
kernel
5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
docker
Docker version 25.0.3, build 4debf41
frr_runnig.txt frr_startup.txt
Checklist
- [X] I have searched the open issues for this bug.
- [X] I have not included sensitive information in this report.
How are you starting Zebra? Can you give us the container script that starts FR/R?
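#!/bin/bash
# Use the LSB logging helpers when available; otherwise define minimal
# fallbacks (FRR's shell scripts log through these functions).
if [ -r "/lib/lsb/init-functions" ]; then
    . /lib/lsb/init-functions
else
    log_success_msg() {
        echo "$@"
    }
    log_warning_msg() {
        echo "$@" >&2
    }
    log_failure_msg() {
        echo "$@" >&2
    }
fi
# frrcommon.sh provides daemon_list, which prints the daemons enabled in the
# FRR daemons configuration file.
source /usr/lib/frr/frrcommon.sh
# watchfrr starts and monitors those daemons; it stays in the foreground as a
# child of this script.
/usr/lib/frr/watchfrr $(daemon_list)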
ps aux | grep frr
1 root 0:01 /sbin/tini -- /usr/lib/frr/docker-start
7 root 0:00 {docker-start} /bin/bash /usr/lib/frr/docker-start
11 root 0:20 /usr/lib/frr/watchfrr zebra mgmtd bgpd staticd bfdd
159 frr 0:05 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl
165 frr 0:01 /usr/lib/frr/mgmtd -d -F traditional
167 frr 0:22 /usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1
174 frr 0:01 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
177 frr 15:29 /usr/lib/frr/bfdd -d -F traditional -A 127.0.0.1
1. Configure the static route:
    vtysh
    conf t
    ip route 100.70.1.254/32 Null0
2. Check the route in the kernel:
    ip r | grep 100.70.1.254
    blackhole 100.70.1.254 proto 196 metric 20
3. Stop FRR:
    sudo docker stop frr
4. Check the route in the kernel:
    ip r | grep 100.70.1.254
    blackhole 100.70.1.254 proto 196 metric 20
5. Start FRR:
    sudo docker start frr
6. Check the route in FRR:
    vtysh
    7f4ad6eb72fb# show ip route 100.70.1.254/32
    Routing entry for 100.70.1.254/32
      Known via "static", distance 1, metric 0, best
      Last update 00:00:33 ago
      * unreachable (blackhole), weight 1
7. Try to delete the static route from FRR:
    7f4ad6eb72fb(config)# no ip route 100.70.1.254/32 Null0
    7f4ad6eb72fb(config)#
    7f4ad6eb72fb(config)# exit
    7f4ad6eb72fb#
    7f4ad6eb72fb# show ip route 100.70.1.254/32
    % Network not in table
    7f4ad6eb72fb# exit
    frr@7f4ad6eb72fb:/$ ip r | grep 100.70.1.254
    frr@7f4ad6eb72fb:
I followed the steps above to reproduce it; the static route is successfully deleted from the kernel.
@riw777 @Darwin4053 Do you know how to debug route updates in the kernel when FRR is stopped?
1. Configure the static route:
    vtysh
    conf t
    ip route 100.70.1.254/32 Null0
2. Check the route in the kernel:
    ip r | grep 100.70.1.254
    blackhole 100.70.1.254 proto 196 metric 20
3. Stop FRR:
    sudo systemctl stop frr
4. Check the route in the kernel:
    ip r | grep 100.70.1.254
   I didn't see any route here.
5. Start FRR:
    sudo systemctl start frr
6. Check the route in FRR:
    vtysh
    7f4ad6eb72fb# show ip route 100.70.1.254/32
    Routing entry for 100.70.1.254/32
      Known via "static", distance 1, metric 0, best
      Last update 00:00:33 ago
      * unreachable (blackhole), weight 1
7. Try to delete the static route from FRR:
    7f4ad6eb72fb(config)# no ip route 100.70.1.254/32 Null0
    7f4ad6eb72fb(config)#
    7f4ad6eb72fb(config)# exit
    7f4ad6eb72fb#
    7f4ad6eb72fb# show ip route 100.70.1.254/32
    % Network not in table
    7f4ad6eb72fb# exit
    frr@7f4ad6eb72fb:/$ ip r | grep 100.70.1.254
    frr@7f4ad6eb72fb:
I followed the steps above to reproduce it; the static route is successfully deleted from the kernel.
What version of frr did you test? I have this problem with 9.1.1, but not with 8.5.
@askorichenko Hello! Could you help me? Does the fix below apply to routes in the default VRF table? https://github.com/FRRouting/frr/pull/15570/commits/69f07fab28b32846a95571eb7404ef870cc3784c In pull request https://github.com/FRRouting/frr/pull/15424 I see that you reproduced the bug in the default VRF table, but the commit above contains VRF-related code. Also, could it be that your fix does not handle static routes with a Null0 (blackhole) nexthop configured through vtysh?
There is an inconsistency in how the processes receive signals under Docker: when SIGINT/SIGTERM is passed to staticd, the route is sometimes cleared and sometimes not.
@Darwin4053 Somehow staticd receives SIGKILL instead of SIGINT/SIGTERM, even though /sbin/tini is used as the ENTRYPOINT in the Docker image:
ppoll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=6, events=POLLIN}], 6, NULL, [], 8 <unfinished ...>) = ?
+++ killed by SIGKILL +++
As the tini logs show, only docker-start (pid 7, the parent of watchfrr) is reaped cleanly after SIGTERM; all other processes in the container end up with SIGKILL.
[DEBUG tini (1)] Passing signal: 'Terminated'
[TRACE tini (1)] No child to reap
[DEBUG tini (1)] Received SIGCHLD
[DEBUG tini (1)] Reaped child with pid: '7'
[INFO tini (1)] Main child exited with signal (with signal 'Terminated')
[TRACE tini (1)] No child to reap
[TRACE tini (1)] Exiting: child has exited
FRR processes in Docker, for example:
ps a
PID USER TIME COMMAND
1 root 0:00 /sbin/tini -vvv -- /usr/lib/frr/docker-start
7 root 0:00 {docker-start} /bin/bash /usr/lib/frr/docker-start
11 root 0:00 /usr/lib/frr/watchfrr zebra mgmtd bgpd staticd bfdd
27 frr 0:01 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl
33 frr 0:00 /usr/lib/frr/mgmtd -d -F traditional
35 frr 0:00 /usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1
42 frr 0:00 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
45 frr 0:02 /usr/lib/frr/bfdd -d -F traditional -A 127.0.0.1
This happens even though all FRR daemons have tini (PID 1) as their parent:
cat /proc/27/status | grep PPid
PPid: 1
cat /proc/33/status | grep PPid
PPid: 1
cat /proc/35/status | grep PPid
PPid: 1
cat /proc/42/status | grep PPid
PPid: 1
cat /proc/45/status | grep PPid
PPid: 1
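The per-PID checks above can be condensed into one helper that scans `/proc` directly, which also works in minimal images without `ps`. This is a sketch; `show_ppids` is a hypothetical name, and it matches on the `Name:` field of `/proc/PID/status`, which the kernel truncates to 15 characters (all FRR daemon names here are shorter).

```bash
#!/bin/bash
# Hypothetical helper: print "PID NAME PPid=N" for every process whose
# Name: field in /proc/PID/status exactly matches one of the arguments.
show_ppids() {
    local status key val name ppid pid want
    for status in /proc/[0-9]*/status; do
        name= ppid=
        # Name: is the first line and PPid: comes a few lines later; stop there.
        while read -r key val; do
            case "$key" in
                Name:) name=$val ;;
                PPid:) ppid=$val; break ;;
            esac
        done 2>/dev/null < "$status" || continue  # process may have exited
        for want in "$@"; do
            if [ "$name" = "$want" ]; then
                pid=${status#/proc/}
                pid=${pid%/status}
                printf '%s %s PPid=%s\n' "$pid" "$name" "$ppid"
            fi
        done
    done
}
```

In the container above, `show_ppids zebra mgmtd bgpd staticd bfdd` would be expected to list each daemon with `PPid=1`.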
Another look at tini's children:
pgrep -lP 1
7 /bin/bash
27 /usr/lib/frr/zebra
33 /usr/lib/frr/mgmtd
35 /usr/lib/frr/bgpd
42 /usr/lib/frr/staticd
45 /usr/lib/frr/bfdd
@Darwin4053 @riw777 Hello! I confirmed with the tini contributors that it should work with the -g option, which sends the signal to the child's whole process group. But in my container, every daemon has its own PGID:
ps -o pid,ppid,pgid,comm
PID PPID PGID COMMAND
1 0 1 tini
7 1 7 docker-start
11 7 7 watchfrr
27 1 27 zebra
33 1 33 mgmtd
35 1 35 bgpd
42 1 42 staticd
45 1 45 bfdd
117 0 117 bash
135 117 135 ps
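tini's `-g` option forwards signals to its child's process group, which here is PGID 7 (docker-start and watchfrr). The sketch below, with hypothetical helpers `pgid_of` and `outside_pgid`, reads field 5 of `/proc/PID/stat` to list the processes outside a given group, that is, the processes that a group-wide signal would not reach.

```bash
#!/bin/bash
# Hypothetical helper: print the process group ID of PID.
pgid_of() {
    local line rest state ppid pgrp
    line=$(<"/proc/$1/stat") || return 1
    # Field 2 (comm) may contain spaces, so parse after the last ") ".
    rest=${line##*") "}
    read -r state ppid pgrp _ <<< "$rest"
    echo "$pgrp"
}

# Hypothetical helper: list "PID PGID COMM" for every process that is NOT in
# process group $1, i.e. the processes `kill -- -$1` would not signal.
outside_pgid() {
    local target=$1 stat line pid comm rest state ppid pgrp
    for stat in /proc/[0-9]*/stat; do
        { line=$(<"$stat"); } 2>/dev/null || continue  # process may have exited
        pid=${line%% *}
        comm=${line#*"("}
        comm=${comm%")"*}
        rest=${line##*") "}
        read -r state ppid pgrp _ <<< "$rest"
        if [ "$pgrp" != "$target" ]; then
            printf '%s %s %s\n' "$pid" "$pgrp" "$comm"
        fi
    done
}
```

In the container above, `outside_pgid "$(pgid_of 7)"` would be expected to list zebra, mgmtd, bgpd, staticd and bfdd (plus tini and the interactive shell), consistent with the PGID column.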
I also found in the watchfrr code that it sets a different PGID for every daemon: https://github.com/FRRouting/frr/blob/master/watchfrr/watchfrr.c#L321
How can I work around this watchfrr behaviour?
Hello! I have some updates. I removed tini as the entrypoint, because it doesn't help stop the FRR daemons cleanly. I also added some code to the docker-start file so that it traps the TERM signal, forwards it to watchfrr, and flushes the static routes from the kernel.
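The reporter's docker-start changes are not included in the thread. The following is only a sketch of the approach described (trap TERM, forward it to watchfrr, then flush the static routes). `supervise` and `flush_frr_static_routes` are hypothetical names, and the flush assumes FRR's static routes are the ones tagged `proto 196` in the main table, as in the `ip r` output above. Whether SIGTERM to watchfrr also stops the individual daemons cleanly is not established in this thread, so the flush acts as a fallback.

```bash
#!/bin/bash
# Hypothetical helpers for a docker-start entrypoint (not the reporter's code).

# supervise CLEANUP_FN CMD [ARGS...]
# Runs CMD in the background. On SIGTERM or SIGINT, sends SIGTERM to CMD,
# waits for it, runs CLEANUP_FN and exits 0. If CMD exits on its own,
# runs CLEANUP_FN and returns CMD's exit status.
supervise() {
    local cleanup=$1
    shift
    "$@" &
    local child=$!
    # Expand $child and $cleanup now so the handler does not depend on scope.
    trap "kill -TERM $child 2>/dev/null; wait $child; $cleanup; exit 0" TERM INT
    wait "$child"
    local rc=$?
    trap - TERM INT
    "$cleanup"
    return "$rc"
}

# Assumption: routes installed by FRR's staticd carry "proto 196".
# Flushes matching routes from the main table only.
flush_frr_static_routes() {
    ip route flush proto 196
}
```

In docker-start, the last line would then become `supervise flush_frr_static_routes /usr/lib/frr/watchfrr $(daemon_list)`. With tini removed, bash runs as PID 1, where signals with no handler are ignored, so the explicit trap is what makes `docker stop` take effect before Docker falls back to SIGKILL.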
I have tested only 9.1.1.