zebra: fix removing kernel and connected routes on interface linkdown

Open yar-fed opened this issue 2 years ago • 5 comments

Hello, this is an attempt to fix the issue https://github.com/FRRouting/frr/issues/7299.

TL;DR: unless link detection is disabled completely, FRR currently treats link down and administrative down on an interface as the same event. The kernel (at least Linux), however, deletes routes on administrative down but does not delete them on link down. This causes "kernel" routes to disappear (connected routes are recreated in the RIB on if_up).
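To make the distinction concrete, here is a minimal sketch that classifies an interface from the flags field printed by `ip -br l`. It assumes that `UP` reflects the administrative state and `LOWER_UP` the carrier; the function name is made up for illustration and is not part of FRR.

```python
def classify_link_state(flags_field):
    """Classify an interface from the <...> flags field of `ip -br l`.

    Assumption: UP is the administrative state, LOWER_UP is the carrier.
    """
    flags = set(flags_field.strip("<>").split(","))
    if "UP" not in flags:
        # Administrative down: the kernel deletes the interface's routes.
        return "admin-down"
    if "LOWER_UP" not in flags:
        # Link down: the kernel keeps the routes and marks them linkdown.
        return "link-down"
    return "up"


# Flag strings of the kind shown by `ip -br l show dev eth0`.
print(classify_link_state("<BROADCAST,MULTICAST,UP,LOWER_UP>"))
print(classify_link_state("<NO-CARRIER,BROADCAST,MULTICAST,UP>"))
print(classify_link_state("<BROADCAST,MULTICAST>"))
```

Only the "admin-down" case corresponds to the kernel removing routes, which is why handling both cases through the same path loses information.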

I found that the code related to my simple test scenario could be guarded with a few conditions, but it seems too easy and probably ignores some edge cases.

Notably the one thing that my fix ignores is the procfs option mentioned here https://github.com/FRRouting/frr/issues/7299#issuecomment-1095858414.

I have considered additionally setting some flag (new or existing) to indicate that the route is linkdown/blackhole, but I don't know which option to choose as I am not very familiar with the code base. I also thought about splitting if_down into if_linkdown and if_down, if there are more things that should only be done on an administrative down.

This is the test scenario that I used (Linux 5.10.70, x86_64, OpenWrt 21+, FRR 8.1.0):

Starting configuration

root@root:~# ip r
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1 
10.10.20.0/24 via 10.10.10.2 dev eth0 metric 16777216 
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231 
root@root:~# vtysh -c "sh ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:46:56
C>* 10.10.10.0/24 is directly connected, eth0, 00:46:56
K>* 10.10.20.0/24 [1/0] via 10.10.10.2, eth0, 00:46:56
C>* 192.168.56.0/24 is directly connected, eth3, 00:46:56
root@root:~# ip -br l show dev eth0
eth0             UP             0c:13:fa:95:00:00 <BROADCAST,MULTICAST,UP,LOWER_UP> 
root@root:~# ip -br a show dev eth0
eth0             UP             10.10.10.1/24 fd88:f72d:f91e::1/60 fe80::e13:faff:fe95:0/64 

shutdown link

root@root:~# ip -br l show dev eth0
eth0             DOWN           0c:13:fa:95:00:00 <NO-CARRIER,BROADCAST,MULTICAST,UP> 
root@root:~# ip r
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1 linkdown 
10.10.20.0/24 via 10.10.10.2 dev eth0 metric 16777216 linkdown 
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231 
root@root:~# vtysh -c "sh ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:48:31
C>* 10.10.10.0/24 is directly connected, eth0, 00:48:31
K>* 10.10.20.0/24 [1/0] via 10.10.10.2, eth0, 00:48:31
C>* 192.168.56.0/24 is directly connected, eth3, 00:48:31
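The linkdown marking in the kernel table above can be picked out mechanically. The following is a rough sketch, not part of the patch, that parses `ip r` output of the shape shown here; the function name and the returned dictionary layout are assumptions for illustration.

```python
def parse_ip_route(output):
    """Map each prefix in `ip r` output to its device and linkdown state."""
    routes = {}
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # `ip r` prints the IPv4 default route as "default".
        prefix = "0.0.0.0/0" if tokens[0] == "default" else tokens[0]
        dev = None
        if "dev" in tokens[:-1]:
            dev = tokens[tokens.index("dev") + 1]
        routes[prefix] = {"dev": dev, "linkdown": "linkdown" in tokens}
    return routes


# Kernel table copied from the "shutdown link" step.
SAMPLE = """\
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1 linkdown
10.10.20.0/24 via 10.10.10.2 dev eth0 metric 16777216 linkdown
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231
"""

for prefix, info in parse_ip_route(SAMPLE).items():
    print(prefix, info["dev"], "linkdown" if info["linkdown"] else "")
```

Both eth0 routes carry the linkdown marker while staying in the table, which is the kernel behaviour the fix tries to mirror.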

bring link back up

root@root:~# ip -br l show dev eth0
eth0             UP             0c:13:fa:95:00:00 <BROADCAST,MULTICAST,UP,LOWER_UP> 
root@root:~# ip r
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1 
10.10.20.0/24 via 10.10.10.2 dev eth0 metric 16777216 
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231 
root@root:~# vtysh -c "sh ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:52:15
C>* 10.10.10.0/24 is directly connected, eth0, 00:52:15
K>* 10.10.20.0/24 [1/0] via 10.10.10.2, eth0, 00:52:15
C>* 192.168.56.0/24 is directly connected, eth3, 00:52:15

administratively shut down interface

root@root:~# ip -br l set dev eth0 down
root@root:~# ip r
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865 
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231 
root@root:~# vtysh -c "sh ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:53:34
C>* 192.168.56.0/24 is directly connected, eth3, 00:53:34

administratively enable interface

root@root:~# ip -br l set dev eth0 up
root@root:~# ip r
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1 
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231 
root@root:~# vtysh -c "sh ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:53:49
C>* 10.10.10.0/24 is directly connected, eth0, 00:00:10
C>* 192.168.56.0/24 is directly connected, eth3, 00:53:49
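Each step above was checked by comparing the two tables by eye. As a hedged sketch, the same comparison can be scripted: the helper names below are hypothetical, and the regular expression is only an assumption about the `sh ip route` line format seen in this transcript.

```python
import re

# A route line starts with a code letter and optional status flags,
# e.g. "K>* 10.10.20.0/24 [1/0] via 10.10.10.2, eth0, 00:46:56".
ZEBRA_ROUTE = re.compile(r"^[A-Za-z][>*a-z]*\s+(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})\s")


def kernel_prefixes(ip_route_output):
    """Collect prefixes from `ip r` output ("default" becomes 0.0.0.0/0)."""
    prefixes = set()
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if tokens:
            prefixes.add("0.0.0.0/0" if tokens[0] == "default" else tokens[0])
    return prefixes


def zebra_prefixes(show_ip_route_output):
    """Collect prefixes from `sh ip route` output, skipping the legend."""
    prefixes = set()
    for line in show_ip_route_output.splitlines():
        match = ZEBRA_ROUTE.match(line)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def compare_tables(ip_route_output, show_ip_route_output):
    """Return (only in kernel, only in zebra) as sorted prefix lists."""
    kernel = kernel_prefixes(ip_route_output)
    zebra = zebra_prefixes(show_ip_route_output)
    return sorted(kernel - zebra), sorted(zebra - kernel)


# Tables copied from the "administratively enable interface" step.
KERNEL = """\
default via 192.168.56.1 dev eth3 proto static src 192.168.56.231 metric 4261412865
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.1
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.231
"""
ZEBRA = """\
Codes: K - kernel route, C - connected, S - static, R - RIP,
K>* 0.0.0.0/0 [254/1] via 192.168.56.1, eth3, src 192.168.56.231, 00:53:49
C>* 10.10.10.0/24 is directly connected, eth0, 00:00:10
C>* 192.168.56.0/24 is directly connected, eth3, 00:53:49
"""

print(compare_tables(KERNEL, ZEBRA))
```

For the final step both lists come back empty, meaning the kernel table and the RIB agree on the set of prefixes. The check only compares prefixes, so it would not notice a differing next hop or a missing linkdown marker.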

yar-fed avatar Sep 19 '22 22:09 yar-fed

Continuous Integration Result: FAILED

See below for issues. CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/

This is a comment from an automated CI system. For questions and feedback in regards to this CI system, please feel free to email Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topotests Ubuntu 18.04 i386 part 5: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO5U18I386-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 5 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO5U18I386/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 i386 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO9U18I386/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 arm8 part 8: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 8: No useful log found
Topotests Ubuntu 18.04 i386 part 8: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8U18I386-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 8 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO8U18I386/ErrorLog/log_topotests.txt

Topotests debian 10 amd64 part 8: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8DEB10AMD64-7480/test

Topology Tests failed for Topotests debian 10 amd64 part 8 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO8DEB10AMD64/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 arm8 part 9: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 9: No useful log found
Topotests debian 10 amd64 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-7480/test

Topology Tests failed for Topotests debian 10 amd64 part 9 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO9DEB10AMD64/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 amd64 part 8: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8U18ARM64-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 8 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO8U18ARM64/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 amd64 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO9U18AMD64/ErrorLog/log_topotests.txt

Topotests Ubuntu 18.04 arm8 part 5: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 5: No useful log found
Topotests Ubuntu 18.04 amd64 part 5: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO5U18AMD64-7480/test

Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 5 see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-7480/artifact/TOPO5U18AMD64/ErrorLog/log_topotests.txt

Successful on other platforms/tests
  • Topotests debian 10 amd64 part 6
  • Topotests debian 10 amd64 part 1
  • Topotests Ubuntu 18.04 arm8 part 1
  • Topotests Ubuntu 18.04 amd64 part 7
  • Fedora 29 rpm pkg check
  • Addresssanitizer topotests part 3
  • Topotests Ubuntu 18.04 i386 part 0
  • Topotests Ubuntu 18.04 amd64 part 4
  • CentOS 7 rpm pkg check
  • Addresssanitizer topotests part 2
  • Topotests Ubuntu 18.04 amd64 part 0
  • Topotests debian 10 amd64 part 7
  • Debian 9 deb pkg check
  • Addresssanitizer topotests part 8
  • Topotests debian 10 amd64 part 5
  • Topotests Ubuntu 18.04 i386 part 7
  • Topotests Ubuntu 18.04 amd64 part 6
  • Topotests Ubuntu 18.04 i386 part 2
  • Topotests Ubuntu 18.04 arm8 part 6
  • Addresssanitizer topotests part 6
  • Topotests Ubuntu 18.04 amd64 part 1
  • Ubuntu 18.04 deb pkg check
  • Topotests Ubuntu 18.04 amd64 part 2
  • Addresssanitizer topotests part 5
  • Addresssanitizer topotests part 4
  • Topotests Ubuntu 18.04 i386 part 3
  • Addresssanitizer topotests part 0
  • Topotests Ubuntu 18.04 arm8 part 4
  • IPv6 protocols on Ubuntu 18.04
  • Topotests debian 10 amd64 part 4
  • Topotests debian 10 amd64 part 3
  • Topotests Ubuntu 18.04 arm8 part 3
  • Topotests Ubuntu 18.04 amd64 part 3
  • Addresssanitizer topotests part 1
  • Topotests Ubuntu 18.04 i386 part 4
  • Topotests Ubuntu 18.04 arm8 part 7
  • Addresssanitizer topotests part 9
  • IPv4 protocols on Ubuntu 18.04
  • Static analyzer (clang)
  • Topotests Ubuntu 18.04 arm8 part 0
  • Topotests debian 10 amd64 part 0
  • IPv4 ldp protocol on Ubuntu 18.04
  • Ubuntu 16.04 deb pkg check
  • Topotests Ubuntu 18.04 arm8 part 2
  • Topotests debian 10 amd64 part 2
  • Debian 10 deb pkg check
  • Topotests Ubuntu 18.04 i386 part 1
  • Addresssanitizer topotests part 7
  • Topotests Ubuntu 18.04 i386 part 6
  • Ubuntu 20.04 deb pkg check

Warnings Generated during build:

Checkout code: Successful with additional warnings

Report for interface.c | 8 issues
===============================================
< WARNING: Block comments use * on subsequent lines
< #1073: FILE: /tmp/f1-8173/interface.c:1073:
< WARNING: Block comments use a trailing */ on a separate line
< #1073: FILE: /tmp/f1-8173/interface.c:1073:
< WARNING: Block comments use * on subsequent lines
< #1106: FILE: /tmp/f1-8173/interface.c:1106:
< WARNING: Block comments use a trailing */ on a separate line
< #1106: FILE: /tmp/f1-8173/interface.c:1106:

NetDEF-CI avatar Sep 20 '22 00:09 NetDEF-CI

@yar-fed -> I believe I fixed this issue already. Can you please try the problem on latest master without your code to see if it still exists?

donaldsharp avatar Sep 20 '22 11:09 donaldsharp

If the issue still exists, can you give me a sequence of commands that shows the problem so I may better understand what I am missing?

donaldsharp avatar Sep 20 '22 11:09 donaldsharp

@donaldsharp Thanks, I tested on latest master and the removal of kernel routes is fixed. But:

  1. Connected routes are still being removed, while also remaining in the kernel table with the linkdown flag.
  2. There is a new, separate bug: those linkdown kernel routes are not removed when the IP address is deleted from the interface with "ip address del".

Also can you link a PR (or multiple PRs) that addressed the original issue (or maybe there is already a backport to 8.1 or 8.3)?

yar-fed avatar Sep 20 '22 23:09 yar-fed

I believe the connected routes come back on link up, correct? And for #2, can you show me a series of commands that shows the issue?

donaldsharp avatar Sep 21 '22 17:09 donaldsharp

@donaldsharp Can you please point to the commit that fixes this behaviour? I use FRR 8.4.2 and there are some reasons why I cannot use the latest master. If I understand correctly, this PR is not quite correct?

IamMarshmello avatar Aug 23 '23 14:08 IamMarshmello

In my case I receive a BGP route from my neighbor. Then I delete the IP address from the interface and configure it again with the same address. The interface is in the up state. After that I do not see the routes from my neighbor in "ip r", but I do see them when I run "show ip route" in vtysh.

IamMarshmello avatar Aug 23 '23 14:08 IamMarshmello