Weird BGP IPv6 LL NH behavior
Hi all! FRR version 10.0.
I have two interfaces with IPv6 link-local addresses and eBGP IPv6 sessions:
7: ens13f0np0.80@ens13f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 18:9b:a5:82:25:e2 brd ff:ff:ff:ff:ff:ff
inet6 fe80:14:fc01:1::2/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::1a9b:a5ff:fe82:25e2/64 scope link
valid_lft forever preferred_lft forever
10: ens28f0np0.80@ens28f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e8:eb:d3:b3:54:b6 brd ff:ff:ff:ff:ff:ff
inet6 fe80:14:fc01:2::2/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::eaeb:d3ff:feb3:54b6/64 scope link
valid_lft forever preferred_lft forever
FRR settings
frr version 10.0
frr defaults traditional
hostname el-fw1.cdnwb.ru
log syslog informational
service integrated-vtysh-config
router bgp 65323
neighbor SW-LAN peer-group
neighbor fe80:14:fc01:1::1 peer-group SW-LAN
neighbor fe80:14:fc01:1::1 interface ens13f0np0.80
no neighbor fe80:14:fc01:1::1 enforce-first-as
neighbor fe80:14:fc01:2::1 peer-group SW-LAN
neighbor fe80:14:fc01:2::1 interface ens28f0np0.80
no neighbor fe80:14:fc01:2::1 enforce-first-as
address-family ipv6 unicast
neighbor SW-LAN activate
neighbor SW-LAN soft-reconfiguration inbound
neighbor SW-LAN route-map FROM_LAN_V6 in
neighbor SW-LAN route-map TO_LAN_V6 out
exit-address-family
All sessions are UP and stable
Neighbor          V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd PfxSnt Desc
fe80:14:fc01:1::1 4 65322    2400    2170     11   0    0 16:27:31            1      0 N/A
fe80:14:fc01:2::1 4 65322    2362    2142     11   0    0 16:27:31            1      0 N/A
Both BGP peers announce one IPv6 prefix to me, 2a03:720:1000::/36:
el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:1::1 received-routes
BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 2a03:720:1000::/36
fe80:14:fc01:1::1
0 65322 4206000170 57073 i
Total number of prefixes 1
el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:2::1 received-routes
BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 2a03:720:1000::/36
fe80:14:fc01:2::1
0 65322 4206000170 57073 i
Total number of prefixes 1
So, BGP signaling is OK, but I have a very weird situation when routes are added to the RIB:
el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:1::1 received-routes detail
BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
BGP routing table entry for 2a03:720:1000::/36, version 11
Paths: (2 available, best #1, table default)
Not advertised to any peer
65322 4206000170 57073
fe80:14:fc01:2::1 from fe80:14:fc01:2::1 (10.255.193.111)
(fe80:14:fc01:2::1) (used)
Origin IGP, valid, external, best (First path received)
Last update: Mon May 27 17:45:41 2024
65322 4206000170 57073
fe80:14:fc01:1::1 (inaccessible, import-check enabled) from fe80:14:fc01:1::1 (10.255.193.110)
(fe80:14:fc01:1::1) (used)
Origin IGP, invalid, external
Last update: Mon May 27 17:45:41 2024
Total number of prefixes 1
Question 1: why is the route from peer fe80:14:fc01:2::1 shown as a route from peer fe80:14:fc01:1::1? The second question is probably related to the first: I have a big problem with installing the route into the RIB. Sometimes I have both routes:
B>* 2a03:720:1000::/36 [20/0] via fe80:14:fc01:1::1, ens13f0np0.80, weight 1, 00:11:59
  *                           via fe80:14:fc01:2::1, ens28f0np0.80, weight 1, 00:11:59
Sometimes one:
B>* 2a03:720:1000::/36 [20/0] via fe80:14:fc01:2::1, ens28f0np0.80, weight 1, 16:38:59
Sometimes none :-(
Help me please.
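To track which of the states above (both next hops, one, or none) the RIB is in at a given moment, the route output can be checked mechanically. The following is only a minimal Python sketch that assumes the `show ipv6 route` line format quoted in this thread; the function name `bgp_nexthops` and the regular expressions are illustrative and are not part of FRR.

```python
import re

# Sample in the format of the route lines quoted in this thread.
SAMPLE = """\
B>* 2a03:720:1000::/36 [20/0] via fe80:14:fc01:1::1, ens13f0np0.80, weight 1, 00:11:59
  *                           via fe80:14:fc01:2::1, ens28f0np0.80, weight 1, 00:11:59
"""

# First line of a BGP route entry: "B>* <prefix> ..."
ROUTE_RE = re.compile(r"^B\S*\s+(?P<prefix>\S+/\d+)\s")
# A next hop on either the first line or an indented continuation line.
VIA_RE = re.compile(r"\bvia\s+(?P<nexthop>[0-9A-Fa-f:.]+),\s*(?P<ifname>[^,\s]+)")


def bgp_nexthops(route_output, prefix):
    """Return (nexthop, interface) pairs listed for a BGP prefix."""
    nexthops = []
    in_prefix = False
    for line in route_output.splitlines():
        # A non-indented line starts a new route entry (or is a legend line).
        if line[:1] not in (" ", "\t"):
            m = ROUTE_RE.match(line)
            in_prefix = bool(m) and m["prefix"] == prefix
        if in_prefix:
            via = VIA_RE.search(line)
            if via:
                nexthops.append((via["nexthop"], via["ifname"]))
    return nexthops


if __name__ == "__main__":
    for nexthop, ifname in bgp_nexthops(SAMPLE, "2a03:720:1000::/36"):
        print(nexthop, ifname)
```

Running this periodically against saved output would show whether the set of installed next hops changes over time, which may help correlate the flapping with log entries.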
Can you enable debug bgp updates, debug bgp neighbor, debug bgp nht and then send us the logs?
Also, just in case, the output of the following commands would be handy too:
show ipv6 nht
show bgp nexthop
show bgp import-check-table
Done
VRF default:
Resolve via default: on
fe80:14:fc01:1::1(Connected)
resolved via connected
is directly connected, ens13f0np0.80 (vrf default)
Client list: bgp(fd 18)
fe80:14:fc01:2::1(Connected)
resolved via connected
is directly connected, ens28f0np0.80 (vrf default)
Client list: bgp(fd 18)
el-fw1.cdnwb.ru# show bgp nexthop
Current BGP nexthop cache:
fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
if ens13f0np0.80
Last update: Mon May 27 16:26:58 2024
fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
if ens28f0np0.80
Last update: Mon May 27 16:35:25 2024
fe80:14:fc01:1::1 invalid, #paths 1
Must be Connected
Last update: Wed May 22 17:20:29 2024
el-fw1.cdnwb.ru# show bgp import-check-table
Current BGP import check cache:
el-fw1.cdnwb.ru#
You have something strange in next-hop cache:
fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
if ens13f0np0.80
Last update: Mon May 27 16:26:58 2024
fe80:14:fc01:1::1 invalid, #paths 1
Must be Connected
Last update: Wed May 22 17:20:29 2024
Two entries for the same next hop, but one is invalid, and its last update is much older. Does this bad behavior happen even right after the router is restarted? Or does it start to happen after some time?
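Spotting this kind of duplicate by eye gets harder with more peers. As a rough aid, here is a minimal Python sketch that scans `show bgp nexthop` text for addresses that appear with both a valid and an invalid entry. It assumes the output layout quoted in this thread; `find_conflicting_nexthops` and the regular expression are illustrative names, not anything provided by FRR.

```python
import re
from collections import defaultdict

# Sample copied from the `show bgp nexthop` output in this thread.
SAMPLE = """\
Current BGP nexthop cache:
 fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
  if ens13f0np0.80
  Last update: Mon May 27 16:26:58 2024
 fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
  if ens28f0np0.80
  Last update: Mon May 27 16:35:25 2024
 fe80:14:fc01:1::1 invalid, #paths 1
  Must be Connected
  Last update: Wed May 22 17:20:29 2024
"""

# Entry header lines: "<addr> valid [...], #paths N" or "<addr> invalid, #paths N".
ENTRY_RE = re.compile(
    r"^\s*(?P<addr>[0-9A-Fa-f:.]+)\s+(?P<state>valid|invalid)\b.*?#paths\s+(?P<paths>\d+)"
)


def find_conflicting_nexthops(text):
    """Map each address that has both valid and invalid entries to its entries."""
    entries = defaultdict(list)
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries[m["addr"]].append((m["state"], int(m["paths"])))
    return {
        addr: seen
        for addr, seen in entries.items()
        if len({state for state, _ in seen}) > 1
    }


if __name__ == "__main__":
    for addr, seen in find_conflicting_nexthops(SAMPLE).items():
        print(addr, seen)
```

In the sample, the only flagged address is fe80:14:fc01:1::1, and the path count sits on the invalid entry, which matches the "inaccessible" path seen in the BGP table.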
I don't know whether it's related or not. I had a similar issue after restoring a config from 9.1 to 10.0 (where enforce-first-as is the default). After issuing no neighbor XXX enforce-first-as, I still saw a weirdly low number of received routes. clear ip bgp did not help either; it was only solved by neighbor XXX shutdown followed by no shutdown.
So the no neighbor XXX enforce-first-as command needs a shutdown and no shutdown of the peer before it is applied.
It's a new router with a new IPv6 design. I ran into this problem right after the FRR and host configuration was completed. There was one period of about 15 minutes when everything worked. It seems to me that the situation can change after FRR is restarted: both next hops can become invalid, for example, or both can work; anything is possible. By the way, the next-hop table now is:
el-fw1.cdnwb.ru# sh bgp nexthop
Current BGP nexthop cache:
fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
if ens13f0np0.80
Last update: Mon May 27 16:26:58 2024
fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
if ens28f0np0.80
Last update: Mon May 27 16:35:25 2024
fe80:14:fc01:1::1 invalid, #paths 1
Must be Connected
Last update: Wed May 22 17:20:29 2024
el-fw1.cdnwb.ru#
Sorry, the shutdown and no shutdown of the peer didn't help me.
Could you also show "show ipv6 route"?
I have a similar issue:
version
FRRouting 10.1.1 (frr-10.1.1) on Linux(6.6.52-0-virt).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--prefix=/usr' '--localstatedir=/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--libdir=/usr/lib/frr' '--with-moduledir=/usr/lib/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'CC=gcc' 'CXX=g++' 'PYTHON=python3'
Router configs
r1:
hostname r1
ip router-id 1.1.1.1
!
interface eth0
ipv6 address fd00:1111::1/48
ipv6 address fe80::1111/64
exit
!
router bgp 1
neighbor fe80::2222 remote-as 2
neighbor fe80::2222 interface eth0
!
address-family ipv6 unicast
network fd00:1111::/48
neighbor fe80::2222 activate
neighbor fe80::2222 route-map map in
neighbor fe80::2222 route-map map out
exit-address-family
exit
!
route-map map permit 1
exit
r2:
hostname r2
ip router-id 2.2.2.2
!
interface eth0
ipv6 address fd00:2222::1/48
ipv6 address fe80::2222/64
exit
!
router bgp 2
neighbor fe80::1111 remote-as 1
neighbor fe80::1111 interface eth0
!
address-family ipv6 unicast
network fd00:2222::/48
neighbor fe80::1111 activate
neighbor fe80::1111 route-map map in
neighbor fe80::1111 route-map map out
exit-address-family
exit
!
route-map map permit 1
exit
More information
r1:
r1# show bgp
BGP table version is 1, local router ID is 1.1.1.1, vrf id 0
Default local pref 100, local AS 1
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> fd00:1111::/48 :: 0 32768 i
fd00:2222::/48 fe80::2222 0 0 2 i
Displayed 2 routes and 2 total paths
r1# show bgp fd00:2222::/48
BGP routing table entry for fd00:2222::/48, version 0
Paths: (1 available, no best path)
Not advertised to any peer
2
fd00:2222::1 (inaccessible, import-check enabled) from fe80::2222 (2.2.2.2)
(fe80::2222) (used)
Origin IGP, metric 0, invalid, external
Last update: Fri Sep 27 19:49:23 2024
r1# show bgp nexthop
Current BGP nexthop cache:
fe80::2222 valid [IGP metric 0], #paths 0, peer fe80::2222
Resolved prefix fe80::/64
if eth0
Last update: Fri Sep 27 19:46:23 2024
fe80::2222 invalid, #paths 1
Must be Connected
Last update: Fri Sep 27 19:45:41 2024
r1# ping fe80::2222%eth0
PING fe80::2222%eth0 (fe80::2222%2): 56 data bytes
64 bytes from fe80::2222: seq=0 ttl=64 time=0.780 ms
64 bytes from fe80::2222: seq=1 ttl=64 time=1.565 ms
64 bytes from fe80::2222: seq=2 ttl=64 time=0.988 ms
64 bytes from fe80::2222: seq=3 ttl=64 time=1.255 ms
r1# show ipv6 route
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
C>* fd00:1111::/48 is directly connected, eth0, 00:15:08
L>* fd00:1111::1/128 is directly connected, eth0, 00:15:08
C>* fe80::/64 is directly connected, eth0, 00:15:25
r2:
r2# show bgp
BGP table version is 2, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> fd00:1111::/48 fe80::1111 0 0 1 i
*> fd00:2222::/48 :: 0 32768 i
Displayed 2 routes and 2 total paths
r2# show bgp fd00:1111::/48
BGP routing table entry for fd00:1111::/48, version 2
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
fe80::1111
1
fd00:1111::1 from fe80::1111 (1.1.1.1)
(fe80::1111) (used)
Origin IGP, metric 0, valid, external, best (First path received)
Last update: Fri Sep 27 19:49:23 2024
r2# show bgp nexthop
Current BGP nexthop cache:
fe80::1111 valid [IGP metric 0], #paths 1, peer fe80::1111
Resolved prefix fe80::/64
if eth0
Last update: Fri Sep 27 19:47:21 2024
r2# ping fe80::1111%eth0
PING fe80::1111%eth0 (fe80::1111%2): 56 data bytes
64 bytes from fe80::1111: seq=0 ttl=64 time=0.516 ms
64 bytes from fe80::1111: seq=1 ttl=64 time=0.740 ms
64 bytes from fe80::1111: seq=2 ttl=64 time=1.444 ms
64 bytes from fe80::1111: seq=3 ttl=64 time=1.616 ms
r2# show ipv6 route
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
B>* fd00:1111::/48 [20/0] via fe80::1111, eth0, weight 1, 00:12:06
C>* fd00:2222::/48 is directly connected, eth0, 00:14:07
L>* fd00:2222::1/128 is directly connected, eth0, 00:14:07
C>* fe80::/64 is directly connected, eth0, 00:15:22
Bumping this as I am also seeing this issue. LL next hops are being shown as inaccessible on one of the peers despite being accessible. FRR version 10.2.
Any updates on this? (ping @ton31337, you asked for additional debug output)
Could you try with disable-connected-check for these neighbors? I still need the debug logs, though; could you post them here as well, so we can compare them with the debug output we already have?
Also, to make it clear, could you disable the IPv4 address family for this neighbor (no neighbor xxx activate under address-family ipv4 unicast) and try again? If the issue is gone, then the root cause is clear.
Attached the debug information including logging for:
BGP debugging status:
BGP neighbor-events debugging is on
BGP next-hop tracking debugging is on
BGP updates debugging is on (inbound)
BGP updates debugging is on (outbound)
Note: these are still older FRR versions; I can update them later.
Ok, but please try with what I wrote above first.
I have not collected any debug logs, but I have IPv4 unicast disabled entirely and just tried disabling the connected check on my peer. At least in my case, it didn't make any difference.
I have the running configuration at the end of the log. I have set configuration options outlined earlier (example from r1):
neighbor fe80::2222 disable-connected-check
address-family ipv4 unicast
no neighbor fe80::2222 activate
exit-address-family
Okay, thanks, I will bootstrap a setup with your configurations.
@famfo was it okay with 9.1? Or any lower version.
Yes, I don't remember the exact version when it stopped working
#14818 looks related
I have just encountered the same issue as @famfo, after VyOS bumped their FRR version from 9.1 to 10.2.
So I'm running 10.2 on one side (VyOS), talking to 8.5.6 on the other side (OPNsense), using fe80:: link-local addresses and they are not connecting.
The 8.5.6 side gets a connection reset every 6 minutes:
fe80::101 [Error] bgp_read_packet error: Connection reset by peer
FRR 9.1 works fine, so I downgraded to that for now.
I can confirm I am seeing similar behaviour on VyOS 1.5-rolling-202412100007, which is running FRR version 9.1.2:
~$ show version frr
FRRouting 9.1.2 (xxxx) on Linux(6.6.64-vyos).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
I outlined the issue with some debug logs in https://vyos.dev/T7061
For me, the issue is over WireGuard tunnel interfaces, although I have not tried non-WireGuard interfaces. My IPv6 link-local next hops are marked as inaccessible even though I can ping the next-hop IP address. The issue is fixed by either a reboot or, oddly enough, running tcpdump on the tunnel interface.
The issue is easily reproducible for me by resetting the BGP peer.
EDIT: I have tried rolling back to a VyOS 1.4 version, which uses FRR version 9.1, and have not been able to reproduce the issue:
~$ show ver frr
FRRouting 9.1 (xxxx) on Linux(6.6.21-amd64-vyos).
I had also created a basic issue on VyOS https://vyos.dev/T7055
We at metal-stack.io face the same issue and have seen some improvement by adding no bgp enforce-first-as, but we are still not sure whether this completely solved it. We are on frr-10.2.1 and a vanilla 6.6.60 kernel on Ubuntu 24.04.
I experience this issue as well.
It appears when a neighbor is created using neighbor fe80::1 ... it triggers the BGP nexthop table to be pre-populated with an invalid entry. After tcpdump or changing certain configuration, the nexthop table is re-evaluated and an additional entry is added, except this time it is valid.
While debugging, I found the issue goes away if you ensure any link-local with a real ifIndex is considered valid when nht entries are created.
I do not know what inserts the initial entry. However, it might be better to fix it there by not inserting one in the first place?
I have a branch that contains the topotest bgp_ipv6_ll_peering2 along with what I mentioned above, if that helps, @ton31337?
Any updates? Did the fixes get merged into the latest version?
@jvoss, do you want to push a PR and see what our CI thinks about that?
My workaround is potentially more of a hack than a fix. The change will basically assume any link-local address is valid in the BNC.
A better solution might be to not insert an invalid entry when a link-local neighbor is defined in the configuration... however I am not sure where this occurs.
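For readers following along, the workaround described above (treat any link-local next hop that has a real interface index as valid when the cache entry is created) can be modelled in a few lines. This is only a conceptual Python sketch, not the actual patch and not FRR code (bgpd is written in C); `NexthopCacheEntry`, `create_entry`, and the convention that ifindex 0 means "no interface known" are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NexthopCacheEntry:
    # Hypothetical stand-in for a BGP nexthop cache (BNC) entry.
    address: str
    ifindex: int  # assumption: 0 means no interface is associated
    valid: bool = False


def create_entry(address, ifindex):
    """Create a cache entry, marking link-local next hops with a real ifindex valid."""
    entry = NexthopCacheEntry(address, ifindex)
    if ipaddress.ip_address(address).is_link_local and ifindex > 0:
        # The workaround: do not wait for a later re-evaluation.
        entry.valid = True
    return entry


if __name__ == "__main__":
    print(create_entry("fe80::2222", 2))
    print(create_entry("fe80::2222", 0))
    print(create_entry("fd00:2222::1", 2))
```

The sketch also shows why this is closer to a hack than a fix: validity is assumed from the address type alone, without confirming that the next hop is actually reachable on that interface.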