frr
RIPng not converging when interface stays up
Describe the bug
When running RIPng with multiple paths to the same address, the failure of one upstream router that does not take a local interface down leaves the route being refreshed, even though the announcements now arrive on a different interface. For example, suppose the routers on interface A and interface B both announce routes to 2001:db8::cafe:babe/128. If the router attached to interface A goes down WITHOUT interface A itself entering the down state, the announcements from router B continue to refresh the route indefinitely, so convergence fails. This can happen when running RIP over FOU links or VPNs. It can also happen if ripngd dies on one router while the router itself remains up.
[x] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?
To Reproduce
- Set up 3 routers with RIPng and have the 2 upstream routers announce routes for the same prefix over a virtual link such as FOU, an IPv6 tunnel, or a VPN. An example topology looks like A ----> C <---- B. Routers A and B are NOT connected to each other.
- Bring down the upstream router that currently holds the route by stopping ripngd, deleting the interface, stopping the VPN, etc. The link on the downstream router should stay up because it is not a physical link.
- Check the rip routing table with
show ipv6 ripng
and watch as the route refreshes based on announcements from the other router coming from an unrelated interface
Expected behavior
Because the announcements no longer come from the next hop (and interface) recorded for that route, the route should not be refreshed. It should be left to time out, at which point an announcement from another router will replace it.
Versions
- OS Version: OpenWRT 21.02.1
- Kernel: 5.16.3
- FRR: 7.5
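The expected rule can be sketched as a small Python model. This is not ripngd code: `Route`, `process_update`, the field names, and the 180-second timeout are assumptions made for illustration, and metric handling is simplified (no interface cost, no infinity metric). It only shows that an announcement from a different router should not reset the timeout of the installed route.

```python
from dataclasses import dataclass

TIMEOUT = 180  # seconds; assumed default route timeout


@dataclass
class Route:
    prefix: str
    nexthop: str       # link-local address of the announcing router
    ifname: str        # interface the announcement arrived on
    metric: int
    expires_at: float  # time at which the route times out


def process_update(table, prefix, src, ifname, metric, now):
    """Apply one received route entry and report what happened."""
    cur = table.get(prefix)
    if cur is None:
        table[prefix] = Route(prefix, src, ifname, metric, now + TIMEOUT)
        return "installed"
    if (cur.nexthop, cur.ifname) == (src, ifname):
        # Same router on the same interface: this is a genuine refresh.
        cur.metric = metric
        cur.expires_at = now + TIMEOUT
        return "refreshed"
    if metric < cur.metric or now >= cur.expires_at:
        # A different router takes over only with a better metric
        # or once the old route has timed out.
        table[prefix] = Route(prefix, src, ifname, metric, now + TIMEOUT)
        return "replaced"
    # Equal or worse metric from another router: leave the timer alone.
    return "ignored"
```

With this rule, router B's announcements are ignored while A's route is still valid, and B takes over as soon as A's route times out.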
I'm using the Docker Image frrouting/frr:v8.0.1
and running into the same issue.
Hello, I have a similar issue: a single interface receives anycast announcements from several routers, and one of the announcers was lost.
I have a hub-and-spoke VPN with anycast routing. If any spoke fails, its route stays in the table...
FRR: 7.5.1
HUB
Spoke Spoke
Old Route in table
ip -6 r show | grep 3e91
aa00:aaaa:6162:6370:726f:3e91:0:1 via fe80::1111:c1ff:feaf:93cb dev gpn proto ripng metric 20 pref medium
Packet capture, which shows announcements carrying the correct next hop
tcpdump -ni gpn port 521 | grep :3e91
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on gvpn, link-type EN10MB (Ethernet), capture size 262144 bytes
11:27:08.730627 IP6 fe80::a808:abff:fea0:36ba.521 > fe80::4895:1ff:fe24:ff95.521: ripng-resp 2: aa00:aaaa:6162:6370:726f:3e91:0:1/128 (1) aa00:aaaa:6162:6370:726f:3e92:0:1/128 (1)
11:27:11.603522 IP6 fe80::a808:abff:fea0:36ba.521 > ff02::9.521: ripng-resp 2: aa00:aaaa:6162:6370:726f:3e91:0:1/128 (1) aa00:aaaa:6162:6370:726f:3e92:0:1/128 (1)
11:27:17.604994 IP6 fe80::a808:abff:fea0:36ba.521 > ff02::9.521: ripng-resp 2: aa00:aaaa:6162:6370:726f:3e91:0:1/128 (1) aa00:aaaa:6162:6370:726f:3e92:0:1/128 (1)
Restart frr
/etc/init.d/frr restart
* Stopped watchfrr
* Stopped ripngd
* Stopped zebra
* Stopped staticd
* Started watchfrr
Everything works
ip -6 r show | grep 3e91
aa00:aaaa:6162:6370:726f:3e91:0:1 via fe80::a808:abff:fea0:36ba dev gpn proto ripng metric 20 pref medium
ping aa00:aaaa:6162:6370:726f:3e91:0:1
PING aa00:aaaa:6162:6370:726f:3e91:0:1(aa00:aaaa:6162:6370:726f:3e91:0:1) 56 data bytes
64 bytes from aa00:aaaa:6162:6370:726f:3e91:0:1: icmp_seq=1 ttl=64 time=10.7 ms
64 bytes from aa00:aaaa:6162:6370:726f:3e91:0:1: icmp_seq=2 ttl=64 time=11.4 ms
Unexpected event logs
I see this log entry, but I don't know whether it is related or points to a fixable cause:
`2022/07/27 10:46:06 RIPNG: ripng join on gpn EADDRINUSE (ignoring)`
I have realized that after an interface change or update, the garbage-collection timer has to expire before the route via the lost router is dropped and a new path is learned. Until the garbage-collection timer expires, the route stays in the table... The advertisements have to stop for the whole timer period, which causes an outage during convergence; otherwise it does not work. For active-active interfaces, maybe we should design the architecture around this RIP behaviour.
"If during the garbage collection period a new RIP Response for the route is received, then as you might expect the deletion process is aborted: the Garbage-Collection timer is cleared, the route is marked as valid again, and a new Timeout timer starts"
The RFC says something different:
"Until the garbage-collection timer expires, the route is included in all updates sent by this router. When the garbage-collection timer expires, the route is deleted from the routing table.
Should a new route to this network be established while the garbage-collection timer is running, the new route will replace the one that is about to be deleted. In this case the garbage-collection timer must be cleared."
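The timer behaviour quoted above can be sketched as a tiny Python model. The function names are hypothetical, and the 180-second timeout and 120-second garbage-collection values are the commonly cited RIP defaults, assumed here; a given ripngd configuration may use different timers.

```python
TIMEOUT = 180  # seconds without a refresh before the route expires (assumed)
GARBAGE = 120  # seconds the expired route lingers before deletion (assumed)


def route_state(last_refresh, now):
    """Classify a route by the time elapsed since its last valid refresh."""
    age = now - last_refresh
    if age < TIMEOUT:
        return "valid"
    if age < TIMEOUT + GARBAGE:
        return "garbage-collection"
    return "deleted"


def may_replace(last_refresh, now, old_metric, new_metric):
    """Whether an announcement from a different router may take over."""
    # An expired route is treated as unreachable, so any usable new route
    # replaces it; a valid route only yields to a strictly better metric.
    return route_state(last_refresh, now) != "valid" or new_metric < old_metric
```

For example, `may_replace(0, 100, 2, 2)` is false while the old route is still valid, and `may_replace(0, 200, 2, 2)` is true once it has entered garbage collection. This only works if the old route is allowed to age, which is the point of this issue.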
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
Guess I need to bump this then