frr
zebra: Fix for #3973, tunnel interfaces on FreeBSD always unnumbered
For interfaces with a peer address (tunnels, etc.), use the netmask of the peer destination, rather than local end, to establish if the interface should be treated as unnumbered. Affects frr on FreeBSD and anywhere else where tunnels have a peer / destination address - affects GRE, VTI and such.
This must be tested against OSPF unnumbered peers as well as PIM unnumbered, since it changes behavior there.
💚 Basic BGPD CI results: SUCCESS, 0 tests failed
Results table
| Field | Value |
|---|---|
| Result | SUCCESS git merge/8132 2c56e890 |
| Date | 02/22/2021 |
| Start | 18:45:40 |
| Finish | 19:25:05 |
| Run-Time | 39:25 |
| Total | 1815 |
| Pass | 1815 |
| Fail | 0 |
| Valgrind-Errors | 0 |
| Valgrind-Loss | 0 |
| Details | vncregress-2021-02-22-18:45:40.txt |
| Log | autoscript-2021-02-22-18:46:44.log.bz2 |
| Memory | 489 508 428 |
For details, please contact louberger
@donaldsharp this must be tested against anything it touches, however the issue this fixes does not occur on Linux, and on Linux the behaviour is unchanged for tunnels or other point to point interfaces - and regular interfaces do not have a peer address, in which case it works the way it used to.
To regression test this, I assume we'd have to make sure purposefully unnumbered interfaces remain unnumbered?
Also, just to note: originally FRR never made an interface unnumbered on encountering a /32 in the first place. That change in behaviour is what #3973 is about.
Edit: some commenters in #3973 say they are seeing this on Linux - I couldn't reproduce it on recent Debian.
Continuous Integration Result: SUCCESSFUL
Congratulations, this patch passed basic tests
Tested-by: NetDEF / OpenSourceRouting.org CI System
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-17260/
This is a comment from an automated CI system. For questions and feedback in regards to this CI system, please feel free to email Martin Winter - mwinter (at) opensourcerouting.org.
In OSPF I expect it to peer and work directly over /32-based network setups.
For example, configure 192.168.161.5/32 on an interface on router A and 192.168.161.12/32 on the connected interface on router B, and the two should peer.
PIM should still work across this same set of interfaces as well.
Well I can confirm that this passes my tests. What is required for this to be accepted? Someone from the FRR team to find the time and test?
> Edit: some commenters in #3973 say they are seeing this on Linux - I couldn't reproduce it on recent Debian.
I have the same behavior on CentOS 8.
c(config-if)# do show interface ppp0
Interface ppp0 is up, line protocol is up
  Link ups: 0 last: (never)
  Link downs: 0 last: (never)
  vrf: default
  index 61 metric 0 mtu 1450 speed 0
  flags: <UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>
  Type: PPP
  inet 172.15.102.2/32 peer 172.15.102.1/32 unnumbered
  Interface Type Other
  Interface Slave Type None
c# show ip ospf interface
ppp0 is up
  ifindex 61, MTU 1450 bytes, BW 0 Mbit <UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>
  This interface is UNNUMBERED, Area 0.0.0.0
  MTU mismatch detection: enabled
  Router ID 192.168.102.1, Network Type POINTOPOINT, Cost: 5
  Transmit Delay is 1 sec, State Point-To-Point, Priority 1
  No backup designated router on this network
  Multicast group memberships: OSPFAllRouters
  Timer intervals configured, Hello 10s, Dead 40s, Wait 40s, Retransmit 5
  Hello due in 5.680s
  Neighbor Count is 1, Adjacent neighbor count is 1
I'm looking forward to the fix.
> > Edit: some commenters in #3973 say they are seeing this on Linux - I couldn't reproduce it on recent Debian.
>
> I have the same behavior on CentOS 8.
In your case you will always see this regardless of the OS, and it's because both your side and the peer are /32s.
The current behaviour of FRR is that an interface is set to unnumbered if its IPv4 address is a /32 (which is a vast oversimplification). My patch only changes this so that it also checks whether the interface has a peer/destination address and, if so, inspects that address instead, again looking for a /32. Perhaps instead it should never make it unnumbered if it's a tunnel or dialer interface - but that would introduce issues for other users.
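The rule described above can be sketched roughly as follows. This is an illustrative model in Python, not zebra's actual C code: the `ConnectedAddr` type and both function names are hypothetical, and the "before" rule is the same simplification the comment itself admits to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectedAddr:
    """Hypothetical stand-in for one IPv4 address on an interface."""
    local_prefixlen: int
    # None when the interface has no peer/destination address.
    peer_prefixlen: Optional[int] = None


def unnumbered_before(addr: ConnectedAddr) -> bool:
    # Simplified pre-patch rule: a /32 local address means unnumbered.
    return addr.local_prefixlen == 32


def unnumbered_after(addr: ConnectedAddr) -> bool:
    # Patched rule as described in this thread: if a peer address
    # exists, inspect its prefix length instead of the local one.
    if addr.peer_prefixlen is not None:
        return addr.peer_prefixlen == 32
    return addr.local_prefixlen == 32
```

Under this model a tunnel with a /32 local address and a /30 peer stops being treated as unnumbered, while a link where both ends are /32 (like the ppp0 example) stays unnumbered either way.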
It's a bad design decision in my opinion to arbitrarily mark interfaces as unnumbered based on their properties. An interface should be unnumbered if you designate it as unnumbered, and also a loopback is not the same as unnumbered. I don't see other network OSes doing this (namely IOS that zebra/quagga/frr are modeled on).
To your issue - your subnet here definitely looks like a /30, so if you have control over both sides of the link, just make this /30 and the problem will go away.
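As a quick check of the "looks like a /30" remark, the smallest prefix covering both ends of a link can be computed from the two addresses. The helper name below is made up for illustration, and the sketch only does address arithmetic; it says nothing about how FRR or the kernel derives masks.

```python
import ipaddress


def smallest_common_subnet(local: str, peer: str) -> ipaddress.IPv4Network:
    """Return the longest-prefix IPv4 network containing both addresses."""
    a = int(ipaddress.IPv4Address(local))
    b = int(ipaddress.IPv4Address(peer))
    # Bits that differ between the two addresses must be host bits.
    host_bits = (a ^ b).bit_length()
    prefix_len = 32 - host_bits
    network_int = (a >> host_bits) << host_bits
    return ipaddress.IPv4Network((network_int, prefix_len))


print(smallest_common_subnet("172.15.102.2", "172.15.102.1"))
```

For the ppp0 addresses shown earlier this yields 172.15.102.0/30, where .1 and .2 are the two usable host addresses.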
If, however, your interface is automatically configured by something like IPCP, then that highlights a bigger problem taking us back to the fact that FRR itself makes the decision that an interface is unnumbered. Maybe this isn't even a correct designation and FRR needs a specific case for point to point interfaces...
> this isn't even a correct designation and FRR needs a specific case for point to point interfaces

Because point-to-point interfaces can have a peer IP from one subnet and a local IP from a completely different one, for example peer 192.168.0.1 and local 172.16.254.250. With Quagga on CentOS 6 or 7 this works correctly.
We briefly ran this patch in production with OPNsense 21.1.6 and got a hard no from a user reporting IPsec/OSPF dead in the water. I get that someone is convinced that this patch is good but in practice it does not seem to be so.
https://forum.opnsense.org/index.php?topic=23281.0
The only thing this patch does is that it modifies the interface type in FRR, which in turn changes the way OSPF works (with unnumbered we have ifindex, without we have IP), and merging it should come with a clear caveat in the release notes. It's obvious that this breaks things if the link type has changed on one end but not the other.
The OPNsense thread does not provide much detail apart from that it broke, and needed statics. This tells me that the reverse might have happened - likely the tunnel peer was a /32, and OSPF neighbours became unnumbered where they previously weren't. Was this OSPF over IPSec + GRE or VTI? I think reversing the patch like that immediately is oversimplifying the problem.
Again, unnumbered should be controlled by the user, not by FRR - the way it is now, it will break things for users either way, with or without the patch...
Hi guys. I once again have the same problem with OSPF between PFSense and Mikrotik. A bug in determining the gre/gif interface mask on FreeBSD results in a classic subnet mask mismatch. Even when the p2p interface mask is /32, it still needs to be processed correctly when it is sent in the Hello packet.
PFSense p2p gif0:
!
interface gif0
ip ospf network point-to-point
!
sudo vtysh -c "show int gif0" | grep peer
inet 192.168.33.1/32 peer 192.168.33.2/32 unnumbered

Mikrotik p2p ipip:
ospf interface-template type=ptp
ip address=192.168.33.2 interface=ipip-tunnel1 network=192.168.33.2

Mikrotik correctly changes the mask in the Hello when the mask of the IP address is changed in the configuration, but PFSense does not. I can change the mask on Mikrotik to /30 and see 255.255.255.252 in the Hello packet. Packets with this mask are also sent for the ptmp-broadcast and plain broadcast network types:
sudo tcpdump -ntvi gif0 proto ospf and 'ip[21] == 1'
tcpdump: listening on gif0, link-type NULL (BSD loopback), capture size 262144 bytes
----- Network Type: broadcast -----
IP (tos 0x0, ttl 1, id 16341, offset 0, flags [DF], proto OSPF (89), length 64)
192.168.33.2 > 224.0.0.5: OSPFv2, Hello, length 44
Router-ID 192.168.40.1, Backbone Area, Authentication Type: none (0)
Options [External]
Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.252, Priority 192
----- Network Type: ptmp-broadcast -----
IP (tos 0x0, ttl 1, id 16563, offset 0, flags [DF], proto OSPF (89), length 64)
192.168.33.2 > 224.0.0.5: OSPFv2, Hello, length 44
Router-ID 192.168.40.1, Backbone Area, Authentication Type: none (0)
Options [External]
Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.252, Priority 192
----- Network Type: ptp -----
IP (tos 0x0, ttl 1, id 16673, offset 0, flags [DF], proto OSPF (89), length 64)
192.168.33.2 > 224.0.0.5: OSPFv2, Hello, length 44
Router-ID 192.168.40.1, Backbone Area, Authentication Type: none (0)
Options [External]
Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.252, Priority 192
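For reference, the Mask field seen in these Hellos follows directly from the prefix length. The sketch below models what RFC 2328 says about the field: section 9.5 sets it to 0.0.0.0 on unnumbered point-to-point networks and virtual links, and section 10.5 tells the receiver to ignore it on point-to-point networks and virtual links. The function names are illustrative, and how any particular implementation (FRR, RouterOS) behaves is not something this sketch claims to reproduce.

```python
import ipaddress


def hello_network_mask(prefix_len: int, unnumbered: bool) -> str:
    """Network Mask value for an outgoing OSPFv2 Hello, per RFC 2328 9.5."""
    if unnumbered:
        # Unnumbered point-to-point links advertise an all-zero mask.
        return "0.0.0.0"
    return str(ipaddress.IPv4Network((0, prefix_len)).netmask)


def mask_must_match(network_type: str) -> bool:
    """Whether a received Hello's mask is compared with ours (RFC 2328 10.5)."""
    # The network type strings here are labels chosen for this sketch.
    return network_type not in ("point-to-point", "virtual-link")


print(hello_network_mask(30, unnumbered=False))  # 255.255.255.252
print(hello_network_mask(32, unnumbered=True))   # 0.0.0.0
```

This is why flipping an interface between numbered and unnumbered on one end only is visible on the wire: the advertised mask changes even though the configured addresses do not.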
> I get that someone is convinced that this patch is good but in practice it does not seem to be so.

> The OPNsense thread does not provide much detail apart from that it broke, and needed statics.

The lack of verbose logging for OSPF neighbor errors is easily worked around with: tcpdump -nvi gre0 proto ospf.
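The `ip[21] == 1` filter in the earlier capture selects Hello packets: with a 20-byte IPv4 header (no options), byte 20 is the OSPF version and byte 21 is the OSPF packet type, and type 1 is Hello. A minimal sketch of pulling the same fields out of a raw captured packet, assuming OSPFv2 over IPv4 and the field layout from RFC 2328 (the function name is illustrative):

```python
import ipaddress
import struct

OSPF_PROTO = 89  # IP protocol number for OSPF
OSPF_HELLO = 1   # OSPFv2 packet type 1 is Hello


def parse_ospf_hello(ip_packet: bytes):
    """Return (router_id, netmask, hello_interval, dead_interval) or None."""
    ihl = (ip_packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ip_packet[9] != OSPF_PROTO:
        return None
    ospf = ip_packet[ihl:]
    # First 12 bytes of the 24-byte OSPF header.
    version, ptype, _length, router_id, _area = struct.unpack(
        "!BBH4s4s", ospf[:12]
    )
    if version != 2 or ptype != OSPF_HELLO:
        return None
    # Hello body starts right after the 24-byte header.
    mask, hello_int, _options, _priority, dead_int = struct.unpack(
        "!4sHBBI", ospf[24:36]
    )
    return (
        str(ipaddress.IPv4Address(router_id)),
        str(ipaddress.IPv4Address(mask)),
        hello_int,
        dead_int,
    )
```

Unlike the fixed offset in the tcpdump filter, this reads the IPv4 header length first, so it still finds the OSPF header if IP options are present.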
Can we hope that the patch will be tested and added to the upstream?
Update: I looked at this PR and while it seems to solve the problem it might not be the ideal solution. Ideally the OS should tell FRR if the interface is a tunnel/unnumbered instead of guesswork.
I talked with @ocochard, who pointed out that this could be fixed (or simply go away) in later FreeBSD versions due to the move to netlink: https://reviews.freebsd.org/D36002
Until that is available, and for older FreeBSD versions (and maybe other OSes), it was suggested that the more correct approach would be to have a command in FRR to configure the expected borrowing behavior, like Cisco/Juniper ip unnumbered.
I don't plan to work on this, but I thought it would be useful to share this information to point others in the right direction.
> Meanwhile we don't have this or for older FreeBSDs version (and maybe other OSes), it was suggested that the more correct approach would be have in FRR a command to configure the expected borrowing behavior like Cisco/Juniper ip unnumbered.
I concur. It's not so much the fact that FRR started marking undesired interfaces as unnumbered, it's the fact that FRR makes this decision at all. But while the system may have unnumbered interfaces, the OS shouldn't be guessing which ones they are either. No network OSes do this.
My PR was not an attempt to legitimise this mechanism by improving it, but merely to unf*k something I started seeing in FRR/FreeBSD at some point, and it does not tackle the bigger issue. An interface should only be deemed unnumbered if designated as such! Unnumbered interfaces aren't a trivial concept and it only makes things confusing to have them automatically appear like this.
[edit] Crucially, I think it is also incorrect to assume any association between tunnel interfaces and unnumbered interfaces.
This pull request has conflicts, please resolve those before we can evaluate the pull request.