os-wireguard 2.3 - CARP - interface remains active
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guidelines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
- [x] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
- [x] The title contains the plugin to which this issue belongs
Describe the bug
After enabling CARP on the WireGuard interface, the backup reports the interface as down, but the interface actually still works and answers pings.
To Reproduce
Steps to reproduce the behavior:
- Enable CARP for the wg interface
- Check VPN: WireGuard: Diagnostics -> interface is down
- Try to ping the WG interface IP -> OK
Expected behavior
- Enable CARP for the wg interface
- Try to ping the WG interface IP -> No route to host
- Try to ping any peer -> No route to host
Screenshots
Relevant log files
Additional context
In the case of dynamic routing, it is then not possible to access the backup from the master, because the backup still wants to use its interface.
Environment
OPNsense 23.7.6-amd64 os-wireguard 2.3
@AdSchellevis any updates? Do you need any further information?
it's marked community support, if at some point it turns out to be a bug, we can re-label.
It looks like a bug because the interface should be disabled, but it's not.
This screenshot is from the backup host, and the WireGuard interface is still enabled.
The WireGuard plugin version is 2.5 and the steps to reproduce are unclear. Please post the exact GUI options used and attach ifconfig output.
@fichtner I updated to version 2.5 and it's still the same. I just created a WG instance and set CARP on it, but the WireGuard interface remains active on the backup, which breaks dynamic routing.
You haven't provided any ifconfig output from the master and the backup here. It's very difficult to diagnose by guessing.
Master:
wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
	options=80000<LINKSTATE>
	inet 172.31.255.1 netmask 0xffffff00
	groups: wg wireguard
	nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg2: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
	options=80000<LINKSTATE>
	inet 172.16.6.1 netmask 0xfffffff0
	groups: wg wireguard
	nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
ping to peer on wg1:
root@gw1:~ # ping 172.31.255.20
PING 172.31.255.20 (172.31.255.20): 56 data bytes
64 bytes from 172.31.255.20: icmp_seq=0 ttl=64 time=3.258 ms
64 bytes from 172.31.255.20: icmp_seq=1 ttl=64 time=3.545 ms
64 bytes from 172.31.255.20: icmp_seq=2 ttl=64 time=3.751 ms
64 bytes from 172.31.255.20: icmp_seq=3 ttl=64 time=3.522 ms
64 bytes from 172.31.255.20: icmp_seq=4 ttl=64 time=3.817 ms
Backup: ifconfig
wg1: flags=8080<NOARP,MULTICAST> metric 0 mtu 1420
	options=80000<LINKSTATE>
	inet 172.31.255.1 netmask 0xffffff00
	groups: wg wireguard
	nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg2: flags=8080<NOARP,MULTICAST> metric 0 mtu 1420
	options=80000<LINKSTATE>
	inet 172.16.6.1 netmask 0xfffffff0
	groups: wg wireguard
	nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
ping to peer on wg1:
root@gw2:~ # ping 172.31.255.20
PING 172.31.255.20 (172.31.255.20): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
Routes for WG peers remain
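For anyone comparing the two ifconfig outputs above, here is a minimal Python sketch that decodes the hex `flags=` value into the names ifconfig prints. The bit values are the ones from FreeBSD's `<net/if.h>` for the four flags seen in this thread; the function names `decode_flags` and `is_up` are made up for illustration. It only shows the OS-level interface state (UP or not) and says nothing about what `wg` itself reports.

```python
# Interface flag bits from FreeBSD's <net/if.h>, limited to the flags
# that appear in the ifconfig output in this thread.
IFF_FLAGS = {
    0x0001: "UP",
    0x0040: "RUNNING",
    0x0080: "NOARP",
    0x8000: "MULTICAST",
}


def decode_flags(hex_value):
    """Turn the hex string after 'flags=' into a list of flag names."""
    value = int(hex_value, 16)
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def is_up(hex_value):
    """True if the administrative UP bit is set."""
    return bool(int(hex_value, 16) & 0x0001)


if __name__ == "__main__":
    # Values copied from the master and backup outputs above.
    print("master wg1:", decode_flags("80c1"), "up:", is_up("80c1"))
    print("backup wg1:", decode_flags("8080"), "up:", is_up("8080"))
```

On the backup the UP and RUNNING bits are cleared, which matches "interface is down" in the diagnostics page, while the address stays configured on the interface.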
I'm not a WireGuard expert but it looks like it's working? wg1 and wg2 are down on the backup.
As you can see here, the WireGuard interface is still visible via the wg command (so it is still somehow active), and the routes for the WireGuard peers stay on the backup.
I fail to see the point to be honest :)
CARP should disable the interface in WireGuard itself, not just the OS interface. Then the wg command would no longer show the interface, the routes would disappear, and dynamic routing would work fine.
It disabled the WireGuard instance fine. If you don't expect "wg" to show any info when the instance is down: we don't build or touch the "wg" command, so you need to find someone who works on "wg".
You could also call this "hot standby" but what do I know ;)
When I ping 172.31.255.1 (wg1 IP) from the device, both the master and backup respond from their wg1 interface. That is wrong: only the master should respond.
Ping from the backup? Yes? And if you ping from the master?
When I disable the interface manually, it disappears from the wg command output and from the routes.
Yep, it's a hot standby and you haven't answered my question:
Ping from the backup? Yes? And if you ping from the master?
Ping from the backup? Yes? And if you ping from the master?
I'm not sure what you mean. I get ping replies from both, even if I disable dynamic routing on the backup, which withdraws the route (to get rid of the master's route).
Ok, last time. You said:
When I ping 172.31.255.1 (wg1 IP) from the device.
What is "the" "device"? The backup firewall itself?
Yes, the backup and the master themselves.
Now "the device" is two devices? Backup AND master?
Yes, it was a poorly worded sentence:
gw1 (master): ping 172.31.255.1 -> got a response from the local wg interface
gw2 (backup): ping 172.31.255.1 -> got a response from the local wg interface
When I manually disabled the wg interface on gw2:
gw2 (backup): ping 172.31.255.1 -> got a response from gw1 via the route added by dynamic routing.
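The two cases above can be modelled with a small Python sketch. This is a simplified model and not how the kernel actually implements its lookup: it assumes that a destination matching an address configured on the host is answered locally regardless of the interface's up/down state, and that anything else follows the most specific matching route. The route label and the names `resolve` and `DYNAMIC_ROUTES` are hypothetical.

```python
import ipaddress

# Hypothetical dynamic route on gw2 pointing back at gw1.
DYNAMIC_ROUTES = [
    (ipaddress.ip_network("172.31.255.0/24"), "via gw1 (dynamic route)"),
]


def resolve(dst, local_addrs, routes):
    """Say where a packet to dst goes in this simplified model."""
    # An address that is still configured on the host is answered locally,
    # even when the interface holding it is administratively down.
    if dst in local_addrs:
        return "local"
    addr = ipaddress.ip_address(dst)
    # Otherwise use the most specific matching route (longest prefix match).
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return "no route"
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    # gw2 with wg1 down but still configured: the ping is answered locally.
    print(resolve("172.31.255.1", {"172.31.255.1"}, DYNAMIC_ROUTES))
    # gw2 after the wg interface was disabled manually: the address is gone,
    # so the dynamic route towards gw1 is used.
    print(resolve("172.31.255.1", set(), DYNAMIC_ROUTES))
```

In this model the only difference between the two cases is whether 172.31.255.1 is still configured on gw2, which is the point of contention in this thread.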
Ok, this appears to be relevant: https://www.linuxquestions.org/questions/linux-kernel-70/ping-is-successful-even-the-interface-is-down-on-linux-box-4175597480/
"The interface is down to the outside world, but the kernel is still aware of it by IP address or by device name, and it is still configured. The request comes from "inside" so it responds. You would not be able to ping it from the other side however."
Yes, I know, but that's the problem. WireGuard doesn't have to be "hot-standby", and there's no point in using it that way. The time it takes to go from off to active is negligible, and it's the more natural way.
Ok, so what's the real world downside here? I feel like we are tiptoeing around a use case indicated by:
In the case of dynamic routing, it is then not possible to access the backup from the master, because the backup still wants to use its interface
But that's part of the prerequisite in setup scope and omitted in the steps to reproduce. FRR running? How so? Where is the problem over there?
FRR is running and working as it should. It doesn't use CARP; there's no point in using it, because when BGP is active it finds the right path.
The whole problem is that you leave WireGuard activated and only disable the system interface, which creates these "weird" situations where you need NAT from the master to the backup because the backup wants to respond from its local interface. Why can't you just disable WireGuard, as it should be? Then this problem immediately disappears.
And FRR is not in the steps to reproduce because this isn't about FRR. It could be a static route, or someone might have another scenario. The whole point is that WireGuard in fact remains active: yes, "you can't access it", but it is still active, and that doesn't make sense.
@fichtner any update?
Not likely, as this sounds like a (common) setup problem. If routing prevents using a path, source NAT usually helps.
At the risk of being flamed (I hope not) - I'll chime in and comment on how I use WG and FRR for several customers, with multi-WAN failover for some customers and just a single WAN for others.
Since WG changelog 2.4, WG has become great and works really, really well. I'm applying 2.6 tonight - that is OPNsense firmware 23.7.11 and the expected new benefit "consider missing CARP VHID as disabled" will also help in some situations too. I too have noticed that CARP disabled didn't stop WG and so it will be good to have that edge case closed too.
***** Highlights *****
FRR
- I use BGP for routing, I just find it more flexible and robust than OSPF
- "Enable CARP Failover" is NOT selected
- I have BFD enabled
WireGuard
- "Depend on (CARP)" is in use - I track the WAN interface (WAN1 or WAN2) as appropriate for multiWAN
- If just a single WAN, then, I track the LAN CARP for WG and not the WAN at all.
- "Disable routes" is selected
** My Comments **
Since the WG interfaces on the backup firewall remain down while it is the CARP backup, the backup firewall cannot find its BGP neighbor, and thus the dynamic BGP routes do not get added to the backup firewall's routing table.
Since FRR is running, as soon as the primary firewall disappears, the backup firewall becomes the CARP master, BGP on the backup firewall can suddenly find its neighbors and voilà, routing starts.
I'm losing 1-2 pings during a transition from the PRIMARY firewall to the BACKUP firewall. The transition is amazingly fast - it really works!
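To make the "Depend on (CARP)" behaviour I describe a bit more concrete, here is a rough Python sketch of the decision as I understand it from the outside. It is not the plugin's code. The function name, the VHID key format ("1@wan1") and the state strings are all assumptions for illustration, and the `missing_means_disabled` switch is only my reading of the "consider missing CARP VHID as disabled" changelog line.

```python
def instance_should_run(depends_on_vhid, carp_states, missing_means_disabled=True):
    """Decide whether a WG instance should be up on this firewall.

    depends_on_vhid: tracked VHID key (hypothetical format), or None if
                     "Depend on (CARP)" is not set for the instance.
    carp_states:     dict mapping VHID key -> "MASTER", "BACKUP" or "INIT".
    """
    if depends_on_vhid is None:
        # No CARP dependency configured: the instance always runs.
        return True
    state = carp_states.get(depends_on_vhid)
    if state is None:
        # Tracked VHID does not exist on this node (assumed 2.6 behaviour:
        # treat it as disabled).
        return not missing_means_disabled
    # Only the node that is currently CARP master brings the instance up.
    return state == "MASTER"


if __name__ == "__main__":
    primary = {"1@wan1": "MASTER"}
    backup = {"1@wan1": "BACKUP"}
    print("primary runs wg1:", instance_should_run("1@wan1", primary))
    print("backup runs wg1:", instance_should_run("1@wan1", backup))
    # After a failover the former backup becomes master and starts the tunnel,
    # at which point BGP can reach its neighbor again.
    print("after failover:", instance_should_run("1@wan1", {"1@wan1": "MASTER"}))
```

With the tunnel down on the backup, BGP there has no neighbor to talk to, which is why no dynamic routes show up until the node becomes master.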
I hope my comments help.