Add option to make WireGuard service depend on a specific CARP vhid

Open nzkiwi68 opened this issue 3 years ago • 11 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [X] I have read the contributing guidelines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
  • [X] I have searched the existing issues and I'm convinced that mine is new.
  • [X] When the request is meant for an existing plugin, I've added its name to the title.

Is your feature request related to a problem? Please describe.

WireGuard (WG) is stateless by design and cannot be set to bind to a specific IP address, such as a CARP vhid, for sending packets. This makes WG quite complex and difficult to use with multiWAN HA site to site VPNs and FRR with OSPF for routing.

https://forum.opnsense.org/index.php?topic=24655.0

Describe the solution you'd like

WireGuard needs to start and stop based on CARP status. It needs an option in the WG package:

  • Enable CARP Failover (tickbox)
  • Follow this CARP vhid (drop-down box; the user selects which CARP vhid to follow, probably the LAN CARP)

Describe alternatives you've considered

  1. The alternative is very complex.
  2. HA firewall pairs for site to site VPN need 8 tunnels (compared to 2)
  3. WG and FRR both need to be set up differently on the primary vs backup firewall at each site
  4. You cannot sync the FRR nor the WG package
  5. I'm not sure if it can actually be done, because you need the WG interfaces for the tunnels for OSPF
  6. What about firewall rules on each WG interface? More complexity on a HA firewall pair

Additional context

If WG could be made to simply stop and start following a CARP vhid, then the whole solution becomes viable and so much simpler.

  • FRR and WG both start and stop based on CARP master status.
  • WG is super fast to establish a VPN tunnel
  • Complexity is reduced
  • The config can be HA synced from the primary to the backup firewall
  • Interfaces (especially WG interfaces) are exactly the same on the primary and the backup firewall
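
The requested option could be modeled roughly as below. This is only a sketch of the decision logic, not plugin code: `CarpFollowSettings` and `wireguard_action` are hypothetical names, and it assumes a CARP event arrives as a `vhid@interface` string (for example `1@vtnet0`) together with a state of `MASTER` or `BACKUP`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CarpFollowSettings:
    """Hypothetical settings; neither field exists in the WG plugin today."""
    enabled: bool  # the "Enable CARP Failover" tickbox
    vhid: int      # the "Follow this CARP vhid" drop-down


def wireguard_action(settings: CarpFollowSettings, subsystem: str, state: str) -> Optional[str]:
    """Return "start", "stop", or None for one CARP event.

    Assumption: subsystem looks like "1@vtnet0" (vhid@interface) and
    state is "MASTER" or "BACKUP"; anything else is ignored.
    """
    if not settings.enabled:
        return None
    vhid_text, sep, _iface = subsystem.partition("@")
    if not sep or not vhid_text.isdigit():
        return None  # malformed event, do nothing
    if int(vhid_text) != settings.vhid:
        return None  # event for a vhid we do not follow
    if state == "MASTER":
        return "start"
    if state == "BACKUP":
        return "stop"
    return None
```

Events for any other vhid are ignored, so WG would only react to the one CARP address chosen in the drop-down.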

nzkiwi68 avatar Sep 07 '21 23:09 nzkiwi68

I've also been struggling with this issue, just using a simple primary/backup router system -- if anything on the secondary tries to use wireguard or if you use keepalives it creates a handshake which disrupts the vpn. It's basically impossible to get any kind of reliable vpn setup with a HA system without this other than using a setup like described above.

We have three sites which are all connected together; with this feature it becomes very simple to make it work, though there could be a short disruption when things switch over. Without this feature the best I've come up with is to set up a separate WG tunnel for each of the 6 total servers to each of the 4 servers at other data centers, then use policy-based routing and gateway groups and custom routing -- even this would likely take longer to switch over than just having WG able to automatically activate when the CARP VIP becomes "master" and deactivate otherwise. The FRR and IPSEC plugins both do this, so I assume it's possible.
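
To put numbers on the workaround described above, here is a small counting sketch. The function names are made up for illustration, and it assumes a full mesh between sites with the same number of firewalls at every site.

```python
def tunnels_without_carp_follow(sites: int, firewalls_per_site: int) -> int:
    """Every firewall peers with every firewall at every other site."""
    total = sites * firewalls_per_site
    remote = (sites - 1) * firewalls_per_site
    return total * remote // 2  # each tunnel has two ends, so halve the count


def tunnels_with_carp_follow(sites: int) -> int:
    """Only the CARP master at each site runs WG: one tunnel per pair of sites."""
    return sites * (sites - 1) // 2
```

For three sites with two firewalls each, `tunnels_without_carp_follow(3, 2)` gives 12 tunnels (6 servers, each peering with 4 remote servers), while `tunnels_with_carp_follow(3)` gives 3.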

taxilian avatar Sep 10 '21 15:09 taxilian

I've continued my testing of multi WAN with HA firewalls, and really all that is needed to make WG properly ready is the ability to stop and start WG based on CARP status.

Does the development team have any comments?

nzkiwi68 avatar Sep 19 '21 21:09 nzkiwi68

I can have a look at how @fichtner did it with FRR.

mimugmail avatar Sep 20 '21 04:09 mimugmail

I have a customer running three sites HA pfSense that we are migrating to new HA OPNsense. Because it's not yet in production installed side by side, I'd be more than happy to help with testing.

If you're able to get WG to follow CARP, tell me how I can download the patch and I'll help test.

I'd much rather migrate direct to WG site to site tunnels than IPSEC.

Thanks!

nzkiwi68 avatar Sep 20 '21 05:09 nzkiwi68

I would advise looking at how CARP is supposed to work with bgpd on OpenBSD (https://man.openbsd.org/bgpd); stopping and starting routing services always leads to service interruptions. For OSPF we added this: https://github.com/opnsense/plugins/issues/2091. Most of it can be extended to other routing services, but currently we don't have plans to introduce this ourselves.

AdSchellevis avatar Sep 20 '21 07:09 AdSchellevis

Thanks! I like the work in OSPF and I saw the ability to lower the cost for a route because of CARP demotion.

Starting and stopping WireGuard, though, isn't much of a problem because the tunnel setup is just so darn fast! IPSEC takes ages but WG comes up so quickly.

So, if FRR and OSPF remained alive on the backup firewall but WG was off, as soon as CARP transitioned and the backup became the CARP master, WG would start, packets would start flowing and routing would work.

In that way, FRR/OSPF would not need to be off on the backup firewall.

The problem with WG is that if the backup is running WG whilst it is the CARP backup, it ultimately interferes with the WG tunnel on the primary firewall.

Essentially we still need WG to be stopped on the backup firewall.

nzkiwi68 avatar Sep 20 '21 10:09 nzkiwi68

I see others also continue to run into issues trying to use Wireguard with CARP and HA and are reporting exactly the same issues I encountered. [https://forum.opnsense.org/index.php?topic=25993.0]

It looks like jrenken has written a syshook for CARP [https://gist.github.com/jprenken/18ca7bf14ddae547ae0fdf6f56d72573]

Is it possible to have this reviewed and get this baked into the wireguard plugin?

Wireguard does really need to be off on the CARP backup firewall.

nzkiwi68 avatar Mar 06 '22 18:03 nzkiwi68

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

OPNsense-bot avatar Mar 06 '22 23:03 OPNsense-bot

Please reopen :)

mimugmail avatar Mar 07 '22 04:03 mimugmail

@mimugmail In the forum post (https://forum.opnsense.org/index.php?topic=25993.msg129864#msg129864) where jrenken posted his wg-carp hook (https://gist.github.com/jprenken/18ca7bf14ddae547ae0fdf6f56d72573) it turned out that this works best with wireguard-kmod instead of wireguard-go. If we include this carp-hook and switch from wireguard-go to wireguard-kmod, the problem would be solved, or am I wrong?

AndyX90 avatar Mar 20 '22 08:03 AndyX90

It only works with kmod. The problem is that the author marks the kernel module as highly devel.

mimugmail avatar Mar 20 '22 08:03 mimugmail

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

OPNsense-bot avatar Jan 16 '23 11:01 OPNsense-bot

This is still one of the biggest issues with having a true highly available router -- if you use wireguard then there are all sorts of hacks you have to do to get it to mostly work most of the time, and it's still not reliable.

taxilian avatar Jan 16 '23 20:01 taxilian

I seem to have managed to finally get WireGuard a lot more reliable. I have forked jrenken's script and made a bunch of changes.

  • I follow a single interface for CARP status (I recommend following your LAN interface)
  • Updated log messages with a lot more information and the new log_msg format
  • Run wireguard start twice

It's now quite reliable.

https://gist.github.com/nzkiwi68/5b54aece233ff72ada395b5a1bdad92c
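
For readers who don't want to open the gist, the three changes listed above amount to something like the sketch below. It is an illustration, not the script itself. Assumptions: the hook is called with a `vhid@interface` argument and a `MASTER`/`BACKUP` argument, `igb1` is a placeholder for the followed interface, and the service is controlled with `configctl wireguard start` / `stop` (adjust this to whatever your install uses).

```python
import subprocess
import sys
from typing import Callable, List, Sequence

FOLLOW_INTERFACE = "igb1"  # placeholder: the single interface whose CARP status is followed
# Assumption: the WG service is controlled via configctl; change if your setup differs.
START_CMD = ["configctl", "wireguard", "start"]
STOP_CMD = ["configctl", "wireguard", "stop"]


def plan_commands(subsystem: str, state: str, follow: str = FOLLOW_INTERFACE) -> List[List[str]]:
    """Decide which commands to run for one CARP event."""
    _vhid, _sep, iface = subsystem.partition("@")
    if iface != follow:
        return []  # only one interface is followed
    if state == "MASTER":
        return [START_CMD, START_CMD]  # start is deliberately run twice
    if state == "BACKUP":
        return [STOP_CMD]
    return []


def handle_event(argv: Sequence[str], run: Callable[[List[str]], object] = subprocess.call) -> int:
    """Log and run the planned commands; returns how many were run."""
    if len(argv) < 3:
        return 0
    commands = plan_commands(argv[1], argv[2])
    for cmd in commands:
        print(f"carp {argv[1]} -> {argv[2]}: running {' '.join(cmd)}", file=sys.stderr)
        run(cmd)
    return len(commands)


if __name__ == "__main__":
    handle_event(sys.argv)
```

The `run` parameter only exists so the logic can be exercised without touching the real service; the gist logs through `log_msg`, for which the print to stderr here is a stand-in.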

nzkiwi68 avatar Jan 16 '23 21:01 nzkiwi68

Agreed that there are ways to hack it to make it mostly work, at least if you're using kmod -- but there are still some pretty major downsides:

  • If the HA sync for wireguard is enabled then the sync will re-enable wireguard on the CARP backup
  • You have to download and install the script yourself
  • The script isn't saved by backing up the firewall configuration

I'm not railing against the project for not having fixed this yet -- I'm happy to help with the solution if there is something I can do and I appreciate what has been done! But I definitely don't think this ticket should be closed. At the very least I'd like to see these changes made:

  • Make it possible to have it not enable wireguard on the secondary if it isn't master (or maybe just have it not sync that setting? that might do it)
  • Ideally there should be a way to tell it to only be active when a specific CARP interface is MASTER and have it automatically add this script -- or something similar. There are other packages which handle this elegantly, such as IPSEC, so I'm certain it can be done.

Again, not complaining or criticizing -- I just don't want to see this swept under the rug and treated like it isn't a big deal because it is a real problem which has a real impact on whether or not this part of the system is really enterprise/production ready.

taxilian avatar Jan 17 '23 20:01 taxilian