
os-netbird plugin not following OPNsense HA failover

Status: Open · myah-mitchell opened this issue 2 months ago • 2 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guidelines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
  • [x] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
  • [x] The title contains the plugin to which this issue belongs

Describe the bug

My company and I are preparing to migrate our own and all of our clients' redundant HA OPNsense firewalls off legacy OpenVPN connections to a WireGuard-based solution. I have set up a new stack of OPNsense firewalls on my desk, running the latest OPNsense Business Edition (25.10_2), to evaluate the newly released NetBird plugin.

After extensive testing with multiple configurations, I have been unable to get NetBird's HA routing peers to route traffic correctly when OPNsense is also in an HA configuration. The main issue is that NetBird is not aware of which OPNsense firewall is the primary; instead, it chooses as its "primary" the peer that has been connected the longest. This leads to asymmetric routing when OPNsense has unit 02 as the primary while NetBird is still routing traffic through unit 01, or vice versa.
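To make the mismatch concrete, here is a small model of the two independent decisions. It assumes, as described above, that NetBird prefers the routing peer that has been connected the longest, while OPNsense sends LAN return traffic via whichever unit holds the CARP VIPs. The `Peer` type and its fields are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    """Hypothetical view of one HA unit acting as a NetBird routing peer."""
    name: str
    connected_since: float  # epoch seconds when this peer's NetBird session started
    carp_master: bool       # True if this unit currently holds the CARP VIPs


def netbird_primary(peers: List[Peer]) -> Peer:
    # Assumed behaviour from the report: the longest-connected peer wins,
    # i.e. the one with the earliest connection timestamp.
    return min(peers, key=lambda p: p.connected_since)


def opnsense_primary(peers: List[Peer]) -> Peer:
    # The unit holding the CARP VIPs is the LAN gateway for return traffic.
    return next(p for p in peers if p.carp_master)


def is_asymmetric(peers: List[Peer]) -> bool:
    """True when tunnel traffic and LAN return traffic go through different units."""
    return netbird_primary(peers).name != opnsense_primary(peers).name


# T101 connected first but is in CARP maintenance mode, so T102 is the CARP master.
t101 = Peer("T101", connected_since=1000.0, carp_master=False)
t102 = Peer("T102", connected_since=2000.0, carp_master=True)
print(is_asymmetric([t101, t102]))  # True
```

In this model, restarting NetBird on T101 (the workaround described in the reproduction steps) gives T101 a newer `connected_since`, which flips `netbird_primary` to T102 and makes the pair symmetric again.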

I tested the built-in OPNsense WireGuard service, and it has an option called "Depend on (CARP)" that ensures the tunnel only functions when the local OPNsense unit has the "Primary" status on the selected VIP. With this option set, changes in CARP status cause WireGuard to update accordingly.
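The "Depend on (CARP)" idea boils down to checking the local CARP state for one VIP. A minimal sketch, assuming the input is the text FreeBSD's `ifconfig` prints for the interface carrying the VIP, which includes lines such as `carp: MASTER vhid 1 advbase 1 advskew 0`. The helper names, the interface `igb1`, and the sample addresses are hypothetical.

```python
import re
from typing import Optional

# Matches FreeBSD ifconfig lines such as "carp: MASTER vhid 1 advbase 1 advskew 0".
CARP_LINE = re.compile(
    r"^\s*carp:\s+(?P<state>[A-Z]+)\s+vhid\s+(?P<vhid>\d+)\b", re.MULTILINE
)


def carp_state(ifconfig_output: str, vhid: int) -> Optional[str]:
    """Return the CARP state (e.g. MASTER, BACKUP, INIT) for a VHID, or None if absent."""
    for match in CARP_LINE.finditer(ifconfig_output):
        if int(match.group("vhid")) == vhid:
            return match.group("state")
    return None


def tunnel_should_run(ifconfig_output: str, vhid: int) -> bool:
    # Mirrors the "Depend on (CARP)" idea: only run when this unit is MASTER.
    return carp_state(ifconfig_output, vhid) == "MASTER"


# Hypothetical output of `ifconfig igb1` on the secondary unit.
sample = (
    "igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tinet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255 vhid 1\n"
    "\tcarp: BACKUP vhid 1 advbase 1 advskew 100\n"
)
print(carp_state(sample, 1), tunnel_should_run(sample, 1))  # BACKUP False
```

Parsing a single interface's output avoids ambiguity, since the same VHID number may be reused on different interfaces.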

While researching this issue, I discovered that this was a feature of the 0.1 release of the OPNsense NetBird plugin and was later removed before the 1.0 release, when the plugin's code base was updated to match pfSense. Images of this and discussion of these changes can be found in the following [pull request](https://github.com/opnsense/plugins/pull/4531#issuecomment-3070404710).

As the plugin is currently configured, NetBird HA routing peers can never function properly. In fact, they leave a roughly 50/50 chance that the tunnels will not work, depending on whether the peer NetBird has chosen happens to be the CARP primary.

To Reproduce

Steps to reproduce the behavior:

  1. Install two OPNsense firewalls using a CARP VIP on the WAN and LAN sides of the firewall. Let's identify this pair and their networks as T1 (Test 1) followed by 01 or 02 (T101 or T102).
  2. Install another OPNsense firewall (it does not need to be an HA pair, but could be) with its own unique WAN IP and unique LAN subnet. Let's identify this firewall and its networks as T2 (Test 2) followed by 01 (T201).
  3. Connect two test servers, one to the T1 LAN and one to the T2 LAN. These devices only need to send pings.
  4. Configure NetBird on all firewalls and add both T1 peers as routing peers for the same network route in the NetBird console.
  5. Disable the NetBird firewall within the NetBird plugin. The NetBird plugin does not respect the "Masquerade" flag in the NetBird cloud interface and always masquerades all traffic leaving a peer's Network Route.
  6. Create an Any:Any firewall rule on the LAN interfaces and the NetBird interfaces in OPNsense on both T1 and T2.
  7. Start a ping from the T2 test server to the T1 test server.
  8. Start a tcpdump on the wt0 and lan interfaces of all three firewalls filtering for ICMP.
  9. If everything is working, traffic will enter the LAN interface on T201 and exit on the wt0 interface. The traffic will then appear on T101's wt0 interface and exit the firewall on its LAN interface. The return traffic follows the exact same path in reverse.
  10. Now select "Enter Persistent CARP Maintenance Mode" on T101. The pings will stop working. The tcpdumps now show traffic entering the LAN interface on T201 and exiting on the wt0 interface. The traffic then appears on T101's wt0 interface and exits the firewall on its LAN interface. The return traffic, however, shows up on T102's LAN interface, because T102 is now the gateway for the LAN network. It leaves T102's wt0 interface and never appears on T201's wt0 interface, so it appears NetBird/WireGuard is blocking it. Even if NetBird/WireGuard did not block this traffic, the behavior would still be incorrect: the traffic should have been sent to T102, not T101, while T101 is in maintenance mode. The only way to force NetBird to start routing traffic to the peer on T102 is to restart the NetBird service on T101.
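The observations in steps 9 and 10 can be compared mechanically. The sketch below assumes the capture points are noted by hand as (firewall, interface) pairs in the order the ICMP packet was seen; `first_divergence` is a hypothetical helper, not part of NetBird or OPNsense.

```python
from typing import List, Optional, Tuple

# (firewall, interface) where an ICMP packet was captured, in order of observation.
Hop = Tuple[str, str]


def first_divergence(request: List[Hop], reply: List[Hop]) -> Optional[int]:
    """Return the index of the first reply hop that does not retrace the request, or None."""
    expected = list(reversed(request))
    for i, hop in enumerate(expected):
        if i >= len(reply) or reply[i] != hop:
            return i
    # The reply retraced the whole request path; flag any extra, unexpected hops.
    return None if len(reply) == len(expected) else len(expected)


request = [("T201", "lan"), ("T201", "wt0"), ("T101", "wt0"), ("T101", "lan")]

# Step 9: the reply retraces the request path.
print(first_divergence(request, list(reversed(request))))  # None

# Step 10: the reply enters T102 instead of T101 and never reaches T201.
print(first_divergence(request, [("T102", "lan"), ("T102", "wt0")]))  # 0
```

A divergence at index 0 means the very first reply hop already lands on the other HA unit, which is the asymmetric case described in step 10.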

Expected behavior

When the primary OPNsense unit is placed into "Persistent CARP Maintenance Mode" and control of all CARP IP addresses is transferred to the secondary unit, NetBird should also move traffic routing to the secondary unit's tunnel.
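One way the expected behavior could be approximated, sketched under assumptions rather than taken from the plugin: to our understanding, OPNsense runs executables placed in `/usr/local/etc/rc.syshook.d/carp/` on CARP transitions, passing the subsystem (`vhid@interface`, e.g. `1@igb1`) and the new state (`MASTER` or `BACKUP`) as arguments. The watched VIP and the `service netbird start|stop` command below are hypothetical placeholders; the plugin may manage its service differently.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical settings: the VIP NetBird should follow, and the command
# assumed to control the NetBird service on this unit.
WATCHED_SUBSYSTEM = "1@igb1"
SERVICE_CMD = ["service", "netbird"]


def action_for_event(subsystem: str, event_type: str,
                     watched: str = WATCHED_SUBSYSTEM) -> Optional[str]:
    """Map a CARP event to a NetBird service action, or None to ignore it."""
    if subsystem != watched:
        return None  # event for a different VIP
    if event_type == "MASTER":
        return "start"  # this unit now owns the VIP: bring its routing peer online
    if event_type == "BACKUP":
        return "stop"  # the other unit owns the VIP: take this routing peer offline
    return None  # e.g. INIT: leave the service alone


def main(argv: List[str]) -> int:
    if len(argv) < 3:
        return 0  # not invoked as a CARP hook; nothing to do
    action = action_for_event(argv[1], argv[2])
    if action is None:
        return 0
    return subprocess.run(SERVICE_CMD + [action], check=False).returncode


if __name__ == "__main__":
    main(sys.argv)
```

This mirrors WireGuard's "Depend on (CARP)": only the unit holding the VIP keeps its NetBird routing peer online, so NetBird has a single peer to route through. The trade-off is that the backup unit is not reachable over NetBird while it is in the BACKUP state.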

Environment

  • Type: opnsense-business
  • Version: 25.10_2
  • NetBird plugin: 1.1

Posted by myah-mitchell on Nov 12 '25 21:11

Oh, additionally, the OPNsense documentation for the NetBird plugin, located here, still references the old CARP settings from version 0.1 of the plugin. I don't know where to post or report that.

Posted by myah-mitchell on Nov 12 '25 21:11

This has been cross-posted to the NetBird GitHub to gain visibility with NetBird staff, since they had a part in creating the NetBird plugin for OPNsense and other issues with this plugin have been addressed there as well: https://github.com/netbirdio/netbird/issues/4898

Posted by myah-mitchell on Dec 02 '25 16:12