OPNsense NetBird Plugin: Issues with NetBird traffic when OPNsense HA/CARP is used
Describe the problem
My company and I are preparing to migrate our own and all of our clients' redundant HA OPNsense firewalls off legacy OpenVPN connections to a WireGuard-based solution. I have set up a new stack of OPNsense firewalls on my desk, running the latest OPNsense Business Edition (25.10_2), to evaluate the newly released NetBird plugin.
After significant testing with multiple configurations, I have been unable to get NetBird's HA routing peers to route traffic properly when OPNsense is also in an HA configuration. The main issue is that NetBird is not aware of which OPNsense firewall is the primary and instead chooses as its "primary" peer the one that has been connected the longest. This leads to asymmetric routing when OPNsense has unit 02 as the primary but NetBird is still routing traffic through unit 01, or vice versa.
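To make the mismatch concrete, here is a minimal Python sketch of the selection behavior as I observed it, next to the behavior I would expect. It is not taken from NetBird's source; the `RoutingPeer` class, its fields, and the uptime values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RoutingPeer:
    name: str               # hypothetical label, e.g. "T101"
    connected_seconds: int  # how long the peer has been connected
    carp_master: bool       # True if this unit currently holds the CARP VIPs


def pick_by_uptime(peers):
    """Selection as observed in this report: the longest-connected peer wins."""
    return max(peers, key=lambda p: p.connected_seconds)


def pick_carp_aware(peers):
    """Selection this report asks for: only the CARP primary may route."""
    masters = [p for p in peers if p.carp_master]
    return masters[0] if masters else None


# Unit 01 has been connected longer, but unit 02 currently holds the CARP VIPs.
peers = [
    RoutingPeer("T101", connected_seconds=840, carp_master=False),
    RoutingPeer("T102", connected_seconds=700, carp_master=True),
]
observed = pick_by_uptime(peers)
wanted = pick_carp_aware(peers)
print(f"observed routing peer: {observed.name}, CARP primary: {wanted.name}")
```

Whenever the two functions disagree, traffic enters through one unit while the LAN gateway lives on the other, which is the asymmetric routing described above.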
I also tested the OPNsense built-in WireGuard service. It has an option called "Depend on (CARP)" that ensures the tunnel only functions while the local OPNsense unit has the "Primary" status on the selected VIP. With this option, changes to CARP status cause updates to WireGuard.
In my research of this issue, I discovered that a comparable CARP dependency option was a feature of the 0.1 release of the OPNsense NetBird plugin and was removed prior to the 1.0 release, when the plugin's code base was updated to match pfSense. Images of this and discussions about these changes can be reviewed in the following [pull request](https://github.com/opnsense/plugins/pull/4531#issuecomment-3070404710).
As the plugin is currently configured, NetBird HA routing peers can never function properly on an OPNsense HA pair, and in fact there is roughly a 50/50 chance that tunnels will not work at all.
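For illustration, behavior along the lines of WireGuard's "Depend on (CARP)" option could be approximated with a small hook. The sketch below is an assumption, not the plugin's implementation: it assumes OPNsense calls CARP hook scripts with the arguments `<vhid@interface> <MASTER|BACKUP|INIT>`, and that running `netbird down` on the non-primary unit is enough for NetBird to route through the other peer. The function names are hypothetical.

```python
import subprocess


def netbird_command_for(carp_state):
    """Map a CARP state to the NetBird CLI command to run, or None."""
    state = carp_state.strip().upper()
    if state == "MASTER":
        return ["netbird", "up"]
    if state in ("BACKUP", "INIT"):
        return ["netbird", "down"]
    return None  # unknown state: leave NetBird untouched


def run_hook(argv, runner=subprocess.run):
    """Handle one CARP event and return the command that was issued, if any.

    Assumed argument layout: argv[1] = "<vhid>@<interface>", argv[2] = state.
    """
    if len(argv) < 3:
        return None
    command = netbird_command_for(argv[2])
    if command is not None:
        runner(command, check=False)
    return command


# Dry run with a fake runner, so nothing is executed on this machine.
issued = []
run_hook(["carp-hook", "1@igb0", "BACKUP"], runner=lambda cmd, check: issued.append(cmd))
print(issued)
```

A real hook would call `run_hook(sys.argv)` from a script entry point placed in the CARP syshook directory (assumed here to be `/usr/local/etc/rc.syshook.d/carp/`). Passing the runner as a parameter keeps the decision logic testable without touching the NetBird service.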
To Reproduce
Steps to reproduce the behavior:
- Install two OPNsense firewalls using a CARP VIP on the WAN and LAN sides of the firewall. Let's identify this pair and their networks as T1 (Test 1) followed by 01 or 02 (T101 or T102).
- Install another OPNsense firewall (does not need to be an HA pair, but could be) with its own unique WAN IP and unique LAN subnet. Let's identify this firewall and its networks as T2 (Test 2) followed by 01 (T201).
- Connect two test servers, one to the T1 LAN and one to the T2 LAN. These devices will only need to send pings.
- Configure NetBird on all firewalls and add the T1 peers to the same routing peer in the NetBird console.
- Disable the NetBird firewall within the NetBird plugin. The NetBird plugin does not respect the "Masquerade" flag in the NetBird cloud interface and always masquerades all traffic leaving a peer's Network Route.
- Create an Any:Any firewall rule on the LAN-side interfaces and the NetBird interfaces in OPNsense on both T1 and T2.
- Start a ping from the T2 test server to the T1 test server.
- Start a tcpdump on the wt0 and LAN interfaces of all three firewalls, filtering for ICMP.
- If things are working, you should see traffic enter the LAN interface on T201 and exit on the wt0 interface. The traffic will then appear on the T101 firewall's wt0 interface and exit the firewall on the LAN interface. The return traffic will follow the exact same path in reverse.
- Now "Enter Persistent CARP Maintenance Mode" on T101. The pings will stop working. Reviewing the tcpdumps will now show traffic entering the LAN interface on T201 and exiting on the wt0 interface. The traffic will then appear on the T101 firewall's wt0 interface and exit the firewall on the LAN interface. The return traffic will now show up on the T102 firewall's LAN interface, as T102 is the current gateway for the LAN network. The traffic will leave T102's wt0 interface and will never appear on T201's wt0 interface, as it appears NetBird/WireGuard is blocking the traffic. Even if NetBird/WireGuard did not block this traffic, the behavior would still be incorrect, because the request should have been sent to T102 and not T101 while T101 is in maintenance mode. The only way to force NetBird to start routing traffic to the peer on T102 is to restart the NetBird service on T101.
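The two capture results in the steps above can be compared mechanically. This is a minimal sketch, assuming each capture is reduced by hand to an ordered list of (firewall, interface) hops; the `is_symmetric` helper and the hop lists are hypothetical transcriptions of what the tcpdumps showed, not tool output.

```python
def is_symmetric(request_hops, reply_hops):
    """True when the reply retraces the request path in reverse order."""
    return list(reply_hops) == list(reversed(request_hops))


# Hops are (firewall, interface) pairs in the order the packet was seen.
request = [("T201", "lan"), ("T201", "wt0"), ("T101", "wt0"), ("T101", "lan")]

# Before failover: the reply goes back through T101.
healthy_reply = [("T101", "lan"), ("T101", "wt0"), ("T201", "wt0"), ("T201", "lan")]

# After T101 enters CARP maintenance mode: the reply leaves through T102
# and is never seen on T201.
failover_reply = [("T102", "lan"), ("T102", "wt0")]

print("healthy symmetric:", is_symmetric(request, healthy_reply))
print("failover symmetric:", is_symmetric(request, failover_reply))
```

The failover case fails the check for two reasons at once: the reply uses a different firewall than the request, and it never reaches T201.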
Note: This bug has also been reported on the OPNsense GitHub (https://github.com/opnsense/plugins/issues/5023) but has seen no interaction in 3 weeks. Posting here for visibility to NetBird staff.
Expected behavior
When the primary OPNsense unit is placed into "Persistent CARP Maintenance Mode" and control of all the CARP IP addresses is transferred to the secondary unit, NetBird should also move routing of traffic to the tunnel of the secondary unit.
Are you using NetBird Cloud?
This testing is being completed with NetBird Cloud.
NetBird version
OPNsense Business Edition 25.10_2 with NetBird plugin 1.1
Is any other VPN software installed?
Only the preinstalled ones that come with OPNsense; none are configured or running.
Debug output
To help us resolve the problem, please attach the following anonymized status output
netbird status -dA
Peers detail:
int2gw01.netbird.cloud:
NetBird IP: 172.26.129.135
Public key: 3JQnseLxGtH8FkeuRn+TGDITMogQ9rni1L7uEAccrx0=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/host
ICE candidate endpoints (Local/Remote): 172.26.161.1:51820/198.51.100.0:51820
Relay server address: rels://streamline-us-chi1-1.relay.netbird.io:443
Last connection update: 11 minutes, 47 seconds ago
Last WireGuard handshake: 1 minute, 23 seconds ago
Transfer status (received/sent) 1.3 KiB/1.9 KiB
Quantum resistance: false
Networks: -
Latency: 6.737969ms
int1gw02.netbird.cloud:
NetBird IP: 172.26.135.125
Public key: m4GZuZuYUNJSRmyxnQISiGlWtjJBDEg2spI8ptAkGy8=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/host
ICE candidate endpoints (Local/Remote): 192.168.1.1:51820/172.26.160.4:51820
Relay server address: rels://streamline-us-chi1-1.relay.netbird.io:443
Last connection update: 14 minutes, 4 seconds ago
Last WireGuard handshake: 2 minutes, 24 seconds ago
Transfer status (received/sent) 1.6 KiB/1.8 KiB
Quantum resistance: false
Networks: -
Latency: 1.224949ms
inrmnb06.netbird.cloud:
NetBird IP: 172.26.136.111
Public key: GW20wxJ9ubMY2oxt1+8XKClp0jmr7Ay/WhYhLF5LVkQ=
Status: Connecting
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): -/-
ICE candidate endpoints (Local/Remote): -/-
Relay server address:
Last connection update: 2 hours, 24 minutes ago
Last WireGuard handshake: -
Transfer status (received/sent) 0 B/0 B
Quantum resistance: false
Networks: -
Latency: 52.8643ms
int2gw02.netbird.cloud:
NetBird IP: 172.26.137.200
Public key: KQHLDlO8fEg/LLDAslKc0EhBzFts4HkPFq3pX5s4GAw=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/host
ICE candidate endpoints (Local/Remote): 172.26.168.1:51820/198.51.100.1:51820
Relay server address: rels://streamline-us-chi1-1.relay.netbird.io:443
Last connection update: 11 minutes, 47 seconds ago
Last WireGuard handshake: 2 minutes, 13 seconds ago
Transfer status (received/sent) 1.4 KiB/1.3 KiB
Quantum resistance: false
Networks: -
Latency: 325.82369ms
Events:
[INFO] SYSTEM (3104151b-7191-47b3-8c12-abc7e63a3315)
Message: Network map updated
Time: 14 minutes, 4 seconds ago
[INFO] SYSTEM (75682c99-42c5-4d07-aa98-f48f723dea23)
Message: Network map updated
Time: 14 minutes, 4 seconds ago
[INFO] SYSTEM (28966348-5b85-4256-8487-2b748f1b41de)
Message: Network map updated
Time: 13 minutes, 51 seconds ago
[INFO] SYSTEM (17c3ad14-f624-4614-8edc-a73f68158b94)
Message: Network map updated
Time: 13 minutes, 51 seconds ago
[INFO] SYSTEM (e48fbbc6-c46d-4f70-8dec-9fbd33bbb32d)
Message: Network map updated
Time: 12 minutes, 54 seconds ago
[INFO] SYSTEM (5a7ea490-33d4-4964-93e2-6c3c10e042d3)
Message: Network map updated
Time: 12 minutes, 54 seconds ago
[INFO] SYSTEM (ac515038-7c8f-41f3-906e-09458ef8675b)
Message: Network map updated
Time: 12 minutes, 43 seconds ago
[INFO] SYSTEM (46424df7-3ef5-46b5-9ee7-bd9e6ff414d7)
Message: Network map updated
Time: 12 minutes, 43 seconds ago
[INFO] SYSTEM (d875f74f-c5af-438a-83cb-bfbc41df0c47)
Message: Network map updated
Time: 11 minutes, 48 seconds ago
[INFO] SYSTEM (2225eced-337f-4919-bc59-376432f83ac5)
Message: Network map updated
Time: 11 minutes, 42 seconds ago
OS: freebsd/amd64
Daemon version: 0.59.1
CLI version: 0.59.1
Profile: default
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
[stun:stun.netbird.io:443] is Available
[stun:stun.netbird.io:5555] is Available
[turns:turn.netbird.io:443?transport=tcp] is Available
[rels://streamline-us-chi1-1.relay.netbird.io:443] is Available
Nameservers:
[172.26.160.3:53, 172.26.160.4:53] for [t1.anon-MlctE.domain, 160.26.172.anon-QZ7Pb.domain, 161.26.172.anon-QZ7Pb.domain, 162.26.172.anon-QZ7Pb.domain, 163.26.172.anon-QZ7Pb.domain, 164.26.172.anon-QZ7Pb.domain, 165.26.172.anon-QZ7Pb.domain, 166.26.172.anon-QZ7Pb.domain, 167.26.172.anon-QZ7Pb.domain, 168.26.172.anon-QZ7Pb.domain, 169.26.172.anon-QZ7Pb.domain, 170.26.172.anon-QZ7Pb.domain, 171.26.172.anon-QZ7Pb.domain, 172.26.172.anon-QZ7Pb.domain, 173.26.172.anon-QZ7Pb.domain, 174.26.172.anon-QZ7Pb.domain, 175.26.172.anon-QZ7Pb.domain] is Available
FQDN: int1gw01.netbird.cloud
NetBird IP: 172.26.135.211/20
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: 172.26.160.0/20
Forwarding rules: 0
Peers count: 3/4 Connected
Additional context
Add any other context about the problem here.
Have you tried these troubleshooting steps?
- [X] Reviewed client troubleshooting (if applicable)
- [X] Checked for newer NetBird versions
- [X] Searched for similar issues on GitHub (including closed ones)
- [X] Restarted the NetBird client
- [X] Disabled other VPN software
- [X] Checked firewall settings