Unable to P2P between peer and exit node

Open eveyraud opened this issue 5 months ago • 6 comments

Describe the problem

After disabling Windows Firewall and the EDR, I am still not able to connect my Windows and Linux peers P2P (only relayed connections). When I create a Linux VM in the same network as my Windows peer, the two connect perfectly in P2P. I saw that NetBird doesn't use UPnP, so I created a rule to accept inbound and outbound connections on UDP port 51820 (and even disabled the firewall, as I said).

To Reproduce

Steps to reproduce the behavior:

  1. Create self-hosted NetBird server
  2. Connect a Linux peer
  3. Connect a Windows peer
  4. Authorize the two to communicate
  5. Disable Windows Firewalls and EDR
  6. Run the command netbird status --detail
  7. Look at Connection Type
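When several peers are involved, step 7 can be automated by scanning the text of netbird status --detail for each peer's "Connection type" field. A minimal Python sketch, assuming the layout seen in the debug output of this issue (a peer FQDN ending in a colon, followed by "NetBird IP:" and later "Connection type:"); connection_types is a hypothetical helper, not part of NetBird.

```python
import re

# Assumed layout: "<peer fqdn>:" then "NetBird IP:" then, possibly several
# lines later, "Connection type: <value>". DOTALL lets ".*?" cross newlines.
_PEER_RE = re.compile(
    r"(?P<peer>[\w.-]+):\s+NetBird IP:.*?Connection type:\s*(?P<ctype>\w+)",
    re.DOTALL,
)


def connection_types(status_text: str) -> dict[str, str]:
    """Map each peer FQDN to its reported connection type (e.g. P2P or Relayed)."""
    return {m["peer"]: m["ctype"] for m in _PEER_RE.finditer(status_text)}
```

To use it, pass the captured output of netbird status --detail as a string. Because the pattern is non-greedy and spans line breaks, it handles both the normal multi-line output and a flattened copy like the one pasted below.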

Expected behavior

The connection should be P2P.

Are you using NetBird Cloud?

I am using self-hosted NetBird

NetBird version

Linux : 0.49.0, Windows: 0.49.0

Is any other VPN software installed?

Yes, FortiClient, but it is disabled.

Debug output

To help us resolve the problem, please attach the following anonymized status output

Peers detail:
 netbird-gateway.netbird.selfhosted:
  NetBird IP: 100.71.125.247
  Public key: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://wg.anon-hcS0Z.domain:443
  Last connection update: 3 minutes, 21 seconds ago
  Last WireGuard handshake: 1 minute, 22 seconds ago
  Transfer status (received/sent) 2.7 MiB/942.3 KiB
  Quantum resistance: false
  Networks: 0.0.0.0/0
  Latency: 0s

Events:
  [INFO] NETWORK (71cf2845-907b-45ea-a4ce-a0df5fec2b09)
    Message: Default route added
    Time: 5 minutes, 45 seconds ago
    Metadata: id: ALL, network: 0.0.0.0/0, peer: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
  [INFO] SYSTEM (7ea9ca82-02a3-4826-b6a8-9c786d395be5)
    Message: Network map updated
    Time: 5 minutes, 45 seconds ago
  [WARNING] DNS (2a772dbc-2669-4485-a193-7c6b89e6d583)
    Message: All upstream servers failed (probe failed)
    Time: 5 minutes, 28 seconds ago
    Metadata: upstreams: 172.16.1.6:53, 172.16.1.7:53
  [INFO] SYSTEM (2a49d777-679d-47de-9a87-fa688c491d08)
    Message: Network map updated
    Time: 5 minutes, 28 seconds ago
  [INFO] NETWORK (4b72de14-7015-4bb4-968c-957d581240f1)
    Message: Default route added
    Time: 5 minutes, 28 seconds ago
    Metadata: id: ALL, network: 0.0.0.0/0, peer: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
  [INFO] NETWORK (081172e9-65b9-4372-a371-08dcb7780e5a)
    Message: Default route added
    Time: 5 minutes, 28 seconds ago
    Metadata: id: ALL, network: 0.0.0.0/0, peer: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
  [WARNING] DNS (c492914c-178f-4474-936e-c64a6ca8d278)
    Message: All upstream servers failed (probe failed)
    Time: 3 minutes, 21 seconds ago
    Metadata: upstreams: 172.16.1.6:53, 172.16.1.7:53
  [INFO] SYSTEM (ba935ef3-8259-4dcf-9f21-d0e24d34e1d7)
    Message: Network map updated
    Time: 3 minutes, 21 seconds ago
  [INFO] NETWORK (1c786484-290f-42a7-b298-0119781d7c35)
    Message: Default route added
    Time: 3 minutes, 21 seconds ago
    Metadata: id: ALL, network: 0.0.0.0/0, peer: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
  [INFO] NETWORK (cac1d46e-7915-4ffc-a282-e96ad2d2beba)
    Message: Default route added
    Time: 3 minutes, 21 seconds ago
    Metadata: id: ALL, network: 0.0.0.0/0, peer: bGGIji458wjUPLylAqBgv7+bIN8UDa/Ea3viipngPXE=
OS: windows/amd64
Daemon version: 0.49.0
CLI version: 0.49.0
Management: Connected to https://wg.anon-hcS0Z.domain:443
Signal: Connected to https://wg.anon-hcS0Z.domain:443
Relays:
  [stun:wg.anon-hcS0Z.domain:3478] is Available
  [turn:wg.anon-hcS0Z.domain:3478?transport=udp] is Available
  [rels://wg.anon-hcS0Z.domain:443] is Available
Nameservers:
  [172.16.1.6:53, 172.16.1.7:53] for [.] is Available
FQDN: prt35.netbird.selfhosted
NetBird IP: 100.71.104.14/16
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: -
Forwarding rules: 0
Peers count: 1/1 Connected

Create and upload a debug bundle, and share the returned file key:

0107b729c324c4562e1e10d3b5be55567aa55c08302d40e822c7a4ca939cd561/acb0b152-7f5a-4853-b36c-e846e625a513

Have you tried these troubleshooting steps?

  • [X] Reviewed client troubleshooting (if applicable)
  • [X] Checked for newer NetBird versions
  • [X] Searched for similar issues on GitHub (including closed ones)
  • [X] Restarted the NetBird client
  • [X] Disabled other VPN software
  • [X] Checked firewall settings

Update: The Linux machine acts as the exit node for my Windows peer. When this role is removed, I am able to connect P2P again.

eveyraud, Jun 25 '25 12:06

Have one of the peers allow inbound UDP port 51820.

Silex, Jun 25 '25 13:06

Thanks for your reply. Both of my peers allow UDP/51820 inbound and outbound. As I said in my ticket, I even tried disabling the Windows firewall and the EDR, but the problem remains unchanged. My Linux peer has been able to connect P2P with another Linux peer and with an Android device; only Windows isn't working.

eveyraud, Jun 25 '25 13:06

UPDATE: When I connect both of them to NetBird Cloud, they can both connect P2P. This issue seems to concern only self-hosted instances.

eveyraud, Jun 25 '25 14:06

Try downgrading the Windows client to 0.48.0; it may be related to the issue I raised today: https://github.com/netbirdio/netbird/issues/4045

SasSam, Jun 25 '25 18:06

Unfortunately, no. I had the issue on 0.47.2; I then upgraded to 0.48.0, still had it, and finally upgraded to 0.49.0.

eveyraud, Jun 26 '25 06:06

After reading some more docs I found this: https://netbird.io/knowledge-hub/enhancing-network-visibility-with-traffic-events-logging

Without much hope, I then tried removing the Linux peer's exit-node role for the Windows peer, and now I'm able to connect P2P between the two machines. My questions are:

  • Is this normal behavior?
  • Is it possible to use UDP in such a context?

eveyraud, Jun 26 '25 13:06

I did some tests to compare how traffic behaves when using a traditional WireGuard client versus the NetBird client while routing all traffic through an exit node:

🔹 Test 1: Traditional WireGuard client

When connecting to a WireGuard server:

Destination       Gateway         Genmask         Flags Metric Ref Use Iface
0.0.0.0           10.46.27.186    0.0.0.0         UG    0      0     0 wlp42s0
10.46.27.0        0.0.0.0         255.255.255.0   U     0      0     0 wlp42s0
55.44.33.22       10.46.27.186    255.255.255.255 UGH   0      0     0 wlp42s0  <- WG server public IP

The default route (0.0.0.0/0) is set to go through the tunnel, and an explicit route is added for the WireGuard server’s public IP via the real interface, ensuring it is reachable outside the tunnel (avoiding loops or fallback to relayed mode).
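Test 1's explicit route works because the kernel picks the most specific (longest-prefix) matching route. A minimal Python sketch of that lookup, assuming a simplified table of (destination CIDR, interface) pairs and ignoring metrics; the tunnel interface name wg0 and the default route through it are assumptions, since the table above lists only the physical-interface entries.

```python
import ipaddress


def lookup(routes, dst):
    """Return the interface of the most specific (longest-prefix) route matching dst."""
    addr = ipaddress.ip_address(dst)
    best_net, best_iface = None, None
    for cidr, iface in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface


# Hypothetical full-tunnel table: everything defaults to the tunnel (wg0),
# except the local LAN, which stays on the physical interface.
tunnel_only = [
    ("0.0.0.0/0", "wg0"),
    ("10.46.27.0/24", "wlp42s0"),
]
# Same table plus the /32 exception for the WireGuard server's public IP (Test 1).
with_host_route = tunnel_only + [("55.44.33.22/32", "wlp42s0")]

print(lookup(with_host_route, "55.44.33.22"))  # wlp42s0: encrypted packets leave via Wi-Fi
print(lookup(tunnel_only, "55.44.33.22"))      # wg0: they would be sent back into the tunnel
```

Without the /32 entry, packets addressed to the server's public IP match only the tunnel default and are sent back into wg0, which is the situation Test 2 describes for the exit node.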

🔹 Test 2: NetBird client using an exit node

With NetBird connected to an exit node:

Destination       Gateway         Genmask         Flags Metric Ref Use Iface
0.0.0.0           10.46.27.186    0.0.0.0         UG    0      0     0 wlp42s0
10.46.27.0        0.0.0.0         255.255.255.0   U     0      0     0 wlp42s0
100.88.0.0        0.0.0.0         255.255.0.0     U     0      0     0 wt0

The default route (0.0.0.0/0) is pushed through the NetBird tunnel (wt0), but no dedicated route is added for the exit node’s public IP. As a result, traffic to the exit node itself also goes through the tunnel interface, which leads to:

  • A P2P connection being established temporarily.
  • Then falling back to relayed mode a few seconds later.

This behavior is consistent and happens only when using the peer as an exit node.

I only ran these tests on Linux; both peers are using NetBird v0.49.0.

lepazca, Jun 28 '25 17:06

I just tested it on Windows and it behaved the same way: the NetBird client does not add a static route for the public IP address of the exit node. I assume such a route is the only way to avoid interfering with the P2P connection between the client and the exit node. Please correct me if I am wrong.
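The static route proposed here can be sketched as a small check: for each peer endpoint, see whether the current table would send it into the tunnel and, if so, emit a /32 route via the physical gateway. A minimal Python sketch that only prints iproute2 commands rather than running them; exclusion_routes is a hypothetical helper, the addresses reuse the Test 1 and Test 2 tables (with 55.44.33.22 standing in for the exit node's public IP), and it illustrates the proposal in this comment, not how the NetBird client actually routes traffic.

```python
import ipaddress


def exclusion_routes(routes, endpoints, tunnel_iface, gateway, uplink):
    """Hypothetical helper: list `ip route` commands that would pin each peer
    endpoint to the physical uplink when the current table sends it into the tunnel."""
    commands = []
    for endpoint in endpoints:
        addr = ipaddress.ip_address(endpoint)
        matches = [
            (ipaddress.ip_network(cidr), iface)
            for cidr, iface in routes
            if addr in ipaddress.ip_network(cidr)
        ]
        if not matches:
            continue  # no route at all, nothing to exclude
        # Longest-prefix match decides where the endpoint's packets would go.
        _, iface = max(matches, key=lambda m: m[0].prefixlen)
        if iface == tunnel_iface:
            commands.append(f"ip route add {addr}/32 via {gateway} dev {uplink}")
    return commands


routes = [("0.0.0.0/0", "wt0"), ("10.46.27.0/24", "wlp42s0")]
for cmd in exclusion_routes(routes, ["55.44.33.22", "10.46.27.9"], "wt0", "10.46.27.186", "wlp42s0"):
    print(cmd)  # ip route add 55.44.33.22/32 via 10.46.27.186 dev wlp42s0
```

The LAN endpoint 10.46.27.9 already resolves to wlp42s0 through the /24 route, so no command is generated for it.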

lepazca, Jun 28 '25 22:06

I thought about it, but to be honest I didn't look further. You are absolutely right; I noticed the same thing. I'm not sure whether this behavior is intended by the devs, but just in case, I created a feature request: #4069

eveyraud, Jun 30 '25 08:06

@lepazca thanks for the analysis! It turns out the dev team discussed this some time ago but didn't pursue the implementation; we will definitely look at it sooner rather than later.

nazarewk, Jun 30 '25 12:06

@lepazca actually, after further clarification: this (routing of IPs) is something we already handle, either in a separate routing table (Linux) or on demand upon detecting an outbound connection (most other systems), so it should not be an issue.

Would it be possible for you to provide a debug bundle (netbird debug for 1m -SU) from the affected devices? We would like to investigate this further.
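The separate routing table mentioned for Linux corresponds to policy routing: the client's own encrypted UDP packets carry a mark and are looked up in the main table, while all other traffic is looked up in the VPN table first. A heavily simplified Python model of that decision; the fwmark value, the table contents, and the helper names are hypothetical and are not NetBird's actual values, and real setups add further rules (for example to keep LAN traffic local).

```python
import ipaddress


def lpm(table, dst):
    """Longest-prefix match within one routing table; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    hits = [(ipaddress.ip_network(c), i) for c, i in table if addr in ipaddress.ip_network(c)]
    return max(hits, key=lambda h: h[0].prefixlen)[1] if hits else None


MAIN = [("0.0.0.0/0", "wlp42s0"), ("10.46.27.0/24", "wlp42s0")]  # what `route -n` shows
VPN_TABLE = [("0.0.0.0/0", "wt0")]  # hypothetical separate table holding the exit-node default
CLIENT_MARK = 0x1  # hypothetical fwmark set on the VPN client's own UDP socket


def route(dst, fwmark=0):
    """Simplified policy routing: marked packets skip the VPN table."""
    if fwmark != CLIENT_MARK:
        iface = lpm(VPN_TABLE, dst)
        if iface is not None:
            return iface
    return lpm(MAIN, dst)  # fall through to the main table
```

In this model, route("55.44.33.22", fwmark=CLIENT_MARK) returns wlp42s0 while unmarked traffic to the same address returns wt0, so no per-peer /32 route is needed. If NetBird works along these lines on Linux, it may also explain why the route -n listings above show no default route via wt0: route -n prints only the main table, whereas ip rule show and ip route show table all display the full picture.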

nazarewk, Jun 30 '25 13:06

@eveyraud Could you also send us a debug bundle from the netbird-gateway.netbird.selfhosted peer? We have noticed a suspicious candidate from within the NetBird network range, which should not be there, and want to learn more about it:

2025-06-25T14:02:36+02:00 WARN client/iface/bind/udp_mux_universal.go:168: Address 100.71.125.247:51820 is part of the NetBird network 100.71.104.14/16, refusing to write
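The arithmetic behind this warning can be checked directly: 100.71.104.14/16 (the Windows peer's NetBird IP) spans 100.71.0.0/16, which contains 100.71.125.247, the NetBird IP of netbird-gateway. A candidate inside the overlay range cannot be used to reach the peer underneath the tunnel, which is presumably why the client refuses to write to it. A minimal Python sketch of the membership test; in_netbird_range is a hypothetical helper that handles IPv4 "host:port" strings only and is not the code in udp_mux_universal.go.

```python
import ipaddress


def in_netbird_range(candidate: str, local_iface: str) -> bool:
    """True if the candidate's host part falls inside the local NetBird network."""
    host = candidate.rsplit(":", 1)[0]  # strip the ":port" suffix (IPv4 only)
    network = ipaddress.ip_interface(local_iface).network  # 100.71.104.14/16 -> 100.71.0.0/16
    return ipaddress.ip_address(host) in network


print(in_netbird_range("100.71.125.247:51820", "100.71.104.14/16"))  # True
```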

nazarewk, Jun 30 '25 13:06

@nazarewk Thanks for the clarification and for looking into this.

I've performed the tests as requested and uploaded the debug data.

Here is the upload file key:

1234567890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/12345678-90ab-cdef-1234-567890abcdef

Let me know if you need anything else from my side.

lepazca, Jun 30 '25 14:06

IP addresses changed because I removed and re-added some machines, but the IP you're pointing to was the peer "netbird-gateway". In case it helps, here are the files from both peers:

Upload file key from NetBird-Gateway: 0107b729c324c4562e1e10d3b5be55567aa55c08302d40e822c7a4ca939cd561/2c406be5-8009-4208-9db0-19c88ccf3c9d

Upload file key from Windows peer: 0107b729c324c4562e1e10d3b5be55567aa55c08302d40e822c7a4ca939cd561/0eb3d2b5-c8ed-418b-a468-8f8d92d46c91

Edit: file keys updated, I had given the wrong ones.

eveyraud, Jun 30 '25 14:06

Could you please let me know if there have been any updates on this topic? Regarding the warning you mentioned, do you think it's something concerning, or is it simply log noise?

Thanks for your help and answers

eveyraud, Jul 15 '25 13:07

Ah, thanks for the detailed post, I just ran into the same issue during testing and was beating my head against a wall.

stevo11811, Jul 15 '25 14:07

@eveyraud The part about trying to use a NetBird IP as an ICE candidate is suspicious. We have found some additional information there, but we will need additional ICE logs (enabled separately) to get to the bottom of this issue. Could you enable Pion logs on your Windows client and gather a debug bundle for 1 minute again? https://docs.netbird.io/how-to/troubleshooting-client#debugging-ice-connections

PS: It would help if the debug bundle weren't anonymized, so we can more easily correlate exact IP addresses with domains. In case you have any concerns: the logs are only accessible to developers through the internal storage system and are deleted after 30 days.

nazarewk, Jul 15 '25 20:07

Of course, I can give you the logs. However, due to our cybersecurity policy, I can only provide the anonymized version. Sharing non-anonymized IPs or domains just isn't something we can do.

Also, the doc you gave me covers Linux commands, so I tried setting the environment variables manually:

[Environment]::SetEnvironmentVariable("PIONS_LOG_DEBUG", "all", "Machine")
[Environment]::SetEnvironmentVariable("NB_LOG_LEVEL", "debug", "Machine")
netbird up -F --log-level debug > C:\Temp\netbird.log 2>&1

But the resulting logs don't seem very different from those I gave you. Am I doing something wrong in the commands I used or in how I read the logs?

eveyraud, Jul 16 '25 08:07

@eveyraud I have checked this myself, and indeed the logs aren't redirected to the expected places (the ICE library uses a different logger implementation). On a fresh Ubuntu instance, I found the Pion logs at /var/log/netbird/netbird.out. I'll try to find out how to access them in other environments and get back to you.

For the future, you don't really need to run NetBird in the foreground; just restarting a service should suffice.
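Until the log locations for other platforms are known, the Linux file can at least be narrowed down to ICE-related lines. A minimal Python sketch, assuming the Pion output lands in /var/log/netbird/netbird.out as described above; the keyword pattern (whole words "ice", "candidate", "candidates") is a guess, not Pion's documented log format, and both helper names are hypothetical.

```python
import re

# Hypothetical filter: whole-word matches only, so "service" or "device" are not hits.
ICE_PATTERN = re.compile(r"\b(ice|candidates?)\b", re.IGNORECASE)


def ice_lines(lines):
    """Yield the lines (without trailing newline) that look ICE-related."""
    for line in lines:
        if ICE_PATTERN.search(line):
            yield line.rstrip("\n")


def ice_lines_from_file(path="/var/log/netbird/netbird.out"):
    """Read the service log and return only its ICE-related lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(ice_lines(fh))
```

Keeping the filter separate from the file access makes it easy to reuse on whatever file turns out to hold the Pion output on Windows.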

nazarewk, Jul 17 '25 13:07

Hey folks, can you please test https://github.com/netbirdio/netbird/releases/tag/v0.51.2 and report whether the problem has been fixed for you.

lixmal, Jul 22 '25 11:07

Issue fixed on my side. I still see the log spam @nazarewk mentioned, though. Thanks a lot for the fix!

eveyraud, Jul 22 '25 12:07

This is working for me! Thanks everyone.

stevo11811, Jul 22 '25 17:07

I'm not sure if this is the correct place to ask this, but I am having this issue even on 0.51.2. Please let me know if I need to spin up my own thread. I have some details below in case I can write my issue here.

I have a test environment with an Ubuntu VM running Docker CE, with the NetBird containers deployed following the advanced guide. On the same host I also deployed a NetBird agent container, using host networking, to act as an exit node.

With the previous version, 0.51.1, on all containers, I was able to get P2P connections between my Windows endpoint and the exit node. With the latest release, 0.51.2, I only get relayed connections. I have ensured all ports are accessible, so I don't think it's a firewall or NAT issue.

Can I get some help?

maxideus85, Jul 24 '25 17:07

@maxideus85

Please do open a separate issue.

lixmal, Jul 25 '25 17:07

Hi, sorry for the delay in providing feedback.

I'm still experiencing the same issue even after updating to version 0.51.2. When I enable the exit node, the connection switches to relayed. I understand some users have reported this issue as resolved, but perhaps the root cause in my case is different.

As before, I captured the logs in case they help identify the issue.

c04854b95ffd40870e1dff93227236c1cce3c359fdeb83899a046975941e66e7/fab17525-9dba-463d-988d-7e94f1ae548a

Thank you

lepazca, Jul 27 '25 18:07

@lepazca, this is a Linux machine; Linux is not affected by this bug.

You have a block rule in the OUTPUT chain. Could that be the issue?

-A OUTPUT -j evpn.OUTPUT
[...]
-A evpn.OUTPUT -j evpn.a.100.blockAll
-A evpn.r.100.blockAll -j REJECT --reject-with icmp-port-unreachable
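Spotting such a rule by hand means following each -j jump from OUTPUT through the custom chains. A minimal Python sketch that does this on iptables-save style lines; blocking_rules is a hypothetical helper, the sample chain names in the usage are made up, and it deliberately over-reports because it ignores match conditions, rule order, RETURN semantics and -g gotos.

```python
import re
from collections import defaultdict

TERMINAL = {"REJECT", "DROP"}


def blocking_rules(rules, start="OUTPUT"):
    """Follow -j jumps from `start` and return the REJECT/DROP rules reachable from it."""
    jumps = defaultdict(list)  # chain -> [(target, original rule line)]
    for rule in rules:
        m = re.match(r"-A (\S+) .*?-j (\S+)", rule)
        if m:
            jumps[m.group(1)].append((m.group(2), rule))
    found, seen, stack = [], set(), [start]
    while stack:
        chain = stack.pop()
        if chain in seen:
            continue  # guard against jump cycles
        seen.add(chain)
        for target, rule in jumps[chain]:
            if target in TERMINAL:
                found.append(rule)
            else:
                stack.append(target)  # user-defined chain (or ACCEPT/RETURN, which has no rules)
    return found


rules = [
    "-A OUTPUT -j vpn.OUTPUT",
    "-A vpn.OUTPUT -j vpn.blockAll",
    "-A vpn.blockAll -j REJECT --reject-with icmp-port-unreachable",
    "-A INPUT -j DROP",
]
print(blocking_rules(rules))
```

The INPUT rule is not reported because nothing reachable from OUTPUT jumps to it.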

lixmal, Jul 27 '25 18:07

Thank you for the follow-up.

These rules were added by ExpressVPN, which I had previously installed on the machine. I have now stopped the ExpressVPN service and confirmed that all related evpn.* chains have been removed. I also flushed all iptables rules to ensure a clean environment.

After that, I repeated the test using netbird debug with the same setup, and unfortunately the issue still persists. Here is the new debug log ID:

c04854b95ffd40870e1dff93227236c1cce3c359fdeb83899a046975941e66e7/6ad1b43d-ee50-46a4-a4e7-1b45178ac9d2

Thanks again for the guidance!

lepazca, Jul 27 '25 20:07