Unable to ping other peers on netbird network

Open bmcgonag opened this issue 1 year ago • 15 comments

Describe the problem

I have set up a self-hosted NetBird network with Authentik as the IdP.

I have added two Linux devices and one iPhone.

I try to ping from one Linux machine to the other using its NetBird IP address.

netbird status -d on each Linux machine shows the other as a peer, as well as the iPhone as a peer that is currently offline.

I saw some other posts about similar issues where the person found their TURN server config to be incorrect.

I used the site at https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ to test my TURN configuration and got the following:

Time 	Type 	Foundation 	Protocol 	Address 	Port 	Priority 	URL (if present) 	relayProtocol (if present)
0.003	host	0	udp	dfaa8882-bbcf-61a7263e2e3c.local	40208	126 | 32512 | 255		
0.008	host	3	udp	2b841932-ae5b-3d03e55d8a5b.local	49310	126 | 32256 | 255		
0.008	host	6	tcp	dfaa8882-288f-bbcf-61a7263e2e3c.local	9	125 | 32704 | 255		
0.009	host	7	tcp	2b841932-ae5b-3d03e55d8a5b.local	9	125 | 32448 | 255		
0.010	host	0	udp	dfaa8882-61a7263e2e3c.local	43180	126 | 32512 | 254		
0.012	host	3	udp	2b841932-ae5b-3d03e55d8a5b.local	51760	126 | 32256 | 254		
0.013	host	6	tcp	dfaa8882-288f-61a7263e2e3c.local	9	125 | 32704 | 254		
0.014	host	7	tcp	2b841932-62b0-3d03e55d8a5b.local	9	125 | 32448 | 254		
0.140	srflx	4	udp	xx.xxx.xx.xxx	49310	100 | 32287 | 255		
0.141	relay	5	udp	xxx.xxx.xxx.xxx	63425	5 | 32287 | 255		
0.166	Done
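
When reading a Trickle ICE table like the one above, the rows that matter for STUN and TURN are the `srflx` candidate (the STUN server answered) and the `relay` candidate (the TURN server allocated a relay address). The following is a minimal Python sketch that counts candidates by type and protocol from tab-separated rows. `summarize_candidates` is a hypothetical helper written for illustration, and the sample rows are copied from the table above with addresses redacted as in the post. A passing check only shows that the browser could reach the server; it says nothing about whether the NetBird clients can.

```python
from collections import Counter

CANDIDATE_TYPES = {"host", "srflx", "prflx", "relay"}


def summarize_candidates(table_text):
    """Count ICE candidates by (type, protocol) from tab-separated rows."""
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split("\t")
        # Skip the header row, the final "Done" row, and blank lines.
        if len(fields) < 4 or fields[1] not in CANDIDATE_TYPES:
            continue
        counts[(fields[1], fields[3])] += 1
    return counts


SAMPLE = "\n".join([
    "Time\tType\tFoundation\tProtocol\tAddress\tPort\tPriority",
    "0.003\thost\t0\tudp\tdfaa8882-bbcf-61a7263e2e3c.local\t40208\t126 | 32512 | 255",
    "0.140\tsrflx\t4\tudp\txx.xxx.xx.xxx\t49310\t100 | 32287 | 255",
    "0.141\trelay\t5\tudp\txxx.xxx.xxx.xxx\t63425\t5 | 32287 | 255",
    "0.166\tDone",
])

counts = summarize_candidates(SAMPLE)
stun_ok = counts[("srflx", "udp")] > 0  # a server-reflexive candidate was gathered
turn_ok = counts[("relay", "udp")] > 0  # a relay candidate was gathered
print(dict(counts), stun_ok, turn_ok)
```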

I believe everything is set up correctly, but I am still unable to ping the other machine.

In the management.json file I also verified that the TURN server credentials match those in the turnserver.conf file.
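
The credential comparison described above can also be scripted. This is a minimal Python sketch under stated assumptions: it assumes management.json keeps TURN credentials under `TURNConfig.Turns[].Username` and `.Password`, and that turnserver.conf uses long-term credentials written as `user=name:password` lines. Both layouts should be checked against your own files, and the helper names and sample values below are hypothetical.

```python
import json


def turn_users_from_management(management_json_text):
    """Return {username: password} from the assumed TURNConfig.Turns layout."""
    config = json.loads(management_json_text)
    turns = config.get("TURNConfig", {}).get("Turns", [])
    return {t.get("Username"): t.get("Password") for t in turns}


def turn_users_from_coturn(conf_text):
    """Return {username: password} from 'user=name:password' lines."""
    users = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("user="):
            name, _, password = line[len("user="):].partition(":")
            users[name] = password
    return users


def credentials_match(management_json_text, conf_text):
    """True if every TURN user in management.json has the same password in coturn."""
    mgmt = turn_users_from_management(management_json_text)
    coturn = turn_users_from_coturn(conf_text)
    return bool(mgmt) and all(coturn.get(u) == p for u, p in mgmt.items())


# Placeholder values for illustration only.
MANAGEMENT = json.dumps({
    "TURNConfig": {
        "Turns": [{"Proto": "udp", "URI": "turn:example.invalid:3478",
                   "Username": "self", "Password": "example-password"}]
    }
})
COTURN = "listening-port=3478\nlt-cred-mech\nuser=self:example-password\n"

print(credentials_match(MANAGEMENT, COTURN))
```

In practice the two strings would be read from the actual files on the server rather than defined inline.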

I have set up one extra group called personal and added all three machines to it. I added an ACL for that group to allow traffic between the machines in the group and made sure it's enabled. Additionally, I have not removed the 'ALL' group, so that I can compare behavior with ALL enabled and disabled. It makes no difference.

To Reproduce

Steps to reproduce the behavior:

  1. Set up NetBird on a self-hosted installation.
  2. Set it up to use Authentik (I don't think this is the issue).
  3. Install NetBird clients on 2 Linux machines.
  4. Add the machines to a group.
  5. Create an ACL to allow the machines in the group to communicate.
  6. Enable the ACL.
  7. Try to ping one machine from the other.

Expected behavior

I would expect machines in a group to be able to communicate when an ACL allows traffic within that group. At the very least, I would expect machines in the ALL group to be able to communicate.

Are you using NetBird Cloud?

Self-hosted

NetBird version

Server: Docker, version set to latest

Clients:

  • Linux desktop, Fedora 39: 0.25.4
  • Linux desktop, Ubuntu 23.10: 0.25.5

NetBird status -d output

From the Fedora desktop:

Peers detail:
 brian-ub-studio-1.netbird.selfhosted:
  NetBird IP: 100.85.93.103
  Public key: ***************************************
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx
  Last connection update: 2024-01-29 14:14:21

 iphone.netbird.selfhosted:
  NetBird IP: 100.85.170.165
  Public key: ***************************************
  Status: Disconnected
  -- detail --
  Connection type: 
  Direct: false
  ICE candidate (Local/Remote): -/-
  Last connection update: 2024-01-29 14:53:52

Daemon version: 0.25.4
CLI version: 0.25.4
Management: Connected to https://my-net.netbird-server.com:33073
Signal: Connected to http://my-net.netbird-server.com:10000
FQDN: brian-fedora-lan-1.netbird.selfhosted
NetBird IP: 100.85.242.220/16
Interface type: Kernel
Peers count: 1/2 Connected

![Screenshot from 2024-01-29 19-07-24](https://github.com/netbirdio/netbird/assets/7346620/72851509-a62d-4c9e-8b98-e4673ac52e32)
![Screenshot from 2024-01-29 19-07-44](https://github.com/netbirdio/netbird/assets/7346620/4f19af0a-5a04-41bb-b87f-753968684a23)


bmcgonag avatar Jan 30 '24 01:01 bmcgonag

Additional information: I updated my Fedora client to 0.25.5-1 and still see the same issue.

bmcgonag avatar Jan 30 '24 18:01 bmcgonag

I completely rebuilt my setup using a new domain name, and still see the following:

  1. Each peer can see the other peers in the ALL group when running netbird status -d.
  2. One peer can resolve the IPv4 address of the other peer when trying to ping by NetBird hostname, but no ping is ever successful.
  3. Neither peer can successfully ping the other by IPv4 address or hostname.

I watched the logs from docker compose up -d when I started the new system and saw no errors at all.

Everything appears to be communicating properly, except the clients can't seem to communicate with each other. No idea why.

  • Clients can ping each other on the local network by local LAN IP address.
  • No firewall is on for either client on the local network.
  • I disabled the firewall on the VPS to make sure that wasn't it, and saw no change.

Any help is greatly appreciated.

bmcgonag avatar Jan 31 '24 17:01 bmcgonag

Hello @bmcgonag, please confirm the VPC you are using to host the NetBird server. It might be an issue with reachability to Coturn. Are you able to ping different hosts on the same network using their NetBird hostnames?

wisetux avatar Feb 01 '24 04:02 wisetux

I'm using Digital Ocean. I posted my Coturn test results from Trickle ICE in the original message. I don't think that's the issue, but I'm not 100% certain of that.

I am unable to ping the hosts by IPv4 or by Hostname. Any direction or help is greatly appreciated.

bmcgonag avatar Feb 01 '24 13:02 bmcgonag

Also, @wisetux, the server I have set up has 1 vCPU and 2 GB RAM and runs Ubuntu 22.04 LTS server. Nothing else is running on that server, just NetBird.

bmcgonag avatar Feb 02 '24 12:02 bmcgonag

Thank you for the info. The server specs should be fine, as NetBird is very light on resources. However, your Trickle ICE output looks a little different from mine. This is what I have:

Time	Type	Foundation	Protocol	Address	Port	Priority	URL (if present)	relayProtocol (if present)
0.007	host	4226889391	udp	CLIENT LAN IP ADDRESS	35803	126 | 32286 | 255		
0.010	host	2058046622	udp	CLIENT ISP IPV6 ADDRESS	49572	126 | 32552 | 255		
0.051	srflx	1125687685	udp	CLIENT ISP IPV4 ADDRESS	35803	100 | 32286 | 255	stun:netbird.DOMAIN.com:3478	
0.098	relay	2009379810	udp	NETBIRD SERVER LAN IP ADDRESS	55349	2 | 32287 | 255	turn:netbird.DOMAIN.com:3478?transport=udp	udp
0.121	host	2235487287	tcp	CLIENT LAN IP ADDRESS	9	90 | 32286 | 255		
0.122	host	73707014	tcp	CLIENT ISP IPV6 ADDRESS	9	90 | 32552 | 255		
0.124	Done

Can you try connecting from a different network or a mobile Hotspot maybe?

wisetux avatar Feb 02 '24 23:02 wisetux

OK, yeah, I see how yours is different. Any idea what it might be, @wisetux? I have my own Coturn server set up that I use for Matrix, Nextcloud, and others, but it uses "static-auth", not "lt-cred-mech". Can NetBird do "static-auth"?

bmcgonag avatar Feb 03 '24 01:02 bmcgonag

Results when connected through mobile hotspot

Time 	Type 	Foundation 	Protocol 	Address 	Port 	Priority 	URL (if present) 	relayProtocol (if present)
0.007	host	0	udp	2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local	59790	126 | 32256 | 255		
0.010	host	3	udp	dff059db-xxxx-4ecc-b1fe-48d649fd858e.local	58453	126 | 32512 | 255		
0.012	host	6	tcp	2b0139aa-149e-xxxx-bfe2-fad07cf3f11a.local	9	125 | 32448 | 255		
0.013	host	7	tcp	dff059db-9369-xxxx-b1fe-48d649fd858e.local	9	125 | 32704 | 255		
0.018	host	0	udp	2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local	50283	126 | 32256 | 254		
0.019	host	3	udp	dff059db-xxxx-4ecc-b1fe-48d649fd858e.local	57575	126 | 32512 | 254		
0.020	host	6	tcp	2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local	9	125 | 32448 | 254		
0.022	host	7	tcp	dff059db-xxxx-4ecc-b1fe-48d649fd858e.local	9	125 | 32704 | 254		
0.173	srflx	1	udp	174.2xx.xxx.xxx	4351	100 | 32287 | 255		
0.173	relay	2	udp	206.xx.xx.xxx	61919	5 | 32287 | 255	
0.194	Done

bmcgonag avatar Feb 03 '24 14:02 bmcgonag

I'm not well versed in Coturn server setup, and I use a dedicated instance just for NetBird. This issue might give you more info regarding static-auth configuration: https://github.com/netbirdio/netbird/issues/569

wisetux avatar Feb 03 '24 18:02 wisetux

Do you have a DNS resolution issue? You might see one of the errors below in /var/log/netbird/client.log:

ERRO client/internal/dns/server.go:282: got an error while applying resolvconf configuration for wt0 interface, error: exit status 99
ERRO client/internal/dns/host_linux.go:99: got an error while checking systemd resolv conf mode, error: got an error getting property org.freedesktop.resolve1.Manager.ResolvConfMode: Unknown property or interface.
WARN client/internal/wgproxy/factory_linux.go:15: failed to initialize ebpf proxy, fallback to user space proxy: field NbXdpProg: program nb_xdp_prog: map .rodata: map create: read- and write-only maps not supported (requires >= v5.2)
ERRO client/internal/dns/server.go:282: unable to configure DNS for this peer using resolvconf manager without a nameserver group with all domains configured

The full issue is referenced here: #1451

magixus avatar Feb 06 '24 15:02 magixus

No. Checked logs, and no errors shown. I have a few WARN, and a lot of INFO states, but no ERRORs logged.

Even tailed the logs while logging in, as well as trying to ping the peer after login.

bmcgonag avatar Feb 09 '24 01:02 bmcgonag

Same symptom here:

root@docker219 ~# netbird status -d
Peers detail:
 pve.netbird.selfhosted:
  NetBird IP: 100.86.4.26
  Public key: dIuwdZzyZpSQPx64I7wo8uzl/su75PaNpklHVhZFkCw=
  Status: Connected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:61298/192.168.1.2:61298
  Last connection update: 2024-02-19 14:09:49
  Last Wireguard handshake: 2024-02-19 14:32:53
  Transfer status (received/sent) 1.1 KiB/3.7 KiB

 d9e3486ac0e6.netbird.selfhosted:
  NetBird IP: 100.86.24.123
  Public key: kMnnFpG4JtOASFHcGO3otQxKFAJQ7lDK1iNpkp9TOyo=
  Status: Disconnected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:57500/192.168.1.236:57500
  Last connection update: -
  Last Wireguard handshake: 2024-02-19 14:32:00
  Transfer status (received/sent) 1.4 KiB/640 B

 desktop-0d03977.netbird.selfhosted:
  NetBird IP: 100.86.71.168
  Public key: BI3zBLxEDNOTo/ouFcrfx+nU8PAbfueTRWfPyUFgFEk=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 192.168.10.219:51820/118.163.170.24:51820
  Last connection update: 2024-02-19 14:09:49
  Last Wireguard handshake: 2024-02-19 14:33:20
  Transfer status (received/sent) 2.9 KiB/2.1 KiB

 netbird.netbird.selfhosted:
  NetBird IP: 100.86.138.236
  Public key: hgOPbz+D5cSiOmIdLbyjzMT85sojs8hGfe8r33/tYTY=
  Status: Connected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:57500/192.168.1.236:57500
  Last connection update: 2024-02-19 14:25:15
  Last Wireguard handshake: 2024-02-19 14:32:00
  Transfer status (received/sent) 1.4 KiB/640 B

 pve-dell.netbird.selfhosted:
  NetBird IP: 100.86.139.76
  Public key: s4KxhTaOhrZgrvi2WDeHDKwIRg2YmeBoNjNGOxrkeyE=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 192.168.10.219:51820/192.168.10.3:51820
  Last connection update: 2024-02-19 14:09:48
  Last Wireguard handshake: 2024-02-19 14:31:06
  Transfer status (received/sent) 2.9 KiB/2.8 KiB

Daemon version: 0.25.9
CLI version: 0.25.9
Management: Connected to https://netbird.tarosu.eu.org:443
Signal: Connected to https://netbird.tarosu.eu.org:443
Relays:
  [stun:netbird.tarosu.eu.org:3478] is Available
  [turn:netbird.tarosu.eu.org:3478?transport=udp] is Available
FQDN: docker219.netbird.selfhosted
NetBird IP: 100.86.194.133/16
Interface type: Kernel
Peers count: 4/5 Connected

Only desktop-0d03977.netbird.selfhosted and netbird.netbird.selfhosted can ping each other. They cannot ping the other peers, and the other peers cannot ping those two nodes.
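
Since the status output reports several peers as Connected even though pings to them fail, it can help to pull the Status and Connection type out per peer and then test each Connected address in turn. The following is a minimal Python sketch, assuming the indented `netbird status -d` layout shown in the first post of this thread. `parse_peers` is a hypothetical helper, not part of the NetBird CLI, and the sample text is abbreviated from the output above.

```python
def parse_peers(status_text):
    """Parse the 'Peers detail' blocks of status text into {fqdn: {field: value}}."""
    peers = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current peer block
            continue
        if current is None:
            # A peer block starts with "<fqdn>:" alone on a line (no spaces).
            if line.endswith(":") and " " not in line:
                current = peers.setdefault(line[:-1], {})
            continue
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon, such as "-- detail --", are skipped
            current[key.strip()] = value.strip()
    return peers


SAMPLE = """\
Peers detail:
 pve.netbird.selfhosted:
  NetBird IP: 100.86.4.26
  Status: Connected
  -- detail --
  Connection type: Relayed

 d9e3486ac0e6.netbird.selfhosted:
  NetBird IP: 100.86.24.123
  Status: Disconnected
  -- detail --
  Connection type: Relayed

Daemon version: 0.25.9
"""

peers = parse_peers(SAMPLE)
# Addresses worth pinging: peers the daemon itself considers connected.
targets = [p["NetBird IP"] for p in peers.values() if p.get("Status") == "Connected"]
print(targets)
```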

Mon 14:18 C:\Users\S2306005

ping netbird.netbird.selfhosted

Pinging netbird.netbird.selfhosted [100.86.138.236] with 32 bytes of data:
Reply from 100.86.138.236: bytes=32 time=10ms TTL=64
Reply from 100.86.138.236: bytes=32 time=10ms TTL=64
Reply from 100.86.138.236: bytes=32 time=13ms TTL=64
Reply from 100.86.138.236: bytes=32 time=13ms TTL=64

Ping statistics for 100.86.138.236:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 10ms, Maximum = 13ms, Average = 11ms

Mon 14:36 C:\Users\S2306005

ping pve-dell.netbird.selfhosted

Pinging pve-dell.netbird.selfhosted [100.86.139.76] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 100.86.139.76:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

tarocjsu avatar Feb 19 '24 06:02 tarocjsu

root@netbird:~# netbird status
Daemon version: 0.25.9
CLI version: 0.25.9
Management: Connected
Signal: Connected
Relays: 2/2 Available
FQDN: netbird.netbird.selfhosted
NetBird IP: 100.86.138.236/16
Interface type: Kernel
Peers count: 4/5 Connected

root@netbird:~# ping docker219.netbird.selfhosted
PING docker219.netbird.selfhosted (100.86.194.133) 56(84) bytes of data.
^C
--- docker219.netbird.selfhosted ping statistics ---
17 packets transmitted, 0 received, 100% packet loss, time 16373ms

root@netbird:~# ping desktop-0d03977.netbird.selfhosted
PING desktop-0d03977.netbird.selfhosted (100.86.71.168) 56(84) bytes of data.
64 bytes from 100.86.71.168: icmp_seq=1 ttl=128 time=17.7 ms
64 bytes from 100.86.71.168: icmp_seq=2 ttl=128 time=23.2 ms
64 bytes from 100.86.71.168: icmp_seq=3 ttl=128 time=19.4 ms
64 bytes from 100.86.71.168: icmp_seq=4 ttl=128 time=24.3 ms
64 bytes from 100.86.71.168: icmp_seq=5 ttl=128 time=32.5 ms
^C
--- desktop-0d03977.netbird.selfhosted ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 17.671/23.421/32.476/5.136 ms

tarocjsu avatar Feb 19 '24 06:02 tarocjsu

Pinging by hostname (FQDN) resolves to the correct IP address. I only use the default ALL group and the default allow-all Access Control setting.

tarocjsu avatar Feb 19 '24 06:02 tarocjsu

I found the root cause in my network environment: every system that could not ping or be pinged already had the Tailscale daemon installed. After uninstalling the Tailscale daemon, the ping issue was resolved.
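
One plausible explanation, not confirmed in this thread, is an address-range conflict. Tailscale assigns addresses from the 100.64.0.0/10 shared address space, and the NetBird addresses shown in this thread (100.85.x.x and 100.86.x.x with a /16 mask) fall inside that range, so routes or firewall rules installed by one daemon may affect the other's traffic. The following minimal Python sketch checks the overlap using only the standard library; `inside_cgnat` is a hypothetical helper name.

```python
import ipaddress

# RFC 6598 shared address space, the range Tailscale allocates from.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def inside_cgnat(interface_cidr):
    """Return True if the interface's network lies entirely within 100.64.0.0/10."""
    network = ipaddress.ip_interface(interface_cidr).network
    return network.subnet_of(CGNAT)


# "NetBird IP" values taken from the status outputs in this thread.
for cidr in ("100.85.242.220/16", "100.86.194.133/16"):
    print(cidr, inside_cgnat(cidr))
```

If both daemons need to coexist on one machine, the overlap is a reasonable first place to look, for example by inspecting the routing table and firewall rules on an affected host.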

tarocjsu avatar Feb 20 '24 01:02 tarocjsu