In the last few weeks NetBird has randomly lost its connection and has not been able to recover.
Since v0.36.5 it is no longer able to connect to other peers. Sometimes restarting NetBird solves the problem, sometimes not. netbird status -d shows the peers as connected, but not even a ping to a peer's 100.76.x.x IP address works. Here is part of the log:
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-1.relay.netbird.io:443] relay/client/client.go:214: open connection to peer: sha-Dn9xgXi3A/4FEe90jhVUP/dkvMcxA59y/e7x0g3oZO4=
2025-02-14T19:48:03+01:00 INFO client/iface/wgproxy/ebpf/proxy.go:102: turn conn added to wg proxy store: rels://streamline-de-fra1-1.relay.netbird.io:443, endpoint port: :3
2025-02-14T19:48:03+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/conn.go:447: created new wgProxy for relay connection: 127.0.0.1:3
2025-02-14T19:48:03+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/wg_watcher.go:87: WireGuard watcher started
2025-02-14T19:48:03+01:00 INFO [peer: f+tmDAAoOYRUT/WAoJl0PsqalR4zJvt7ljkxZboO9iE=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/client.go:164: create new relay connection: local peerID: gsrpCbJwc8lkmNV783rxIHpyj+zZIhy/rFj5HsfVuBY=, local peer hashedID: sha-99JRJjv0PJBbfBPJzmU0KgWX+n3VVc6ezC48fcixQBE=
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/client.go:170: connecting to relay server
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/dialer/race_dialer.go:64: dialing Relay server via quic
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/dialer/race_dialer.go:64: dialing Relay server via WS
2025-02-14T19:48:03+01:00 INFO [peer: FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO client/internal/routemanager/client.go:210: New chosen route is co1co8bl0ubs739dfm90 with peer FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY= with score 19990.001000 for network [192.168.0.0/16]
2025-02-14T19:48:03+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/dialer/race_dialer.go:89: successfully dialed via: WS
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/dialer/race_dialer.go:75: connection attempt aborted via: quic
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/client.go:186: relay connection established
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-0.relay.netbird.io:443] relay/client/client.go:214: open connection to peer: sha-d6bmxNpKji4X2AM4Syi/oXpY9FJ6J27RG3gTY9ONhdE=
2025-02-14T19:48:03+01:00 INFO client/iface/wgproxy/ebpf/proxy.go:102: turn conn added to wg proxy store: rels://streamline-de-fra1-0.relay.netbird.io:443, endpoint port: :4
2025-02-14T19:48:03+01:00 INFO [peer: hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] client/internal/peer/conn.go:447: created new wgProxy for relay connection: 127.0.0.1:4
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-1.relay.netbird.io:443] relay/client/client.go:214: open connection to peer: sha-Asv8+qhh3HsYQgPXy3cIzGzTjlTvEIoTND3nPoVZDgw=
2025-02-14T19:48:03+01:00 INFO client/iface/wgproxy/ebpf/proxy.go:102: turn conn added to wg proxy store: rels://streamline-de-fra1-1.relay.netbird.io:443, endpoint port: :5
2025-02-14T19:48:03+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/conn.go:447: created new wgProxy for relay connection: 127.0.0.1:5
2025-02-14T19:48:03+01:00 INFO [peer: hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] client/internal/peer/wg_watcher.go:87: WireGuard watcher started
2025-02-14T19:48:03+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/wg_watcher.go:87: WireGuard watcher started
2025-02-14T19:48:03+01:00 INFO [relay: rels://streamline-de-fra1-1.relay.netbird.io:443] relay/client/client.go:214: open connection to peer: sha-ULsX413ckuLILuPUeQ8liU9B86RCBgkvFP0SdhMWbUw=
2025-02-14T19:48:03+01:00 INFO client/iface/wgproxy/ebpf/proxy.go:102: turn conn added to wg proxy store: rels://streamline-de-fra1-1.relay.netbird.io:443, endpoint port: :6
2025-02-14T19:48:03+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/conn.go:447: created new wgProxy for relay connection: 127.0.0.1:6
2025-02-14T19:48:03+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/wg_watcher.go:87: WireGuard watcher started
2025-02-14T19:48:03+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO client/internal/routemanager/client.go:210: New chosen route is co1dqj3l0ubs739dfnsg with peer hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0= with score 49990.001000 for network [192.168.0.0/16]
2025-02-14T19:48:03+01:00 INFO [peer: hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/conn.go:476: start to communicate with peer via relay
2025-02-14T19:48:03+01:00 INFO client/internal/routemanager/client.go:210: New chosen route is co1kv3bl0ubs739dg130 with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 0.001000 for network [10.20.0.0/24]
2025-02-14T19:48:03+01:00 INFO client/internal/routemanager/client.go:210: New chosen route is co1kuj3l0ubs739dg11g with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 0.001000 for network [10.30.0.0/24]
2025-02-14T19:48:03+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:03+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:03+01:00 INFO [peer: f+tmDAAoOYRUT/WAoJl0PsqalR4zJvt7ljkxZboO9iE=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:03+01:00 INFO [peer: f+tmDAAoOYRUT/WAoJl0PsqalR4zJvt7ljkxZboO9iE=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:03+01:00 INFO [peer: FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:03+01:00 INFO [peer: FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:04+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:04+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:04+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:04+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:05+01:00 INFO [peer: hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] client/internal/peer/conn.go:328: set ICE to active connection
2025-02-14T19:48:05+01:00 INFO [peer: hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] client/internal/peer/wg_watcher.go:111: WireGuard watcher stopped
2025-02-14T19:48:05+01:00 INFO [peer: f+tmDAAoOYRUT/WAoJl0PsqalR4zJvt7ljkxZboO9iE=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:05+01:00 INFO [peer: +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:05+01:00 INFO [peer: FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:06+01:00 INFO [peer: Yg/JDeFsAfMnue9KOTNm77L0AlG1g3Y6pYIm3KhUxyw=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:06+01:00 INFO [peer: 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:06+01:00 INFO [peer: RtObgAe/KslyFa/t0a/iGwy7HohRzO8xhNNUPIR1ri8=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:48:06+01:00 INFO [peer: Kc8hGcw4uOpvTwgvTste9cdhtPpmMLsZDeOYSITNGnk=] client/internal/peer/guard/guard.go:84: start reconnect loop...
2025-02-14T19:53:02+01:00 INFO client/internal/peer/guard/sr_watcher.go:94: network changes detected by ICE agent
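For anyone reading a log like the one above, grouping the lines by peer makes the per-peer sequence easier to follow (here: relay, then ICE, then the reconnect loop). The following is a minimal Python sketch, not a NetBird tool. The regular expression is inferred only from the line layout in this excerpt, and the names events_by_peer and last_event_per_peer are made up for illustration.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpt above:
#   <timestamp> <LEVEL> [peer: <public key>] <source file>:<line>: <message>
# Lines without a "[peer: ...]" tag (relay, route manager, ...) are ignored.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) \[peer: (?P<peer>[^\]]+)\] "
    r"(?P<src>\S+):(?P<lineno>\d+): (?P<msg>.*)$"
)


def events_by_peer(log_text):
    """Return {peer public key: [(timestamp, message), ...]} in log order."""
    events = defaultdict(list)
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            events[match["peer"]].append((match["ts"], match["msg"]))
    return dict(events)


def last_event_per_peer(log_text):
    """Return {peer public key: last logged message for that peer}."""
    return {peer: ev[-1][1] for peer, ev in events_by_peer(log_text).items()}
```

Feeding it the saved client log text and printing last_event_per_peer(...) gives a quick view of which peers ended on "start reconnect loop..." and which did not. What that message implies about the connection state is not something this sketch can tell you.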
The peer is shown as connected but cannot be pinged:
# ping fox
PING fox.netbird.cloud (100.76.171.201) 56(84) bytes of data.
^C
--- fox.netbird.cloud ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3109ms
from netbird status -d:
fox.netbird.cloud:
NetBird IP: 100.76.171.201
Public key: FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/srflx
ICE candidate endpoints (Local/Remote): 10.5.5.217:51820/185.199.30.141:14255
Relay server address: rels://streamline-de-fra1-2.relay.netbird.io:443
Last connection update: 18 minutes, 19 seconds ago
Last WireGuard handshake: -
Transfer status (received/sent) 12.1 KiB/20.3 KiB
Quantum resistance: true
Routes: -
Networks: -
Latency: 8.003627ms
OS: linux/amd64
Daemon version: 0.36.7
CLI version: 0.36.7
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
[stun:stun.netbird.io:5555] is Available
[turns:turn.netbird.io:443?transport=tcp] is Available
[rels://streamline-de-fra1-2.relay.netbird.io:443] is Available
Nameservers:
[192.168.208.1:53] for [int.vidux.hu] is Unavailable, reason: 1 error occurred:
* read udp 10.5.5.217:50996->192.168.208.1:53: i/o timeout
[10.30.0.1:53] for [szeged.vidux.hu] is Available
FQDN: dell.netbird.cloud
NetBird IP: 100.76.111.32/16
Interface type: Kernel
Quantum resistance: true (permissive)
Routes: -
Networks: -
Peers count: 5/8 Connected
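The status excerpt above shows the combination that seems to matter here: Status: Connected together with Last WireGuard handshake: -. Below is a rough Python sketch for spotting that combination in saved netbird status -d output. It assumes the field labels look exactly as in the output above and that a line consisting of a single word ending in a colon starts a new block. connected_without_handshake is a hypothetical helper name, not part of NetBird.

```python
import re


def connected_without_handshake(status_text):
    """List blocks that report Status: Connected but handshake '-'.

    Assumption: each peer block starts with a line such as
    'fox.netbird.cloud:' and is followed by 'Key: value' lines.
    """
    blocks = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(\S+):", line)
        if header:
            # Also matches section headers like 'Relays:'; those never
            # carry a Status field, so they are filtered out below.
            current = header.group(1)
            blocks[current] = {}
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        blocks[current][key.strip()] = value.strip()
    return sorted(
        name
        for name, fields in blocks.items()
        if fields.get("Status") == "Connected"
        and fields.get("Last WireGuard handshake") == "-"
    )
```

Run against the output above it would flag fox.netbird.cloud, which matches the peer that could not be pinged. Whether a "-" handshake always means the tunnel is dead is an assumption of this sketch rather than documented behaviour.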
@lfarkas, can you please run the following command while repeating the ping test?
netbird debug for 5m -S
Then please share the generated bundle file.
To be honest, this is a serious problem for us. In the last few months it has regularly happened that we could not access the work network from home, and someone had to restart the NetBird service inside the internal network... Sometimes the connection does not work even after that.
Hey just wanted to chime in that I'm having the same issue when deploying via Kubernetes, really keen on a fix for this.
I'm experiencing the same issue. I deployed a self-hosted instance using the Helm chart from totmicro/helms. There might be a problem with the relay server configuration, as my peers seem to disconnect after a while (they appear to lose connection with the relay server):
Relays:
[rels://vpn.my.domain:443/relay] is Unavailable, reason: relay connection is not established
On the relay server I get a lot of the following errors:
ERRO relay/server/relay.go:121: failed to handshake: validate sha-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (x.x.x.x:yyyy): expired token
We don't use a self-hosted version but the cloud version, and it still happens and is very annoying. When I can't access the remote site, there is no way to restart NetBird other than going to the site and rebooting the machine or restarting the NetBird service.
Is there any progress with it?
Try this: disable and then re-enable the policy. Please share the result here.
Which policy, and where and how? Anyway, I sent you the whole debug log above 3 weeks ago. Did you look into it?
I've got 7 connected peers; I can ping 5 of them and can't ping 2. There is only one policy in https://app.netbird.io/access-control, the default one. If I disable and re-enable it, there are still 2 peers I can't ping, but not the same 2 :-)
And today I updated all clients to 0.38.0.
After playing a bit with disabling/enabling the policy, I am sometimes able to access this critical peer (which is always online in the peer list) for a few minutes or seconds, but never for longer than 5 minutes, and after that it no longer works. After another disable/enable it works again for a few minutes, but with this click I disconnect my whole NetBird network...
@lfarkas, we will prepare a debugging version for you to try tomorrow, as it seems like the fixes from recent versions are not helping your case.
x86_64 rpm please
@lfarkas, you can download the packages from the link:
https://github.com/netbirdio/netbird/actions/runs/13881767994/artifacts/2759669965
This file contains the build artifacts for the PR: https://github.com/netbirdio/netbird/pull/3517. You will find the rpm installer there, too.
In case of an issue, please make sure that the agent has been running for at least 10 minutes, then generate a bundle with logs for analysis with the command:
netbird debug bundle -S
Also, please share which peers the node can't connect to.
So I have to install it on one client and not on all of them? And the other clients can stay on the normal 0.38 version?
If you can install on all affected clients, that will increase our chances of getting helpful logs
@lfarkas, the last build had the potential to cause a panic. You can use this one instead: https://github.com/netbirdio/netbird/actions/runs/13884417374/artifacts/2760236240
Both of these contain the same commit ID: netbird_0.38.1-SNAPSHOT-9c4fdec9_linux_amd64.rpm
Anyway, before I installed it I couldn't ping 100.76.121.209 (whose status is connected). After I installed it, ping started to work, but after about 5 minutes it stopped working again.
After this I stopped the normal systemd service, and while I was looking up which command to start it with, ping in the other window started working, and it turned out that something had started the netbird service!? I stopped it again with: systemctl stop netbird.service and about a minute later ping worked again and netbird was running again!? Why can't I stop it? Even after a systemctl disable --now netbird.service it still starts itself within about a minute. Is there any way I can stop it???
Anyway, if I'm fast enough:
root@wolf:~# systemctl stop netbird.service ;netbird debug bundle -S
Job for netbird.service canceled.
/tmp/netbird.debug.1526428740.zip
I hope I can run it in test mode. My local NetBird IP is: 100.76.24.179
The remote client's: NetBird IP: 100.76.121.209 Public key: f+tmDAAoOYRUT/WAoJl0PsqalR4zJvt7ljkxZboO9iE=
And of course when I start it in this mode ping works, but after 183 packets it stops working again. Here is the debug output (I installed this rpm only on my local client; if you need it on the remote client too, let me know).
But I don't know whether it's valid output or not, since this command returns immediately:
root@wolf:~# systemctl stop netbird.service ;netbird debug bundle -S
Job for netbird.service canceled.
/tmp/netbird.debug.1526428740.zip
@lfarkas, sorry, I didn't get why you tried to stop the agent. The agent should be running and failing when you generate the bundle.
OK, but the agent is ALWAYS running, since it's not possible to turn it off. IMHO that's a problem.
Here is another dump (taken while ping is not working; I'm sure that if I restart the service it will work again for a few minutes): netbird.debug.3319656448.zip
Is there anything I can do?
@lfarkas Szia! Can we schedule a call to go through some details?
@lfarkas can you confirm whether the issue persists with the latest version and Rosenpass?
To be honest, I would rather not test it, at least not before Easter. If I reconfigure my VPN settings to Rosenpass and it still doesn't work, I will no longer be able to access my office network (which has happened before), and there is no way to recover from that state... Maybe after Easter...
Hi there,
I am having exactly the same issue. After a reinstall it works for a couple of minutes and then stops working. After enabling/disabling the policy I get this message:
client/internal/peer/handshaker.go:79: wait for remote offer confirmation on both servers.
I am running the current version 0.43.1 on Debian.
Hi,
please forget what i said.
Chain fail2ban-SIP (1 references)
target prot opt source destination
REJECT all -- 100.114.165.225 anywhere reject-with icmp-port-unreachable
REJECT all -- 100.114.188.68 anywhere reject-with icmp-port-unreachable
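The chain above appears to show fail2ban rejecting two addresses from the 100.64.0.0/10 range. That range is assumed here to be where NetBird peer addresses come from; the 100.76.x.x and 100.114.x.x addresses in this thread fall inside it. Below is a minimal Python sketch for checking a saved iptables -L -n listing for such rules. The function name and the default range are assumptions for illustration only.

```python
import ipaddress

# Assumption: overlay (NetBird) peer addresses live in the CGNAT range.
OVERLAY_RANGE = ipaddress.ip_network("100.64.0.0/10")


def blocked_overlay_sources(iptables_listing, overlay=OVERLAY_RANGE):
    """Return source addresses inside `overlay` that a REJECT/DROP rule hits.

    Expects rows in the column order printed by `iptables -L -n`:
    target prot opt source destination [extra ...]
    """
    blocked = []
    for raw in iptables_listing.splitlines():
        fields = raw.split()
        if len(fields) < 5 or fields[0] not in ("REJECT", "DROP"):
            continue  # chain headers, column titles, ACCEPT rules, ...
        try:
            source = ipaddress.ip_network(fields[3], strict=False)
        except ValueError:
            continue  # e.g. "anywhere" or a resolved hostname
        if source.version == overlay.version and source.subnet_of(overlay):
            blocked.append(fields[3])
    return blocked
```

If this returns any addresses, it may be worth checking whether a tool such as fail2ban has banned a peer before suspecting NetBird itself.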
Hi all, I just stumbled across this issue and wondered if I would be able to help out future people as I also had similar issues. I wrote about this in my comment on #3852.
We were having random disconnects after we enabled Rosenpass, so to test this theory I disabled Rosenpass across all peers and set a pre-shared key instead. The random disconnects completely stopped after this.
Therefore I would suggest disabling quantum resistance in the hope that doing so will enable your peers to remain connected.