netbird
Windows client "General Failure" after hibernation/sleep
Describe the problem: On a Windows client running Netbird, if the computer sleeps or hibernates for an extended period, no "routed" Netbird VPN traffic can flow until the Netbird service is restarted.
100.79.26.25 is the IP of a Netbird node with network routes to specific hosts in the 10.2.4.0/24 network (10.2.4.24 and 10.2.4.47).
To Reproduce Steps to reproduce the behavior:
- Have device working on netbird network. (Able to access a routed address through a netbird peer)
- Hibernate for the night
- After starting device back up, attempt to ping an IP accessible via the Netbird gateway
- See "General Failure" error
- Restart netbird service and everything works. (route table looks the same, status looks the same, but now the routing works)
Expected behavior: Netbird should detect when the WireGuard tunnel is not in the expected state and restart it automatically. After resume it appears not to route traffic for the networks behind a routing peer, even though the route table and status look unchanged.
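Until the client detects this state itself, the expected behavior could be approximated from outside with a small watchdog. This is only a sketch, not how Netbird works internally. It assumes the routed host 10.2.4.24 from this report answers ping when things are healthy, and that the Windows service is named `netbird` (an assumption, please verify on your machine). The names `routed_host_reachable`, `restart_service`, `watchdog_step` and `run` are made up for illustration.

```python
import subprocess
import sys
import time

ROUTED_HOST = "10.2.4.24"  # routed address taken from this report
SERVICE_NAME = "netbird"   # assumption: the Windows service name, verify locally


def routed_host_reachable(host: str = ROUTED_HOST) -> bool:
    """Send one ping and report whether an echo reply came back."""
    count_flag = "-n" if sys.platform == "win32" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        capture_output=True,
        text=True,
    )
    # Echo replies contain "TTL=" on Windows and "ttl=" on Linux/macOS.
    return result.returncode == 0 and "ttl=" in result.stdout.lower()


def restart_service(name: str = SERVICE_NAME) -> None:
    """Restart the service via the Windows `net` command (needs admin rights)."""
    subprocess.run(["net", "stop", name], check=False)
    subprocess.run(["net", "start", name], check=True)


def watchdog_step(failures: int, reachable: bool, threshold: int = 3):
    """Return (new_failure_count, should_restart) for one probe result."""
    if reachable:
        return 0, False
    failures += 1
    if failures >= threshold:
        # Reset the counter so the service gets time to come back up.
        return 0, True
    return failures, False


def run(interval_seconds: float = 30.0) -> None:
    """Probe forever and restart the service after repeated failures."""
    failures = 0
    while True:
        failures, should_restart = watchdog_step(failures, routed_host_reachable())
        if should_restart:
            restart_service()
        time.sleep(interval_seconds)
```

Calling `run()` from an elevated scheduled task would restart the service after three consecutive failed probes. The threshold is there so that a single lost packet does not trigger a restart.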
NetBird status -d output:
netbird status -d
Peers detail:
Peer:
NetBird IP: 100.79.26.25
Public key: cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=
Status: Connected
-- detail --
Connection type: P2P
Direct: false
ICE candidate (Local/Remote): srflx/srflx
Last connection update: 2022-12-19 08:42:35
Daemon version: 0.10.7
CLI version: 0.10.7
Daemon status: Connected
Management: Connected to https://netbird-vpnXXXX:33073
Signal: Connected to http://netbird-vpnXXXX:10000
NetBird IP: 100.79.171.165/16
Interface type: Userspace
Peers count: 1/1 Connected
> route print
===========================================================================
Interface List
3...3c 52 82 38 56 5f ......Realtek PCIe GbE Family Controller
45...........................WireGuard Tunnel
20...00 28 f8 53 24 7a ......Microsoft Wi-Fi Direct Virtual Adapter
9...02 28 f8 53 24 79 ......Microsoft Wi-Fi Direct Virtual Adapter #2
18...00 28 f8 53 24 79 ......Intel(R) Dual Band Wireless-AC 8265
10...00 28 f8 53 24 7d ......Bluetooth Device (Personal Area Network)
1...........................Software Loopback Interface 1
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.2.1 192.168.2.23 50
10.2.4.24 255.255.255.255 On-link 100.79.171.165 6
10.2.4.47 255.255.255.255 On-link 100.79.171.165 6
100.79.0.0 255.255.0.0 On-link 100.79.171.165 261
100.79.171.165 255.255.255.255 On-link 100.79.171.165 261
100.79.255.255 255.255.255.255 On-link 100.79.171.165 261
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
192.168.2.0 255.255.255.0 On-link 192.168.2.23 306
192.168.2.23 255.255.255.255 On-link 192.168.2.23 306
192.168.2.255 255.255.255.255 On-link 192.168.2.23 306
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 192.168.2.23 306
224.0.0.0 240.0.0.0 On-link 100.79.171.165 261
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 192.168.2.23 306
255.255.255.255 255.255.255.255 On-link 100.79.171.165 261
===========================================================================
Persistent Routes:
None
IPv6 Route Table
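The table above still contains the two /32 routes for 10.2.4.24 and 10.2.4.47 on the Netbird interface 100.79.171.165, so the failure is not a missing route. A rough sketch for checking that automatically from `route print` output follows. It assumes the five-column "Active Routes" layout shown here; `Route`, `parse_active_routes` and `routes_into` are hypothetical helper names.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    destination: str
    netmask: str
    gateway: str
    interface: str
    metric: int


def parse_active_routes(route_print_output: str) -> list[Route]:
    """Extract IPv4 rows from the 'Active Routes' part of `route print`."""
    routes = []
    in_active = False
    for line in route_print_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Active Routes:"):
            in_active = True
            continue
        if stripped.startswith("=") or stripped.startswith("Persistent Routes:"):
            in_active = False
            continue
        fields = stripped.split()
        # IPv4 rows have exactly five columns ending in a numeric metric;
        # this also skips the column header line.
        if in_active and len(fields) == 5 and fields[4].isdigit():
            routes.append(Route(fields[0], fields[1], fields[2], fields[3], int(fields[4])))
    return routes


def routes_into(routes: list[Route], network: str) -> list[Route]:
    """Return the routes whose destination lies inside `network`."""
    net = ipaddress.ip_network(network)
    return [r for r in routes if ipaddress.ip_address(r.destination) in net]


# Excerpt of the table from this report.
SAMPLE = """Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.2.1 192.168.2.23 50
10.2.4.24 255.255.255.255 On-link 100.79.171.165 6
10.2.4.47 255.255.255.255 On-link 100.79.171.165 6
100.79.0.0 255.255.0.0 On-link 100.79.171.165 261
===========================================================================
Persistent Routes:
None
"""

for route in routes_into(parse_active_routes(SAMPLE), "10.2.4.0/24"):
    print(route.destination, "via interface", route.interface)
```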
> ping 100.79.26.25
Pinging 100.79.26.25 with 32 bytes of data:
Reply from 100.79.26.25: bytes=32 time=346ms TTL=64
Reply from 100.79.26.25: bytes=32 time=270ms TTL=64
Reply from 100.79.26.25: bytes=32 time=125ms TTL=64
Reply from 100.79.26.25: bytes=32 time=228ms TTL=64
Ping statistics for 100.79.26.25:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 125ms, Maximum = 346ms, Average = 242ms
> ping 10.2.4.24
Pinging 10.2.4.24 with 32 bytes of data:
General failure.
General failure.
General failure.
General failure.
Ping statistics for 10.2.4.24:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
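The two ping results together are the signature of this bug: the peer's own Netbird IP replies, while the routed IP fails with "General failure", which Windows typically reports when the packet could not be sent locally at all. A small sketch that turns the two outputs into a diagnosis is below; `classify_ping` and `diagnose` are hypothetical names and the result labels are my own.

```python
def classify_ping(output: str) -> str:
    """Classify Windows `ping` output as 'ok', 'general_failure' or 'no_reply'."""
    lowered = output.lower()
    # A single "General failure" line wins, even if other probes got replies.
    if "general failure" in lowered:
        return "general_failure"
    if "ttl=" in lowered:
        return "ok"
    return "no_reply"


def diagnose(peer_output: str, routed_output: str) -> str:
    """Combine a ping to the peer's Netbird IP with a ping to a routed IP."""
    peer = classify_ping(peer_output)
    routed = classify_ping(routed_output)
    if peer != "ok":
        return "tunnel down"
    if routed != "ok":
        return "tunnel up, routed traffic failing"
    return "healthy"


PEER_PING = "Reply from 100.79.26.25: bytes=32 time=346ms TTL=64"
ROUTED_PING = "Pinging 10.2.4.24 with 32 bytes of data:\nGeneral failure."

print(diagnose(PEER_PING, ROUTED_PING))
```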
client.log
time="2022-12-19T08:40:42+08:00" level=info msg="signal client isn't ready, skipping connection attempt cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=" file="engine.go:771"
<snip about 100 lines of the same>
time="2022-12-19T08:42:32+08:00" level=info msg="signal client isn't ready, skipping connection attempt cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=" file="engine.go:771"
time="2022-12-19T08:42:33+08:00" level=info msg="connected to the Signal Service stream" file="grpc.go:136"
time="2022-12-19T08:42:35+08:00" level=info msg="connected to peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= [laddr <-> raddr] [61.245.136.57:60605 <-> 34.151.72.148:53237]" file="conn.go:289"
time="2022-12-19T08:42:53+08:00" level=info msg="connected to the Management Service stream" file="grpc.go:123"
client.log after restarting service
time="2022-12-19T08:58:47+08:00" level=info msg="stopped Netbird Engine" file="engine.go:203"
time="2022-12-19T08:58:47+08:00" level=info msg="stopped NetBird client" file="connect.go:166"
time="2022-12-19T08:58:48+08:00" level=info msg="stopped Netbird service" file="service_controller.go:79"
time="2022-12-19T08:58:50+08:00" level=info msg="starting Netbird service" file="service_controller.go:23"
time="2022-12-19T08:58:50+08:00" level=info msg="started daemon server: 127.0.0.1:41731" file="service_controller.go:63"
time="2022-12-19T08:58:53+08:00" level=info msg="check netforwad history is not implemented on windows" file="systemops_nonlinux.go:39"
time="2022-12-19T08:58:53+08:00" level=info msg="connected to the Signal Service stream" file="grpc.go:136"
time="2022-12-19T08:58:53+08:00" level=info msg="Netbird engine started, my IP is: 100.79.171.165/16" file="connect.go:153"
time="2022-12-19T08:58:53+08:00" level=info msg="connected to the Management Service stream" file="grpc.go:123"
time="2022-12-19T08:58:53+08:00" level=warning msg="no route was chosen for network 10.2.4.24/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:53+08:00" level=warning msg="no route was chosen for network 10.2.4.47/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:54+08:00" level=warning msg="no route was chosen for network 10.2.4.47/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:54+08:00" level=warning msg="no route was chosen for network 10.2.4.24/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:59:08+08:00" level=info msg="connected to peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= [laddr <-> raddr] [61.245.136.57:50266 <-> 34.151.72.148:53237]" file="conn.go:289"
time="2022-12-19T08:59:08+08:00" level=info msg="new chosen route is cdm52g9tkgss73f47kg0 with peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= with score 2" file="client.go:114"
time="2022-12-19T08:59:08+08:00" level=info msg="new chosen route is cdu2n1ptkgss739dkv7g with peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= with score 2" file="client.go:114"
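After the restart, the log shows the route manager first finding no connected peer for the routes and then choosing a route once the peer connects. None of these route messages appear in the post-hibernation log excerpt further up. A sketch for pulling those events out of a client.log in this format is below. It assumes the `time="..." level=... msg="..." file="..."` layout seen in this report; `route_events` and `seconds_until_route` are hypothetical helpers.

```python
import re
from datetime import datetime

# Line layout as it appears in this report's client.log.
LOGRUS_LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)" file="(?P<file>[^"]+)"'
)
NO_ROUTE_RE = re.compile(r"no route was chosen for network (?P<network>\S+)")
CHOSEN_RE = re.compile(r"new chosen route is (?P<route_id>\S+) with peer")


def route_events(log_text):
    """Return (timestamp, kind, detail) tuples for route-selection log lines."""
    events = []
    for line in log_text.splitlines():
        parsed = LOGRUS_LINE_RE.search(line)
        if not parsed:
            continue
        when = datetime.fromisoformat(parsed["time"])
        no_route = NO_ROUTE_RE.search(parsed["msg"])
        chosen = CHOSEN_RE.search(parsed["msg"])
        if no_route:
            events.append((when, "no_route", no_route["network"]))
        elif chosen:
            events.append((when, "route_chosen", chosen["route_id"]))
    return events


def seconds_until_route(events):
    """Seconds from the first 'no_route' event to the first 'route_chosen' event."""
    first_missing = next((t for t, kind, _ in events if kind == "no_route"), None)
    first_chosen = next((t for t, kind, _ in events if kind == "route_chosen"), None)
    if first_missing is None or first_chosen is None:
        return None
    return (first_chosen - first_missing).total_seconds()


# Two lines copied from the log above.
SAMPLE = '''time="2022-12-19T08:58:53+08:00" level=warning msg="no route was chosen for network 10.2.4.24/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:59:08+08:00" level=info msg="new chosen route is cdm52g9tkgss73f47kg0 with peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= with score 2" file="client.go:114"'''

print(seconds_until_route(route_events(SAMPLE)))
```

Running the same function over the log captured after resume, and getting no events back, would be one way to confirm that route selection never ran again after wake-up.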
Additional context: The problem only affects IPs reached through network routes, not the peers' own Netbird VPN IPs.
Are there any updates on this? MacOS has the same problem: no connections after hibernation.
As standby modes are very common on laptops, Netbird on those is severely limited.
This is a big, glaring bug.
Hey guys,
I was going to see if I could fix the issue, but I could not reproduce it (I tested the current version and an old one). Could you check whether the issue still persists for you with the current version? If so, I would like to get in touch so we can take a closer look and fix it. Thanks!
This has been the same on EVERY version I have ever tested. I have uninstalled because of this, so I might not have tested the most recent versions. I would however be surprised if this had changed.
Reproduction: start the client, have it connected to a network, set the Mac (in my case) to hibernation, wait a bit, re-awake from hibernation -> other devices are not accessible anymore until disconnect/reconnect.
In my case, I do that between home and work, so maybe a network change might play a role in this, but I am pretty sure that this shouldn't be the case and this just happens with every hibernation.
ZT, for example, doesn't have this problem, and the peers are available after waking the computer.
I would be happy to connect to solve this, as this problem alone has driven me away from Netbird and towards solutions which don't have this issue.
I would love to connect to further investigate, I have a couple more questions. Could you reach out via slack? ([email protected]) or any other more direct channel?
I'm having a similar issue with a self-hosted Netbird instance where some clients don't have any problems, but some clients (servers with "varying degrees" of permanent internet access) randomly "go down" and log "signal client isn't ready"; restarting helps for a bit. I'm still determining whether it is related to this issue, but I'm happy to connect with you @pascal-fischer if that is okay for you. Please let me know.
I do not think that is related, but I'm happy to have a look into what is causing your issue.
Thanks for offering to look into it!
Yes, after digging further into the code and the issue it seems to be unrelated to this specific (windows hibernation/sleep) issue. Our issue with the "signal client not ready" seems to have been solved/helped after upgrading most clients from 0.21.x to 0.22.x+.
However, since the update, some of the clients that were direct: false before (because they are behind a NAT/DSL router) are now detected as direct: true, which causes them to not work. Another user seems to have already opened #730 to track this new issue. If you want to look into that, feel free to reach out to me via the email address on my profile :-)
Some of my Windows clients have the "signal client isn't ready" message mentioned above in client.log. The symptoms are indeed the same: after hibernation/sleep, the service is either stopped, or it is running but clients can't reach the server they are allowed to access through rules.
All clients are on 0.25.3, the latest version as of Jan 5, 2024. I've only seen this on Windows. macOS clients are fine; I didn't notice any "outage" under the same circumstances.
Happened also here on a Manjaro laptop running 6.6.10-1 kernel, using netbird client 0.25.5.
One thing I noticed: I installed the client on this device after waking it up from sleep.
It connected to management and showed green among the peers, and ip route correctly showed the routes pushed from the netbird-assigned group via the WireGuard link.
But there was no actual routing. Here are some sanitized logs (I replaced the real routed subnet with the fake 242.37.246.55/32 and the real netbird server address with 240.23.23.11; 192.168.1.251 is my laptop's DHCP lease):
2024-02-01T21:41:58+01:00 WARN client/internal/routemanager/client.go:121: the network 242.37.246.55/32 has not been assigned a routing peer as no peers from the list [zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=] are currently connected
2024-02-01T21:42:01+01:00 INFO client/internal/wgproxy/proxy_ebpf.go:91: turn conn added to wg proxy store: 185.142.120.1:51820, endpoint port: :28
2024-02-01T21:42:01+01:00 INFO client/internal/peer/conn.go:357: connected to peer zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=, endpoint address: 127.0.0.1:28
2024-02-01T21:42:01+01:00 INFO client/internal/routemanager/client.go:124: new chosen route is cmtec0qjtdkc739266b0 with peer zMblablablaPfkvr+zkmRgblablablancNE+29rblablao= with score 0 for network 242.37.246.55/32
2024-02-01T21:42:08+01:00 WARN signal/client/grpc.go:170: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Unavailable desc = error reading from server: read tcp 192.168.1.251:33538->240.23.23.11:443: read: connection timed out
2024-02-01T21:42:15+01:00 INFO client/internal/wgproxy/proxy_ebpf.go:138: stop forward turn packages to port: 28. error: EOF
2024-02-01T21:42:15+01:00 WARN client/internal/routemanager/client.go:121: the network 242.37.246.55/32 has not been assigned a routing peer as no peers from the list [zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=] are currently connected
2024-02-01T21:42:16+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:17+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:19+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:19+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:21+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:22+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
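To put a number on how long the client stays stuck in this state, the repeated message can be counted and timed. The sketch below assumes the `timestamp LEVEL path/file.go:line: message` layout of the lines above; `not_ready_window` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Line layout of the 0.25.x client log shown above.
PLAIN_LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<source>\S+:\d+): (?P<msg>.*)$"
)
NOT_READY = "signal client isn't ready"


def not_ready_window(log_text):
    """Return (count, time span) of the 'signal client isn't ready' messages."""
    times = []
    for line in log_text.splitlines():
        parsed = PLAIN_LINE_RE.match(line.strip())
        if parsed and NOT_READY in parsed["msg"]:
            times.append(datetime.fromisoformat(parsed["time"]))
    if not times:
        return 0, None
    return len(times), max(times) - min(times)


# Three lines copied from the log above.
SAMPLE = """2024-02-01T21:42:16+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:17+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:19+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao="""

count, span = not_ready_window(SAMPLE)
print(count, span)
```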
Then I tried a "trick" I've used in the past to overcome stupid systemd behavior after sleep/hibernate: I used the hardware switch to disable and re-enable networking.
After that, netbird hooked up nicely and the routes were actually in place and usable (I did not even restart the service).
hope it helps, feel free to ping me if you want to debug together @pascal-fischer
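For laptops without a hardware switch, the same off/on toggle can probably be done in software. This is an untested sketch that assumes the machine uses NetworkManager, so that `nmcli networking off` and `nmcli networking on` are available. I have not confirmed that it triggers the same recovery as the hardware switch. `toggle_commands` and `toggle_networking` are hypothetical names.

```python
import subprocess
import time


def toggle_commands():
    """Commands that switch all NetworkManager-managed networking off, then on."""
    return [
        ["nmcli", "networking", "off"],
        ["nmcli", "networking", "on"],
    ]


def toggle_networking(pause_seconds: float = 5.0, runner=subprocess.run) -> None:
    """Disable networking, wait briefly, then enable it again.

    `runner` is injectable so the sequence can be exercised without nmcli.
    """
    off, on = toggle_commands()
    runner(off, check=True)
    time.sleep(pause_seconds)
    runner(on, check=True)
```

Running `toggle_networking()` after resume, for example from a sleep hook, would mimic flipping the switch. Whether that is enough on its own is exactly what would need testing.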
Yeah, we are trying it out and several users on macOS are having the same issue: if they leave the office, go home, and try to connect, they have to disconnect and reconnect. When the laptop goes into sleep/hibernate mode and they come back to it, Netbird shows as connected, but the routes are not there and nothing is routed. Users have to disconnect and reconnect several times a day on the client.
I have a similar issue on macOS, but disconnecting and reconnecting only works sometimes. Even a netbird service restart is not a sure fix, and even a system reboot doesn't fix the issue every time.
Having the same issue on self-hosted with several of our MacBooks, some more than others. Usually restarting the service helps and unblocks it after 1-2 minutes, while in extreme cases only a restart fixes the issue. It only happens to those who let their Macs hibernate/sleep. Those who turn them off never have the issue.
If that is still the case (haven't been using Netbird exactly because of that problem for years now) and nothing has changed since 2022: ouff. This is a major issue for everybody on laptops!!!!