
Windows client "General Failure" after hibernation/sleep

Open timwsuqld opened this issue 2 years ago • 10 comments

Describe the problem On a Windows client running Netbird, if the computer sleeps or hibernates for an extended period of time, no "routed" Netbird VPN traffic can flow until the Netbird service is restarted.

100.79.26.25 is the IP of a netbird node with network routes to hosts in the 10.2.4.0/24 network (specific)

To Reproduce Steps to reproduce the behavior:

  1. Have device working on netbird network. (Able to access a routed address through a netbird peer)
  2. Hibernate for the night
  3. After starting device back up, attempt to ping an IP accessible via the Netbird gateway
  4. See "General Failure" error
  5. Restart netbird service and everything works. (route table looks the same, status looks the same, but now the routing works)

Expected behavior Netbird should detect when the wireguard tunnel is not in the expected state and restart it automatically. Seems to be not properly routing the networks behind a peer network route.
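
Until the client recovers on its own, the behavior asked for above can be approximated from outside with a small watchdog. This is a minimal sketch, not NetBird's own mechanism: it probes a routed address and restarts the service after several consecutive failures. The function names, the threshold of 3, and the `netbird service restart` command line are assumptions for illustration and should be adjusted to the local install.

```python
import subprocess
import sys
from typing import Callable, Sequence


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply."""
    if sys.platform.startswith("win"):
        # Windows: -n count, -w timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    elif sys.platform == "darwin":
        # macOS: -c count, -W wait time in milliseconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s * 1000), host]
    else:
        # Linux: -c count, -W timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Also require a reply line, since the exit code alone can be misleading.
    return result.returncode == 0 and "ttl=" in result.stdout.lower()


def restart_netbird(cmd: Sequence[str] = ("netbird", "service", "restart")) -> None:
    """Restart the service. The default command is an assumption."""
    subprocess.run(list(cmd), check=False)


def watchdog_step(
    probe: Callable[[str], bool],
    restart: Callable[[], None],
    host: str,
    failures: int,
    threshold: int = 3,
) -> int:
    """Run one check and return the updated consecutive-failure count."""
    if probe(host):
        return 0  # tunnel is passing routed traffic again
    failures += 1
    if failures >= threshold:
        restart()
        return 0  # start counting afresh after a restart
    return failures
```

Calling `watchdog_step(ping_once, restart_netbird, "10.2.4.24", failures)` from a scheduled task once a minute, and carrying `failures` between runs, would restart the service after three failed probes in a row. Passing the probe and restart actions in as callables keeps the counting logic testable without touching the network.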

NetBird status -d output: If applicable, add the output of the netbird status -d command

 netbird status -d
Peers detail:
 Peer:
  NetBird IP: 100.79.26.25
  Public key: cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: false
  ICE candidate (Local/Remote): srflx/srflx
  Last connection update: 2022-12-19 08:42:35

Daemon version: 0.10.7
CLI version: 0.10.7
Daemon status: Connected
Management: Connected to https://netbird-vpnXXXX:33073
Signal:  Connected to http://netbird-vpnXXXX:10000
NetBird IP: 100.79.171.165/16
Interface type: Userspace
Peers count: 1/1 Connected
> route print
===========================================================================
Interface List
  3...3c 52 82 38 56 5f ......Realtek PCIe GbE Family Controller
 45...........................WireGuard Tunnel
 20...00 28 f8 53 24 7a ......Microsoft Wi-Fi Direct Virtual Adapter
  9...02 28 f8 53 24 79 ......Microsoft Wi-Fi Direct Virtual Adapter #2
 18...00 28 f8 53 24 79 ......Intel(R) Dual Band Wireless-AC 8265
 10...00 28 f8 53 24 7d ......Bluetooth Device (Personal Area Network)
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.2.1     192.168.2.23     50
        10.2.4.24  255.255.255.255         On-link    100.79.171.165      6
        10.2.4.47  255.255.255.255         On-link    100.79.171.165      6
       100.79.0.0      255.255.0.0         On-link    100.79.171.165    261
   100.79.171.165  255.255.255.255         On-link    100.79.171.165    261
   100.79.255.255  255.255.255.255         On-link    100.79.171.165    261
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
      192.168.2.0    255.255.255.0         On-link      192.168.2.23    306
     192.168.2.23  255.255.255.255         On-link      192.168.2.23    306
    192.168.2.255  255.255.255.255         On-link      192.168.2.23    306
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link      192.168.2.23    306
        224.0.0.0        240.0.0.0         On-link    100.79.171.165    261
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link      192.168.2.23    306
  255.255.255.255  255.255.255.255         On-link    100.79.171.165    261
===========================================================================
Persistent Routes:
  None

IPv6 Route Table
>  ping 100.79.26.25

Pinging 100.79.26.25 with 32 bytes of data:
Reply from 100.79.26.25: bytes=32 time=346ms TTL=64
Reply from 100.79.26.25: bytes=32 time=270ms TTL=64
Reply from 100.79.26.25: bytes=32 time=125ms TTL=64
Reply from 100.79.26.25: bytes=32 time=228ms TTL=64

Ping statistics for 100.79.26.25:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 125ms, Maximum = 346ms, Average = 242ms


> ping 10.2.4.24

Pinging 10.2.4.24 with 32 bytes of data:
General failure.
General failure.
General failure.
General failure.

Ping statistics for 10.2.4.24:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
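
The route table above does list host routes for 10.2.4.24 and 10.2.4.47 on the NetBird interface (100.79.171.165), so the "General failure" is not caused by a missing route. A rough sketch of how that check could be done mechanically from `route print` output follows; `Route`, `parse_active_routes` and `best_route` are hypothetical helper names, and the parser only understands five-column IPv4 rows like the ones shown.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: str      # an IP address or the literal "On-link"
    interface: str    # local address of the outgoing interface
    metric: int


def parse_active_routes(route_print: str) -> List[Route]:
    """Extract IPv4 rows (destination, netmask, gateway, interface, metric)."""
    routes = []
    for line in route_print.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # headers, separators, interface list entries
        dest, mask, gateway, iface, metric = parts
        try:
            network = ipaddress.IPv4Network(f"{dest}/{mask}")
            ipaddress.IPv4Address(iface)
            routes.append(Route(network, gateway, iface, int(metric)))
        except ValueError:
            continue  # not a route row after all
    return routes


def best_route(routes: List[Route], dest: str) -> Optional[Route]:
    """Pick the longest-prefix match, breaking ties by lowest metric."""
    addr = ipaddress.IPv4Address(dest)
    candidates = [r for r in routes if addr in r.network]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.network.prefixlen, -r.metric))
```

Fed the table from this report, `best_route(routes, "10.2.4.24")` selects the /32 entry on interface 100.79.171.165 rather than the default route, which matches the observation that the table looks the same before and after the service restart.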

client.log

time="2022-12-19T08:40:42+08:00" level=info msg="signal client isn't ready, skipping connection attempt cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=" file="engine.go:771"
<snip about 100 lines of the same>
time="2022-12-19T08:42:32+08:00" level=info msg="signal client isn't ready, skipping connection attempt cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=" file="engine.go:771"
time="2022-12-19T08:42:33+08:00" level=info msg="connected to the Signal Service stream" file="grpc.go:136"
time="2022-12-19T08:42:35+08:00" level=info msg="connected to peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= [laddr <-> raddr] [61.245.136.57:60605 <-> 34.151.72.148:53237]" file="conn.go:289"
time="2022-12-19T08:42:53+08:00" level=info msg="connected to the Management Service stream" file="grpc.go:123"

client.log after restarting service

time="2022-12-19T08:58:47+08:00" level=info msg="stopped Netbird Engine" file="engine.go:203"
time="2022-12-19T08:58:47+08:00" level=info msg="stopped NetBird client" file="connect.go:166"
time="2022-12-19T08:58:48+08:00" level=info msg="stopped Netbird service" file="service_controller.go:79"
time="2022-12-19T08:58:50+08:00" level=info msg="starting Netbird service" file="service_controller.go:23"
time="2022-12-19T08:58:50+08:00" level=info msg="started daemon server: 127.0.0.1:41731" file="service_controller.go:63"
time="2022-12-19T08:58:53+08:00" level=info msg="check netforwad history is not implemented on windows" file="systemops_nonlinux.go:39"
time="2022-12-19T08:58:53+08:00" level=info msg="connected to the Signal Service stream" file="grpc.go:136"
time="2022-12-19T08:58:53+08:00" level=info msg="Netbird engine started, my IP is: 100.79.171.165/16" file="connect.go:153"
time="2022-12-19T08:58:53+08:00" level=info msg="connected to the Management Service stream" file="grpc.go:123"
time="2022-12-19T08:58:53+08:00" level=warning msg="no route was chosen for network 10.2.4.24/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:53+08:00" level=warning msg="no route was chosen for network 10.2.4.47/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:54+08:00" level=warning msg="no route was chosen for network 10.2.4.47/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:58:54+08:00" level=warning msg="no route was chosen for network 10.2.4.24/32 because no peers from list [cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM=] were connected" file="client.go:112"
time="2022-12-19T08:59:08+08:00" level=info msg="connected to peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= [laddr <-> raddr] [61.245.136.57:50266 <-> 34.151.72.148:53237]" file="conn.go:289"
time="2022-12-19T08:59:08+08:00" level=info msg="new chosen route is cdm52g9tkgss73f47kg0 with peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= with score 2" file="client.go:114"
time="2022-12-19T08:59:08+08:00" level=info msg="new chosen route is cdu2n1ptkgss739dkv7g with peer cKyWMY2ajmNQlnfKFBG+XtKyw5Wob+t5rhxykbPJIiM= with score 2" file="client.go:114"

Additional context It's only an issue with network route IPs, not direct netbird VPN IPs.

timwsuqld avatar Dec 19 '22 01:12 timwsuqld

Are there any updates on this? MacOS has the same problem: no connections after hibernation.

As standby modes are very common on laptops, Netbird on those is severely limited.

This is a big, glaring bug.

TBT-TBT avatar Mar 09 '23 18:03 TBT-TBT

Hey guys,

I was going to see if I can fix the issue but I could not reproduce it (tested current and old version). Could you check if the issue still persists for you with current version? If so I would like to get in touch so we can have a closer look to fix the issue. Thanks!

pascal-fischer avatar Jul 18 '23 10:07 pascal-fischer

This has been the same on EVERY version I have ever tested. I have uninstalled because of this, so I might not have tested the most recent versions. I would however be surprised if this had changed.

Reproduction: start the client, have it connected to a network, set the Mac (in my case) to hibernation, wait a bit, re-awake from hibernation -> other devices are not accessible anymore until disconnect/reconnect.

In my case, I do that between home and work, so maybe a network change might play a role in this, but I am pretty sure that this shouldn't be the case and this just happens with every hibernation.

ZT, for example, doesn't have this problem, and the peers are available after waking the computer.

I would be happy to connect to solve this, as this problem alone has driven me away from Netbird and towards solutions which don't have this issue.

TBT-TBT avatar Jul 18 '23 17:07 TBT-TBT

I would love to connect to further investigate, I have a couple more questions. Could you reach out via slack? ([email protected]) or any other more direct channel?

pascal-fischer avatar Jul 19 '23 09:07 pascal-fischer

I'm having a similar issue with a self-hosted Netbird instance where some clients don't have any problems, but some clients (that are servers with "varying degrees" of permanent internet access) randomly "go down" and log signal client isn't ready, restarting helps for a bit. Still determining if it is related to the issue, but I'm happy to connect with you @pascal-fischer if it is okay for you, please let me know.

galexrt avatar Jul 25 '23 19:07 galexrt

I do not think that is related, but I am happy to have a look into what is causing your issue.

pascal-fischer avatar Jul 28 '23 15:07 pascal-fischer

> I do not think that is related, but I am happy to have a look into what is causing your issue.

Thanks for offering to look into it!

Yes, after digging further into the code and the issue it seems to be unrelated to this specific (windows hibernation/sleep) issue. Our issue with the "signal client not ready" seems to have been solved/helped after upgrading most clients from 0.21.x to 0.22.x+. Though because of the update some of the clients that were direct: false before (because they are behind a NAT/DSL router) are now detected as direct: true which causes them to not work, though there seems to be #730 opened already by another user to track this new issue. If you want to look into that, feel free to reach out to me via my email address on my profile :-)

galexrt avatar Aug 15 '23 11:08 galexrt

> I'm having a similar issue with a self-hosted Netbird instance where some clients don't have any problems, but some clients (that are servers with "varying degrees" of permanent internet access) randomly "go down" and log signal client isn't ready, restarting helps for a bit. Still determining if it is related to the issue, but I'm happy to connect with you @pascal-fischer if it is okay for you, please let me know.

Some of my Windows clients have this message in client.log. Indeed, the symptoms are the same: after hibernation/sleep, the service is either stopped, or it is running but clients can't reach the servers they have access to through rules.

All clients are on the latest version as of Jan 5, 2024, which is 0.25.3. I've only seen this on Windows. macOS clients are fine; I didn't notice any "outage" in the same circumstances.

werlitong avatar Jan 05 '24 18:01 werlitong

Happened also here on a Manjaro laptop running 6.6.10-1 kernel, using netbird client 0.25.5.

What I've discovered is that I've installed on this device after I woke up from sleep.

It connected to the management and showing green among the Peers, ip route was correctly showing the routes pushed from the netbird-assigned group from the wireguard link.

But no actual routing, here some sanitized logs (replaced real routed subnet with fake 242.37.246.55/32 and real netbird server address with 240.23.23.11, 192.168.1.251 being my laptop's dhcp lease):

2024-02-01T21:41:58+01:00 WARN client/internal/routemanager/client.go:121: the network 242.37.246.55/32 has not been assigned a routing peer as no peers from the list [zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=] are currently connected
2024-02-01T21:42:01+01:00 INFO client/internal/wgproxy/proxy_ebpf.go:91: turn conn added to wg proxy store: 185.142.120.1:51820, endpoint port: :28
2024-02-01T21:42:01+01:00 INFO client/internal/peer/conn.go:357: connected to peer zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=, endpoint address: 127.0.0.1:28
2024-02-01T21:42:01+01:00 INFO client/internal/routemanager/client.go:124: new chosen route is cmtec0qjtdkc739266b0 with peer zMblablablaPfkvr+zkmRgblablablancNE+29rblablao= with score 0 for network 242.37.246.55/32
2024-02-01T21:42:08+01:00 WARN signal/client/grpc.go:170: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Unavailable desc = error reading from server: read tcp 192.168.1.251:33538->240.23.23.11:443: read: connection timed out
2024-02-01T21:42:15+01:00 INFO client/internal/wgproxy/proxy_ebpf.go:138: stop forward turn packages to port: 28. error: EOF
2024-02-01T21:42:15+01:00 WARN client/internal/routemanager/client.go:121: the network 242.37.246.55/32 has not been assigned a routing peer as no peers from the list [zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=] are currently connected
2024-02-01T21:42:16+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:17+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:19+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:19+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:21+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
2024-02-01T21:42:22+01:00 INFO client/internal/engine.go:820: signal client isn't ready, skipping connection attempt zMblablablaPfkvr+zkmRgblablablancNE+29rblablao=
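
The excerpt above shows the route being chosen at 21:42:01 and dropped again at 21:42:15, after which only "signal client isn't ready" lines follow. A small sketch for tracking that from a log in this newer format, assuming the two message wordings shown in the excerpt (the function name is hypothetical):

```python
import re
from typing import Dict, Iterable, Optional

# "<timestamp> <LEVEL> <source file:line>: <message>"
_LINE = re.compile(r"^(\S+) (\w+) (\S+): (.*)$")
_UNASSIGNED = re.compile(r"the network (\S+) has not been assigned a routing peer")
_CHOSEN = re.compile(
    r"new chosen route is \S+ with peer (\S+) with score \d+ for network (\S+)"
)


def final_route_state(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each routed network to the peer last chosen for it,
    or None if the last event said no routing peer was assigned."""
    state: Dict[str, Optional[str]] = {}
    for line in lines:
        match = _LINE.match(line.strip())
        if not match:
            continue
        message = match.group(4)
        unassigned = _UNASSIGNED.search(message)
        if unassigned:
            state[unassigned.group(1)] = None
            continue
        chosen = _CHOSEN.search(message)
        if chosen:
            state[chosen.group(2)] = chosen.group(1)
    return state
```

Run over the excerpt, the result maps 242.37.246.55/32 to `None`, meaning the client itself considered the network unrouted even though the peer list showed green.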

Then I've tried a "trick" I've used in the past to overcome stupid systemd behavior after sleep/hibernate: used the hardware switch to disable/enable networking.

After that, netbird hooked up nicely and the routes were actually in place and usable (I did not even restart the service).

hope it helps, feel free to ping me if you want to debug together @pascal-fischer

penzoiders avatar Feb 01 '24 21:02 penzoiders

Yeah, we are trying it out and several users on macOS are having the same issue: if they leave the office, go home and try to connect, they have to disconnect and reconnect. When the laptop goes into sleep/hibernate mode and they come back to it, it shows Netbird is connected, but the routes are not there and it does not route. Users have to disconnect and reconnect several times a day on the client.

nali215 avatar Feb 15 '24 18:02 nali215

I have a similar issue on macOS, but disconnect and reconnect only works randomly, and even a netbird service restart is not a sure fix; even a system reboot doesn't fix the issue every time.

MacronOne avatar Mar 09 '24 12:03 MacronOne

Having the same issue on self-hosted, on several of our MacBooks, some more than others. Usually restarting the service helps and unblocks it after 1-2 minutes, while in extreme cases a restart fixes the issue. It happens only to those who let their Macs hibernate/sleep. Those who turn them off never have an issue.

vasilis-rev avatar Jun 04 '24 16:06 vasilis-rev

If that is still the case (haven't been using Netbird exactly because of that problem for years now) and nothing has changed since 2022: ouff. This is a major issue for everybody on laptops!!!!

TBT-TBT avatar Jun 04 '24 16:06 TBT-TBT