Netbird not waking up after sleep

Open coffeediver opened this issue 9 months ago • 12 comments

After my computer goes to sleep from inactivity, or I close the lid (set to sleep), Netbird will not reach my remote server. Ping and tracert show nothing:

Microsoft Windows [Version 10.0.26100.4061]
(c) Microsoft Corporation. All rights reserved.

C:\Users\coffe>ping 192.168.4.116

Pinging 192.168.4.116 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 192.168.4.116:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\Users\coffe>tracert 192.168.4.116

Tracing route to 192.168.4.116 over a maximum of 30 hops

  1     *        *        *     Request timed out.
  2     *        *        *     Request timed out.
  3     *        *        *     Request timed out.
  4     *        *        *     Request timed out.
  5     *        *        *     Request timed out.
  6  ^C

I can ping the Netbird-assigned IP address. I have to disconnect Netbird and reconnect for it to start working again. Also, the Netbird peers dashboard shows everything as connected.

coffeediver avatar May 26 '25 14:05 coffeediver

@coffeediver, can you confirm if the issue is reproducible? Also, do you need to authenticate again when you reconnect?

mlsmaycon avatar May 26 '25 15:05 mlsmaycon

Yes, it happens whenever my laptop goes to sleep. I timed it at exactly 1 minute and had to disconnect and reconnect Netbird, but I did not have to authenticate. I also tried 10 seconds and the problem did not happen. I do not know exactly what the time limit is, but it is repeatable. The bird icon in my tray shows the connected check mark, but I cannot reach or ping/tracert my remote network, although I can ping the IP address that Netbird assigns. My peers dashboard shows everything connected. At first I was just rebooting my laptop, until I figured out that right-clicking the Netbird icon in the system tray, then disconnecting and reconnecting, works every time.
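
The manual workaround described here (notice that the routed address is unreachable, then disconnect and reconnect) could be scripted roughly as follows. This is only a sketch of a stopgap, not a fix. It assumes the `netbird` CLI is on the PATH and uses the `netbird down` and `netbird up` commands that appear in this thread. The function names, the injectable `run` parameter, and the use of ping's exit code as the reachability signal are illustrative assumptions.

```python
import platform
import subprocess


def ping_command(host):
    """Build a single-probe ping command for the current OS."""
    if platform.system() == "Windows":
        return ["ping", "-n", "1", host]
    return ["ping", "-c", "1", host]


def default_run(cmd):
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(cmd, capture_output=True).returncode


def reconnect_if_unreachable(host, run=default_run):
    """Ping a routed host; if it does not answer, cycle the NetBird connection.

    Returns True if a reconnect was triggered, False if the host was reachable.
    """
    if run(ping_command(host)) == 0:
        return False
    # Same manual steps reported to work in this thread: disconnect, then reconnect.
    run(["netbird", "down"])
    run(["netbird", "up"])
    return True
```

Calling `reconnect_if_unreachable("192.168.4.116")` after wake would mirror the tray-icon workaround. Ping exit codes are only an approximation of reachability and can differ between platforms, so treat the result as a hint.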

coffeediver avatar May 26 '25 15:05 coffeediver

@coffeediver I've attempted to reproduce the issue locally without success.

If possible, can you please run the following steps?

  1. Enable debug logs by running the following command:
     netbird debug log level trace
  2. Make the laptop enter sleep mode, as you mentioned before.
  3. Wake the computer again.
  4. Confirm that it is unable to restore connectivity.
  5. Run the following commands:
     netbird down
     netbird up
     netbird debug bundle -AS -U
  6. Disable debug logs by running the following command:
     netbird debug log level info

This will generate a bundle for us to analyze your case. Please share the upload key here.
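
The command sequence in the steps above can be sketched as a small script. The commands are copied from this comment; the split into a "before sleep" and an "after wake" group, the constant names, and the `run_all` helper are assumptions made for illustration. Putting the laptop to sleep and confirming the lost connectivity remain manual steps.

```python
import subprocess

# Commands copied from the steps above; the grouping into two phases
# is an illustrative assumption.
BEFORE_SLEEP = [
    ["netbird", "debug", "log", "level", "trace"],
]
AFTER_WAKE = [
    ["netbird", "down"],
    ["netbird", "up"],
    ["netbird", "debug", "bundle", "-AS", "-U"],
    ["netbird", "debug", "log", "level", "info"],
]


def run_all(commands, run=subprocess.run):
    """Run each command in order and return their exit codes."""
    return [run(cmd).returncode for cmd in commands]
```

A possible flow: call `run_all(BEFORE_SLEEP)`, let the machine sleep and wake, verify that the routed network is unreachable, then call `run_all(AFTER_WAKE)` and share the upload key printed by the bundle command.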

mlsmaycon avatar May 26 '25 15:05 mlsmaycon

It has happened with us too, on netbird-0.45.1-1.x86_64.

lfarkas avatar May 27 '25 06:05 lfarkas

Same here. The difference is that I change my location from work to home or vice versa, and the laptop is in sleep mode in between. I have netbird hosts at both locations with subnet routes configured. Fedora 42 with netbird-0.45.1-1.x86_64. It was working fine before this version.

❯ netbird debug log level trace
Log level set successfully to trace
❯ ping 192.168.208.1
PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
From 100.92.244.51 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
From 100.92.244.51 icmp_seq=2 Destination Host Unreachable
ping: sendmsg: Required key not available
From 100.92.244.51 icmp_seq=3 Destination Host Unreachable
ping: sendmsg: Required key not available
^C
--- 192.168.208.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2062ms
❯ netbird down
Disconnected
❯ netbird up
Connected
❯ ping 192.168.208.1
PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=63 time=43.8 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=63 time=44.7 ms
^C
--- 192.168.208.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 43.838/44.288/44.739/0.450 ms
❯ netbird debug bundle -AS -U
Local file:
/tmp/netbird.debug.3526160544.zip
Upload file key:
f79e391890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/c70b5720-7d98-4cf6-9da4-10f51c1a8ccd

netbird.debug.3526160544.zip

gcsontos avatar May 27 '25 14:05 gcsontos

I had the same problem with my Windows 11 laptop, but I can't reproduce it every time. It only happens sometimes.

Nordlicht-13 avatar May 27 '25 21:05 Nordlicht-13

The same on macos

tropnikovvl avatar May 28 '25 07:05 tropnikovvl

We also have this problem on all our clients

patrick-brigel avatar May 30 '25 09:05 patrick-brigel

This already happens when the display goes black for a couple of minutes and you come back. Then I have to disconnect netbird and connect again to access the SMB shares on my TrueNAS.

Nordlicht-13 avatar Jun 01 '25 13:06 Nordlicht-13

We have the same problem on at least some of our Windows clients. According to the users, it started around 1 1/2 weeks ago. They said: "For about 1 1/2 weeks now, 1-2 times a day, if I have been inactive for a longer period of time or after a certain time, I have the problem that I get a “Page cannot be accessed” message with SYSTEM1 and SYSTEM2. When I reconnect, it works again immediately."

WortmannImpleco avatar Jun 03 '25 07:06 WortmannImpleco

Nearly every new Windows client we add now has this issue. It only started around 10 days ago. The client on the system sometimes shows that it's still connected, and I can also see a connection in the manager, but the client cannot open any of the pages with configured routes. Every website that's not configured via a route works.

WortmannImpleco avatar Jun 04 '25 08:06 WortmannImpleco

I definitely see wake from sleep issues on macOS and it happens every time

OS: darwin/arm64
Daemon version: 0.50.3.1500
CLI version: 0.50.3.1500
Management: Connected to https://vpn.anon-0iwam.domain:443
Signal: Disconnected, reason: rpc error: code = Unavailable desc = error reading from server: read tcp 172.24.225.13:51649->xxx.51.100.0:443: read: operation timed out
Relays:
  [stun:stun.anon-5QWiC.domain:3478] is Available
  [stun:stun.l.anon-Esx22.domain:19302] is Available
  [rels://vpn-mgmt-prd-west.anon-0iwam.domain:443/relay] is Unavailable, reason: relay connection is not established
Nameservers:
  [10.198.1.7:53, 10.197.1.7:53, 10.197.3.7:53] for [corp.anon-0iwam.domain, dev.anon-0iwam.domain, privatelink.anon-0iwam.domain, anon-0iwam.domain] is Unavailable, reason: with udp: read udp 100.122.60.129:0->10.197.3.7:53: i/o timeout
FQDN: lh9kcr6djx.vpn-mgmt-prd-west.anon-0iwam.domain
NetBird IP: 100.122.60.129/16
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: -
Forwarding rules: 0
Peers count: 0/3 Connected

hurricanehrndz avatar Jul 12 '25 16:07 hurricanehrndz

Signal connection is usually lost, management not always. In the log I see issues with trying to resolve the DNS for relay/management and/or signal endpoint.

So I decided to dig deeper...

Let's say relay, signal, and management are all at netbird.xyz.com, and xyz.com is additionally set as a search domain via netbird management. Once I connect, the macOS DNS subsystem fails to resolve netbird.xyz.com. dig against the upstream resolvers works, but dig against netbird's .254 DNS server fails.

So let's say my netbird DNS resolver is 100.122.255.254 and my upstream resolvers are 10.196.7.1 and 10.196.3.1. These are the results:

dig @10.196.7.1 netbird.xyz.com  # works
dig @10.196.3.1 netbird.xyz.com  # works
dig @100.122.255.254 netbird.xyz.com # fails

and macOS' resolution fails to return a record:

❯ dns-sd -t 1 -Q netbird.xyz.com
DATE: ---Sat 12 Jul 2025---
14:38:11.811  ...STARTING...
Timestamp     A/R  Flags         IF  Name                          Type   Class  Rdata
14:38:11.820  Add  2              0  netbird.xyz.com. Addr   IN     0.0.0.0    No Such Record

This definitely explains the wake-from-sleep issues.
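
The resolver-by-resolver comparison above can be repeated with a short script. This is a sketch under stated assumptions: it shells out to `dig` with the standard `+short`, `+time`, and `+tries` options, and the helper names and the injectable `run` parameter are made up for illustration. The server addresses and hostname in the usage note are the placeholder values from this comment.

```python
import subprocess


def dig_short(server, name, run=subprocess.run):
    """Query one DNS server with dig and return the answer lines (empty if none)."""
    result = run(
        ["dig", "+short", "+time=2", "+tries=1", f"@{server}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    # Lines starting with ";" are dig diagnostics, not records.
    return [
        line
        for line in result.stdout.splitlines()
        if line and not line.startswith(";")
    ]


def compare_resolvers(servers, name, run=subprocess.run):
    """Map each server to True if it returned at least one record for name."""
    return {server: bool(dig_short(server, name, run)) for server in servers}
```

For the scenario described here, `compare_resolvers(["10.196.7.1", "10.196.3.1", "100.122.255.254"], "netbird.xyz.com")` would be expected to flag only the NetBird resolver as failing, if the behavior matches what was observed.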

hurricanehrndz avatar Jul 12 '25 20:07 hurricanehrndz

So I think I know why it is failing to resolve, which I can address. That being said, it seems that the engine never restarts when it fails to connect to management or signal, which I think it used to do in the past.

hurricanehrndz avatar Jul 12 '25 20:07 hurricanehrndz

There is also a discrepancy in the daemon status when you request it. I created a custom UI that dumps the internal status:

{"file":"client_ui.go:619","func":"main.(*serviceClient).updateStatus.func1","level":"info","msg":"isConnectedToPeer: false, isDNSAvailable: false, status: Connected, userRequest: \u0000","time":"2025-07-12T16:07:29-06:00"}

The internal status reports as connected, but this is what you get when you run netbird status:

OS: darwin/arm64
Daemon version: 0.50.3.1501
CLI version: 0.50.3.1501
Management: Disconnected, reason: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
Signal: Disconnected, reason: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
Relays: 
  [stun:stun.anon-XrBnQ.domain:3478] is Available
  [stun:stun.l.anon-RwAuu.domain:19302] is Available
  [rels://vpn-mgmt-prd-west.anon-eki5N.domain:443/relay] is Unavailable, reason: relay connection is not established
Nameservers: 
  [10.198.1.7:53, 10.197.1.7:53, 10.197.3.7:53] for [corp.anon-eki5N.domain, dev.anon-eki5N.domain, privatelink.anon-eki5N.domain, anon-eki5N.domain] is Unavailable, reason: with udp: read udp 100.122.60.129:0->10.197.3.7:53: i/o timeout
FQDN: lh9kcr6djx.vpn-mgmt-prd-west.anon-eki5N.domain
NetBird IP: 100.122.60.129/16
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: -
Forwarding rules: 0
Peers count: 0/3 Connected
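
The mismatch described above (a "Connected" internal status while management and signal are down and no peers are connected) could be detected from the `netbird status` text itself. The following is a minimal sketch that assumes the line format shown in this output ("Management:", "Signal:", "Peers count: 0/3 Connected"). The function names and the staleness rule are illustrative assumptions, not part of NetBird.

```python
def parse_control_plane(status_text):
    """Extract Management/Signal state and the peer count from `netbird status` text."""
    info = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key in ("Management", "Signal"):
            # "Connected to ..." counts as up; "Disconnected, reason: ..." does not.
            info[key] = value.startswith("Connected")
        elif key == "Peers count":
            # Format seen in this thread: "0/3 Connected"
            connected, _, total = value.split()[0].partition("/")
            info["peers"] = (int(connected), int(total))
    return info


def looks_stale(info):
    """True when the control plane is down or no peer is connected."""
    connected, total = info.get("peers", (0, 0))
    control_plane_up = info.get("Management", False) and info.get("Signal", False)
    return not control_plane_up or (total > 0 and connected == 0)
```

Feeding the output above into `parse_control_plane` and then `looks_stale` would flag this session as stale, even though the UI reports it as connected.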

hurricanehrndz avatar Jul 12 '25 22:07 hurricanehrndz

So I was able to create a scenario in which you can replicate the misbehavior:

  1. Connect to netbird normally.
  2. Edit /etc/hosts: add the relay, management, and signal host, e.g. 0.0.0.0 netbird.xyz.com
  3. Place the device into sleep by closing the lid.

You will note that the UI always reports as connected. Even though the client is disconnected from relay, signal, and perhaps management, it reports its internal status over the daemon's gRPC socket as connected.

I was able to fix my DNS issues by creating a special local resolver using CoreDNS to ensure that the management, relay, and signal URLs are always resolvable. This made the client more resilient, and it succeeded almost all the time in connecting back up right after waking from sleep.

hurricanehrndz avatar Jul 12 '25 22:07 hurricanehrndz

After fixing the DNS issues, I can confirm netbird wakes up from sleep pretty successfully.

hurricanehrndz avatar Jul 13 '25 13:07 hurricanehrndz

@hurricanehrndz can you test this PR please? It adds a cache layer for the mgmt addresses
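
To illustrate the general idea of a cache layer for addresses, here is a minimal last-known-good resolver. It is not the code from the PR, whose implementation is not shown in this thread. The class name, the `lookup` hook, and the fallback policy are assumptions made for this sketch.

```python
import socket


class LastKnownGoodResolver:
    """Resolve hostnames, falling back to the last successful answer on failure."""

    def __init__(self, lookup=None):
        # lookup(host) -> list of IP strings; defaults to the system resolver.
        self._lookup = lookup or self._system_lookup
        self._cache = {}

    @staticmethod
    def _system_lookup(host):
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        # Preserve order while dropping duplicates.
        return list(dict.fromkeys(info[4][0] for info in infos))

    def resolve(self, host):
        try:
            addresses = self._lookup(host)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            addresses = []
        if addresses:
            self._cache[host] = addresses
            return addresses
        if host in self._cache:
            return self._cache[host]
        raise LookupError(f"no fresh or cached address for {host}")
```

With a resolver like this, a lookup that fails right after wake would still return the address that worked before sleep, which is the kind of resilience discussed in this thread. A real implementation would also need to decide when cached entries expire.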

lixmal avatar Jul 13 '25 14:07 lixmal

@hurricanehrndz thanks for the extensive debugging and resulting writeup :)

nazarewk avatar Jul 14 '25 10:07 nazarewk

I definitely can test that later today.

hurricanehrndz avatar Jul 14 '25 10:07 hurricanehrndz

I think it might be better if people fix the misconfiguration if one exists like it did in my environment

hurricanehrndz avatar Jul 17 '25 14:07 hurricanehrndz

I think it might be better if people fix the misconfiguration if one exists like it did in my environment

@hurricanehrndz What misconfiguration did you find, and how did you fix it? I have read all your comments here and saw your solution, but that didn't sound like a misconfiguration on your side. I am also seeing this problem on some of our clients and have been waiting for a fix for many months now (I love the project in general, and it has been fixed for some of the systems), so any info about a possible fix would be great 😄

WortmannImpleco avatar Jul 17 '25 15:07 WortmannImpleco

One of our other engineers set the netbird net domain to the same value as the management FQDN.

hurricanehrndz avatar Jul 17 '25 18:07 hurricanehrndz

One of our other engineers set the netbird net domain to the same value as the management FQDN.

just FYI: I'm not sure when the feature will land, but we do have a task in our internal tracker exactly about cross-validating this with management configuration.

nazarewk avatar Jul 18 '25 07:07 nazarewk

I was able to fix my DNS issues by creating a special local resolver using CoreDNS to ensure that the management, relay, and signal URLs are always resolvable. This made the client more resilient, and it succeeded almost all the time in connecting back up right after waking from sleep.

@hurricanehrndz can you share more details on this? This is still a huge issue for us and netbird disconnects every time on sleep

salarali avatar Jul 29 '25 15:07 salarali

Newer releases fixed this by caching addresses

hurricanehrndz avatar Jul 30 '25 01:07 hurricanehrndz

Test the new versions and report back. Also, it takes 15 seconds or so for it to connect back.

hurricanehrndz avatar Jul 30 '25 01:07 hurricanehrndz

I will test Windows sleep soon.

hurricanehrndz avatar Jul 30 '25 02:07 hurricanehrndz

I still have to disconnect and reconnect Netbird 0.52.2 on my laptop when I come back to it a few minutes after the screen got locked. I can't access SMB shares after Windows wakes up again.

Nordlicht-13 avatar Aug 01 '25 09:08 Nordlicht-13