`netbird up` fails with device auth failure

Open • synfinatic opened this issue 1 year ago • 18 comments

Describe the problem

I just ran `netbird down` followed by `netbird up` on a device that was bootstrapped onto the NetBird network with a setup key, and now it will not connect.

To Reproduce

netbird down && netbird up

Expected behavior

It should connect to NetBird instead of failing with the following error:

2024-03-02T01:47:37Z WARN client/cmd/root.go:195: retrying Login to the Management service in 1.359509522s due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
2024-03-02T01:47:49Z WARN client/cmd/root.go:195: retrying Login to the Management service in 2.133556171s due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
Error: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
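The WARN lines above share one shape: timestamp, level, source location, retry delay, and the underlying error. Below is a minimal Python sketch for pulling the retry delay and error out of such lines, for example to confirm that every attempt failed for the same reason. The regex is an assumption inferred only from the log lines quoted in this issue, not from a documented NetBird log format, and `parse_retry_line` is a hypothetical helper name.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the log lines quoted in this issue (an assumption,
# not a documented NetBird format). Only simple "<n>s" / "<n>ms" delays are
# handled; compound Go durations such as "1m5s" will not match.
RETRY_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<src>\S+):\s+"
    r"retrying Login to the Management service in (?P<delay>[\d.]+)(?P<unit>ms|s)"
    r" due to error (?P<error>.*)$"
)


@dataclass
class RetryEvent:
    timestamp: str
    level: str
    source: str
    delay_seconds: float
    error: str


def parse_retry_line(line: str) -> Optional[RetryEvent]:
    """Return a RetryEvent for a 'retrying Login' line, or None for any other line."""
    m = RETRY_RE.match(line.strip())
    if m is None:
        return None
    delay = float(m.group("delay"))
    if m.group("unit") == "ms":
        delay /= 1000.0  # normalise milliseconds to seconds
    return RetryEvent(
        timestamp=m.group("ts"),
        level=m.group("level"),
        source=m.group("src"),
        delay_seconds=delay,
        error=m.group("error"),
    )
```

Feeding each line of a saved log through `parse_retry_line` and collecting the distinct `error` values shows at a glance whether the client kept hitting the same timeout or a mix of failures.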

Are you using NetBird Cloud?

Yes, using cloud.

NetBird version

0.25.7

NetBird status -d output:

netbird status -d
Daemon status: LoginFailed

Run UP command to log in with SSO (interactive login):

 netbird up

If you are running a self-hosted version and no SSO provider has been configured in your Management Server,
you can use a setup-key:

 netbird up --management-url <YOUR_MANAGEMENT_URL> --setup-key <YOUR_SETUP_KEY>

More info: https://docs.netbird.io/how-to/register-machines-using-setup-keys

synfinatic • Mar 02 '24

Updated to 0.26.2, same problem:

$ netbird status
Daemon version: 0.26.2
CLI version: 0.26.2
Management: Disconnected, reason: rpc error: code = FailedPrecondition desc = failed connecting to Management Service : context deadline exceeded
Signal: Disconnected
Relays: 0/0 Available
FQDN:
NetBird IP: N/A
Interface type: N/A
Quantum resistance: false
Peers count: 0/0 Connected

$ netbird up
2024-03-02T02:00:13Z WARN client/cmd/root.go:204: retrying Login to the Management service in 920.398536ms due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
2024-03-02T02:00:24Z WARN client/cmd/root.go:204: retrying Login to the Management service in 889.796141ms due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
Error: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded

synfinatic • Mar 02 '24

It looks like this was a problem with the netbird.io cloud service, since it is working now? Is there a status page I should have checked for system health?

$ netbird status
Daemon version: 0.26.2
CLI version: 0.26.2
Management: Connected
Signal: Connected
Relays: 2/2 Available
FQDN: raspi-blue.netbird.cloud
NetBird IP: 100.93.254.165/16
Interface type: Kernel
Quantum resistance: false
Peers count: 1/4 Connected

synfinatic • Mar 02 '24

@synfinatic can you check the logs, especially the /var/log/netbird/netbird.err?

mlsmaycon • Mar 02 '24

Sadly, there are no logs anymore since this device is running DietPi and /var/log is on a ramdisk volume which is purged on a regular basis.

synfinatic • Mar 02 '24

Please run the agent in the foreground:

sudo netbird service stop
sudo netbird up -F -l debug -m https://your-server-url:port

mlsmaycon • Mar 02 '24

As I stated earlier this morning, the problem seems to have resolved itself. I'm no longer able to reproduce this issue. Should it happen again, I'll be happy to provide the logs... unfortunately the ticket template didn't ask for them and I forgot about the log purging.

However, the limited output seems to point to an issue with the netbird.io service. Can you confirm whether there was a service issue at that time? I opened the ticket while the problem was occurring.

synfinatic • Mar 02 '24

I just hit what appears to be the same error ("context deadline exceeded") with a client. This is with a self-hosted NetBird, and I now see that my management service seems to be broken: no peers appear when I log in. The Docker logs show: http: TLS handshake error from xxx.xxx.xxx.xxx:63817: remote error: tls: unknown certificate

The certs for the management server seem to be intact, since I can log in (those are handled by Caddy), but I think another component uses its own certs?

WGandy • Mar 03 '24

@WGandy is that a custom docker build?

mlsmaycon • Mar 03 '24

@synfinatic, we didn't have any issues within the timeframe of your logs. The issue could be caused by latency between the client and the management service. I've shared some steps in https://github.com/netbirdio/netbird/issues/1618#issuecomment-1975941729 that might help us understand the issue in more detail.

mlsmaycon • Mar 04 '24

Yes, you helped me get it set up quite a while ago. It seems that Coturn is not finding the certs, probably because the Caddy container recently renewed them. I'm wondering if we copied the certs manually to get it going when we set it up. I'm hoping to find time to sort through it later today; hopefully it's just a volume-mapping issue.

WGandy • Mar 04 '24

Just a follow-up: my failure was caused by Caddy renewing the certs with a different CA than it had used previously, which put the cert files at a different path. The dashboard container picked up the new certs, but the management container did not. I manually changed the cert file names and paths in the docker compose file for the management service and in management.json. If Caddy renews again with the other provider, I'll need to change them manually again, but I think this will be automated in a future version of NetBird.
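Since the path changed only because a different CA issued the renewal, one way to find the current files is to look across all issuer directories and pick the newest certificate. A minimal Python sketch, assuming Caddy's default file-system storage layout (`<certificates dir>/<issuer>/<domain>/<domain>.crt` plus a matching `.key`); the function name and the example domain are hypothetical, and the real storage root depends on how the Caddy container's data volume is mounted.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_latest_caddy_cert(certificates_dir: str, domain: str) -> Optional[Tuple[Path, Path]]:
    """Return (cert, key) for `domain`, choosing the most recently modified
    certificate across all issuer subdirectories, or None if none exist.

    Assumes Caddy's default file-system storage layout:
        <certificates_dir>/<issuer>/<domain>/<domain>.crt
        <certificates_dir>/<issuer>/<domain>/<domain>.key
    """
    root = Path(certificates_dir)
    candidates = []
    # One directory per issuer (e.g. Let's Encrypt vs. ZeroSSL), so glob across them.
    for cert in root.glob(f"*/{domain}/{domain}.crt"):
        key = cert.with_suffix(".key")  # example.com.crt -> example.com.key
        if key.is_file():
            candidates.append(cert)
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return newest, newest.with_suffix(".key")
```

The returned paths can then be compared with the cert paths configured for the management container in the docker compose file and in management.json, as described above. Picking by modification time is a heuristic; checking the certificate's actual expiry date would be more robust.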

WGandy • Mar 07 '24

Hi!

I am facing this issue: I can't connect any client. I installed using the Advanced guide, with Authentik and Nginx Proxy Manager. I can log in, the Peers page loads, and I can create management keys, but I cannot connect.

With debug logging enabled, it shows the following:

2024-09-16T12:33:03-03:00 ERRO client/internal/login.go:105: failed while getting Management Service public key: failed while getting Management Service public key
2024-09-16T12:33:03-03:00 WARN client/cmd/root.go:234: retrying Login to the Management service in 1.259689188s due to error failed while getting Management Service public key
2024-09-16T12:33:05-03:00 DEBG client/internal/login.go:93: connecting to the Management service https://vpn.example.com:443
2024-09-16T12:33:05-03:00 DEBG client/internal/login.go:63: connected to the Management service https://vpn.example.com:443
2024-09-16T12:33:05-03:00 ERRO management/client/grpc.go:287: failed while getting Management Service public key: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"

The setup is 3 VMs:

  1. Nginx Proxy Manager
  2. Authentik
  3. Netbird

All three are behind one public IP, on the same internal network, and all can reach each other. In NPM, the proxy host points to the NetBird VM with ports 80, 443, and 33073 configured, with gRPC enabled, etc.

The other required ports are forwarded directly to the NetBird VM.

Is there something I missed?
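A 502 Bad Gateway means the proxy itself answered because it did not get a valid response from its upstream, so a first step is to confirm that the proxy VM can reach the NetBird VM at all. A minimal Python sketch of a plain TCP reachability check, meant to be run from the NPM VM; the function name and the hostname in the usage line are hypothetical placeholders.

```python
import socket
from typing import Dict, Iterable


def check_tcp_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a plain TCP connect to each port; True means the connect succeeded."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable, timed out, DNS failure, ...
            results[port] = False
    return results


if __name__ == "__main__":
    # "netbird-vm.lan" is a placeholder for the NetBird VM's internal address.
    print(check_tcp_ports("netbird-vm.lan", [80, 443, 33073]))
```

A successful connect only rules out basic network reachability. It says nothing about whether the proxy forwards gRPC (HTTP/2) to the upstream correctly, which is a separate thing to check in the proxy configuration.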

juniormarangao • Sep 16 '24

Hello @synfinatic,

We're currently reviewing our open issues and would like to verify if this problem still exists in the latest NetBird version.

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within 2 weeks, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!

nazarewk • Apr 28 '25

2025-05-01T13:00:39+08:00 DEBG client/internal/login.go:94: connecting to the Management service https://api.netbird.io:443
2025-05-01T13:00:39+08:00 DEBG util/net/dialer_dial.go:52: Dialing tcp api.netbird.io:443
2025-05-01T13:00:49+08:00 INFO ./caller_not_available:0: 2025/05/01 13:00:49 WARNING: [core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "api.netbird.io:443", ServerName: "api.netbird.io:443", }. Err: connection error: desc = "transport: authentication handshake failed: EOF"
2025-05-01T13:00:50+08:00 DEBG util/net/dialer_dial.go:52: Dialing tcp api.netbird.io:443
2025-05-01T13:01:00+08:00 INFO ./caller_not_available:0: 2025/05/01 13:01:00 WARNING: [core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "api.netbird.io:443", ServerName: "api.netbird.io:443", }. Err: connection error: desc = "transport: authentication handshake failed: EOF"
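In these logs the TCP dial starts, but the TLS ("authentication") handshake ends with EOF. A minimal Python sketch for reproducing just that handshake outside the NetBird client, to see whether it is being cut off between this machine and the server. It offers ALPN `h2` because gRPC runs over HTTP/2; the function name is hypothetical, and this only exercises TLS, not the NetBird gRPC API.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls_handshake(host: str, port: int = 443, timeout: float = 10.0) -> Tuple[bool, Optional[str], str]:
    """Attempt a TLS handshake with `host:port`, offering ALPN "h2".

    Returns (ok, negotiated_alpn, detail). ok is False when the TCP connect or
    the TLS handshake fails, e.g. when the peer closes the connection (EOF).
    """
    ctx = ssl.create_default_context()  # verifies the cert against system CAs
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.selected_alpn_protocol(), tls.version() or ""
    except OSError as exc:  # ssl.SSLError (incl. SSLEOFError) subclasses OSError
        return False, None, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    print(check_tls_handshake("api.netbird.io"))
```

If this also fails with an EOF-style error from the same machine, the problem is likely on the network path rather than in the NetBird client. If it succeeds, the negotiated ALPN value shows whether HTTP/2 was agreed.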

jjqtony • May 01 '25

Having this on one peer:

Last login: Fri Aug 22 17:08:58 2025 from 10.232.161.204
ale@raspi52:~ $ netbird up
Error: daemon up failed: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
ale@raspi52:~ $ netbird up
Error: daemon up failed: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded

morandalex • Aug 22 '25

Same issue for me

ghaisasadvait • Sep 05 '25

I did some upgrades, including changing the ports for the management and TURN servers. Needless to say, I encountered the same issue.

Here is how I solved it.

You'll have to go through each peer that has Session Expiration disabled, one by one, and re-enable it.

Then, on the target peer, run the following:

sudo netbird service stop
sudo netbird up -F -l debug -m https://your-server-url:port

It should prompt you to log in. Do that, then stop the session with Ctrl+C.

Disable Session Expiration.

Now either run `sudo apt update` and `sudo apt install netbird` to upgrade to the latest version and re-enable the service, or simply run `sudo netbird service start`.

You should now have this resolved for this peer. Do it for the other ones now :)

lukababu • Oct 04 '25

I still have this issue using the self-hosted version on Kubernetes with a Traefik gateway:

$ sudo netbird up -F -l debug -m https://netbird.test.krd:443 
2025-12-04T17:30:18+03:00 DEBG client/internal/login.go:99: connecting to the Management service https://netbird.test.krd:443
2025-12-04T17:30:18+03:00 DEBG client/net/dialer_dial.go:20: Dialing tcp netbird.test.krd:443
2025-12-04T17:30:18+03:00 DEBG client/internal/login.go:65: connected to the Management service https://netbird.test.krd:443
2025-12-04T17:30:18+03:00 ERRO shared/management/client/grpc.go:277: failed while getting Management Service public key: rpc error: code = Internal desc = server closed the stream without sending trailers
2025-12-04T17:30:18+03:00 ERRO client/internal/login.go:111: failed while getting Management Service public key: failed while getting Management Service public key
2025-12-04T17:30:18+03:00 WARN client/cmd/root.go:245: retrying Login to the Management service in 1.108338522s due to error failed while getting Management Service public key

shkarface • Dec 04 '25