
Incorrect `status` reported with `netbird status -d`

Open · soakes opened this issue 4 months ago · 9 comments

Today I experienced a strange issue with netbird version 0.27.0 on Debian 12. During the night several of the VPN links went offline (as can be seen below), and this morning the links were dead. However, what's interesting is that netbird status -d shows them as connected when they are actually offline.

The time right now is Sun Apr 7 07:42:49 UTC 2024, which is when the snapshot below was taken. As you can see, the last WireGuard handshake was around 2024-04-07 01:06:23. Considering that the keepalive should be somewhere around 10-60 seconds, if I recall correctly, the status of the link should say it's down. I did confirm that the VPN endpoint (netbird.cloud address) could not be reached.
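To make the gap concrete, here is the arithmetic as a small Python sketch. It assumes both timestamps are in the same time zone (the date above is UTC; that the netbird status timestamps are also UTC is an assumption). The 3-minute cut-off is hypothetical, chosen because WireGuard re-handshakes roughly every two minutes while packets are flowing.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format shown by `netbird status -d`

snapshot = datetime.strptime("2024-04-07 07:42:49", FMT)        # time of the snapshot (UTC)
last_handshake = datetime.strptime("2024-04-07 01:06:23", FMT)  # first peer in the output

age = snapshot - last_handshake
print(age)  # 6:36:26

# Hypothetical cut-off: WireGuard re-handshakes about every 2 minutes
# while packets flow, so a few minutes without one is already suspicious.
STALE_AFTER = timedelta(minutes=3)
print(age > STALE_AFTER)  # True
```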

Restarting netbird brought all the links back up correctly, and the status is no longer lying (it says connected, and that is now true).

I believe that, for whatever reason, netbird sent an update to the client which it couldn't apply, and the client then gave up without even attempting to restart the link.

I believe that if the endpoint addresses (NetBird IPs) can't be reached, the status should change from connected to something else (offline, maybe?). This would allow the links to be monitored and restarted accordingly.
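A minimal sketch of the proposed rule in Python: a peer that netbird reports as Connected, but whose last WireGuard handshake is older than some threshold, is treated as Offline. The function name and the 3-minute threshold are hypothetical, not existing NetBird settings.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, not an actual NetBird setting.
STALE_AFTER = timedelta(minutes=3)


def effective_status(reported: str, last_handshake: datetime, now: datetime,
                     stale_after: timedelta = STALE_AFTER) -> str:
    """Return the status a monitor should act on.

    A peer reported as "Connected" whose last WireGuard handshake is older
    than stale_after is downgraded to "Offline"; anything else passes through.
    """
    if reported == "Connected" and now - last_handshake > stale_after:
        return "Offline"
    return reported


now = datetime(2024, 4, 7, 7, 42, 49)
print(effective_status("Connected", datetime(2024, 4, 7, 1, 6, 23), now))   # Offline
# A hypothetical recent handshake, 79 seconds before the snapshot:
print(effective_status("Connected", datetime(2024, 4, 7, 7, 41, 30), now))  # Connected
```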

It might also be worth adjusting netbird so that it tries to restart the links if they have been down for a while, as in most cases this is all that's required.

To Reproduce

I am not sure how this could be reproduced, as I don't know why the links failed. My theory is that you could probably set up a firewall rule to drop the packets, simulating the line being down, and then check whether the status updates.
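One way to try this, sketched in Python for a test machine: build an iptables rule that drops outgoing UDP to a peer's WireGuard port (51820 appears as the remote port for two of the peers in the output below), apply it as root, wait a few minutes, and check whether netbird status -d still reports Connected. The helper names are hypothetical; the iptables options used (-A/-D, OUTPUT, -p udp, --dport, -j DROP) are standard. By default the sketch only prints the commands.

```python
import shlex
import subprocess


def drop_rule(port: int, action: str = "-A") -> list[str]:
    """Build an iptables command for outgoing UDP to `port`.

    action="-A" appends a DROP rule; action="-D" deletes the same rule again.
    """
    return ["iptables", action, "OUTPUT", "-p", "udp",
            "--dport", str(port), "-j", "DROP"]


def apply(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, and run it only when dry_run is False (needs root)."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


apply(drop_rule(51820))        # iptables -A OUTPUT -p udp --dport 51820 -j DROP
# ... wait a few minutes, then check `netbird status -d` ...
apply(drop_rule(51820, "-D"))  # iptables -D OUTPUT -p udp --dport 51820 -j DROP
```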

Expected behavior

I would expect the links to appear as down (offline). Ideally they could be restarted up to X retries, since by the looks of it netbird doesn't even try when there is a glitch in the network.
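A minimal sketch of the bounded-retry idea in Python. The health check and the restart action are passed in as callables: in practice the check might compare the last handshake age against a threshold, and the restart might run netbird down followed by netbird up. The function name, retry count and delay are all hypothetical.

```python
import time
from typing import Callable


def restart_until_healthy(is_healthy: Callable[[], bool],
                          restart: Callable[[], None],
                          max_retries: int = 3,
                          delay_s: float = 30.0,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Restart a link up to max_retries times until is_healthy() returns True.

    Returns True once the link is healthy, False if the retries run out.
    """
    if is_healthy():
        return True
    for _ in range(max_retries):
        restart()
        sleep(delay_s)  # give the tunnel time to complete a new handshake
        if is_healthy():
            return True
    return False


# Usage with stand-ins: the link recovers after the second restart.
results = iter([False, False, True])
restarts = []
ok = restart_until_healthy(lambda: next(results), lambda: restarts.append("restart"),
                           max_retries=3, sleep=lambda s: None)
print(ok, len(restarts))  # True 2
```

Capping the retries keeps a genuinely broken peer from being restarted forever, while still covering the common case where one restart is enough.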

Are you using NetBird Cloud?

I am using the NetBird Cloud control plane, as deployed on Sun 7th April 2024.

NetBird version

netbird version 0.27.0

NetBird status -d output:

Peers detail:
 xxxx.netbird.cloud:
  NetBird IP: 100.xx.xx.130
  Public key: DRh4eyiGdfy**********************
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx
  ICE candidate endpoints (Local/Remote): 10.xx.xx.254:51820/159.xxx.xxx.10:51820
  Last connection update: 2024-04-04 01:08:51
  Last WireGuard handshake: 2024-04-07 01:06:23
  Transfer status (received/sent) 222.1 GiB/80.2 GiB
  Quantum resistance: true
  Routes: 10.xx.xx.xx/16
  Latency: 6.120067ms

 xxx.netbird.cloud:
  NetBird IP: 100.xx.xx.255
  Public key: PVcBjP7Wro*******************
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 10.xx.xx.254:51820/185.xxx.xxx.218:51820
  Last connection update: 2024-04-06 19:09:51
  Last WireGuard handshake: 2024-04-07 01:04:01
  Transfer status (received/sent) 5.7 MiB/6.8 MiB
  Quantum resistance: true
  Routes: -
  Latency: 8.975109ms

 xxxx.netbird.cloud:
  NetBird IP: 100.xx.xx.156
  Public key: TLj1K0BtAV************************
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 45.xx.xx.213:37386/90.xx.xx.142:37386
  Last connection update: 2024-04-06 06:48:53
  Last WireGuard handshake: 2024-04-06 08:41:53
  Transfer status (received/sent) 7.9 MiB/13.0 MiB
  Quantum resistance: true
  Routes: 10.xx.xx.0/16
  Latency: 8.165065ms
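For monitoring, the peer section above can be split into one record per peer. A minimal Python sketch, assuming the layout shown here stays stable: peer header lines end with ":" and contain no spaces, and detail lines use "Key: value". The function name is hypothetical, and the parser has only been checked against this excerpt's format.

```python
def parse_peers(status_text: str) -> list[dict[str, str]]:
    """Split the 'Peers detail' section into one dict of fields per peer."""
    peers: list[dict[str, str]] = []
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and " " not in stripped:
            # Peer header line, e.g. " xxxx.netbird.cloud:"
            peers.append({"Name": stripped[:-1]})
        elif peers:
            key, sep, value = stripped.partition(": ")
            if sep:  # skips "-- detail --" and the colon-less transfer line
                peers[-1][key] = value
    return peers


SAMPLE = """\
Peers detail:
 xxxx.netbird.cloud:
  NetBird IP: 100.xx.xx.130
  Status: Connected
  Last WireGuard handshake: 2024-04-07 01:06:23
 xxx.netbird.cloud:
  NetBird IP: 100.xx.xx.255
  Status: Connected
  Last WireGuard handshake: 2024-04-07 01:04:01
"""

for p in parse_peers(SAMPLE):
    print(p["NetBird IP"], p["Status"], p["Last WireGuard handshake"])
```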

I have more peers, but the snippet above should be enough as an example (two offline, one online). All should be online; several are, and several are not.

Screenshots

No screenshot necessary; it's the netbird status -d output that needs adjusting.

soakes, Apr 07 '24 08:04