Netbird interface keeps flapping up and down

Open iball opened this issue 1 year ago • 13 comments

Describe the problem

Netbird interface keeps flapping up and down on Windows and Linux.

To Reproduce

Steps to reproduce the behavior:

  1. Install Netbird
  2. Watch it flap by constantly refreshing network interfaces at the command line.

Expected behavior

I didn't expect v28 to blow everything up. It's preventing me from getting to my services.

Are you using NetBird Cloud?

Yes.

NetBird version

27.10 on Windows but it does the same no matter which version I install. 28.2 on everything else. Tried 28.2, 28.1, 28.0, and now 27.10 on the same Windows 11 PC but it's the same all over, constantly reconnecting to peers and the WT0 interface keeps disappearing and reappearing. Same behavior on all my Netbird clients.

NetBird status -d output:

If applicable, add the `netbird status -d` command output.

No. All it shows is a list of peers but it's constantly reconnecting to them all.

iball avatar Jun 21 '24 05:06 iball

Hi @iball, can you share debug logs of that peer?

pascal-fischer avatar Jun 21 '24 10:06 pascal-fischer

@iball this might be an issue with the network monitor. You can disable it on the command line for the time being, but please collect the debug logs.

netbird down
netbird up --network-monitor=false

lixmal avatar Jun 21 '24 11:06 lixmal

We began experiencing the same issue once we upgraded from 27.10 to 28.2. I just tested changing the 'network monitor' to false and it appears to be stable at the moment. Unsure what function the network monitor is doing to break the connection.

BizkitX avatar Jun 21 '24 16:06 BizkitX

I ran the process in foreground and was pushing a lot of data through to get it to trigger. I get the following output:

2024-06-22T20:08:50-05:00 INFO client/internal/networkmonitor/monitor_windows.go:131: network monitor: neighbor 10.210.0.1 () is not reachable: unreachable

2024-06-22T20:08:50-05:00 INFO client/internal/engine.go:1476: Network monitor detected network change, restarting engine

2024-06-22T20:08:50-05:00 INFO client/internal/engine.go:252: Network monitor: stopped

And then it restarts the connection. This keeps happening even though nothing is changing, and it continued after I stopped the data transfer. With the network monitor disabled, this doesn't happen.
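
To gauge how often the engine restarts, a minimal sketch that counts the restart messages in a saved copy of the client log. The function name `count_monitor_restarts` and the log path are hypothetical; the only thing taken from the output above is the message text itself.

```shell
#!/bin/sh
# Count engine restarts triggered by the network monitor in a saved client log.
# The matched text is copied from the log lines quoted above; the function name
# is made up for this sketch and is not part of the netbird CLI.
count_monitor_restarts() {
    # -c prints the number of matching lines, -F matches the pattern as a fixed string
    grep -cF "Network monitor detected network change, restarting engine" "$1"
}

# Usage (the path is an assumption, point it at wherever the log was saved):
#   count_monitor_restarts /path/to/client.log
```

A count that keeps growing while the network is otherwise idle points to the same flapping described here.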

JonathanHohimer avatar Jun 23 '24 01:06 JonathanHohimer

I noticed the same behavior after updating from v0.27.10 to v0.28.2 (Windows) and running netbird up -N.

After netbird up --network-monitor=false, everything is working as intended.

Hobby-Student avatar Jun 24 '24 13:06 Hobby-Student

Same here: netbird up --network-monitor=false solves the issue. Another PC is running just fine.

raptaml avatar Jun 25 '24 10:06 raptaml

Hey guys, v0.28.3 will fix the network monitor issue on Windows.

mlsmaycon avatar Jun 25 '24 10:06 mlsmaycon

We released v0.28.3, which fixes the issue with the network monitor on Windows. Please upgrade and enable network monitor with:

netbird down
netbird up -N

mlsmaycon avatar Jun 25 '24 16:06 mlsmaycon

Confirmed, works just like before. Thanks!

raptaml avatar Jun 26 '24 07:06 raptaml

Can confirm the issue is still present in 28.4. netbird up --network-monitor=false still fixes it for me though. What are the consequences of running this? Thanks!

pyfrancoeur avatar Jul 09 '24 23:07 pyfrancoeur

@pyfrancoeur can you enable network monitor for a brief period and run the following command to collect some logs to help us fix the issue?

netbird down
netbird up -N
netbird -A debug for 1m

After the tests are done you can disable the monitor with:

netbird down
netbird up --network-monitor=false

Besides that, any information about your setup will be helpful, e.g., number of active interfaces, OS, and main connection type (Wi-Fi or cable).

mlsmaycon avatar Jul 10 '24 07:07 mlsmaycon

So far, this issue has been observed exclusively with Windows Active Directory Domain Controllers (AD DC). I have tested with Windows Server 2016, 2019, and 2022, and the problem appears to be entirely random, without any clear differentiators. Most of the servers have a single NIC, while some have two. All connections are wired.

I configured my Netbird interface to use a random port between 21820 and 33820 to avoid conflicts with dns.exe, which opens ports ranging from 49152 to 65535 (source: Microsoft Security Bulletin MS08-037). This change resolved the issue on many domain controllers; however, some continue to experience significant connection instability. It is important to note that this issue does not affect every domain controller.

I have included the debug archive as requested. Please find it attached for your review and analysis.

Thank you. netbird.debug.1112209487.zip

pyfrancoeur avatar Jul 10 '24 12:07 pyfrancoeur

Could any of you test the network monitor change from https://github.com/netbirdio/netbird/pull/2450?

  1. Grab the binary archive from https://github.com/netbirdio/netbird/actions/runs/10459247214/artifacts/1829061085
  2. Extract windows-packages.zip
  3. Stop and uninstall the service:

netbird service stop
netbird service uninstall

  4. Move netbird.exe from the zip archive to %PROGRAMFILES%/Netbird
  5. Reinstall and start the service, then bring the client up with the network monitor enabled:

netbird service install
netbird service start
netbird down
netbird up --network-monitor=true

lixmal avatar Aug 19 '24 20:08 lixmal

This is in the 0.28.8 release. If it fixes the issue, please close.

lixmal avatar Aug 30 '24 21:08 lixmal

@lixmal I have had no free time to test, but I will soon configure my laptop accordingly and report back.

Hobby-Student avatar Aug 31 '24 08:08 Hobby-Student