
When leveraging highly available routing peers in the same location, we experience route flapping periodically.

Open: briemann opened this issue 1 year ago (2 comments)

Describe the problem

We periodically see an issue across our deployment, where we have set up our routing peers (RPs) as a high-availability pair in each environment. During the problem window, random clients experience route flapping: assets behind the RP nodes remain reachable but are extremely slow because of the route flaps.

Several Windows clients in the fleet running 0.27 have reproduced this behavior, and I experienced it myself this morning.

The usual remedy is to stop the Windows service for a few minutes and then start it again. That is not a viable long-term fix: we want non-power users on this solution, and asking them to perform technical steps will fall on deaf ears.

Since I suspect this HA setup makes us an edge case, we are going to stop the NetBird client on one of the two nodes in each location. That should tell us whether the issue is caused by the HA pair or is on the client side. In the meantime, I wanted to open this issue to find out what information we could supply to help narrow it down.

To Reproduce

Because of its nature, it is not reproducible on demand. It usually happens on the first connection of the day: 9 out of 10 days, logging in works fine, but on the tenth it does not.

Expected behavior

The client should not flap when establishing a route.

Are you using NetBird Cloud?

No. Self Hosted.

NetBird version

0.27.10

NetBird status -d output:

See attached.

Screenshots

See attached output.log

Additional context

It looks as if the routemanager has an issue with one of the RPs, then sees that RP come back online and flips the routes over to the other node because that peer has a slightly better score. Perhaps the fix would be to make the peers aware that they sit right next to each other, so that a slightly different score for peers in the same environment does not cause a flap? Not sure; maybe I'm off base.
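To illustrate the hysteresis idea suggested above, here is a minimal Go sketch. Everything in it (the `Peer` type, the score values, `hysteresisMargin`, and `selectPeer`) is hypothetical and is not NetBird's routemanager code. It only shows how requiring a challenger to beat the current routing peer by a margin keeps two near-equal peers from trading the route back and forth, while still failing over when the current peer disconnects.

```go
package main

import "fmt"

// Peer is a hypothetical routing-peer candidate (not a NetBird type).
type Peer struct {
	ID        string
	Connected bool
	Score     float64 // higher is better; how it is computed is assumed
}

// hysteresisMargin is a hypothetical threshold: another peer must beat the
// current peer's score by more than this before the route is moved.
const hysteresisMargin = 0.1

// selectPeer returns the ID of the peer that should carry the route.
// It keeps the current peer while it is connected, unless another connected
// peer outscores it by more than hysteresisMargin. It returns "" when no
// peer is connected.
func selectPeer(current string, peers []Peer) string {
	var cur, best *Peer
	for i := range peers {
		p := &peers[i]
		if !p.Connected {
			continue // disconnected peers are never candidates
		}
		if p.ID == current {
			cur = p
		}
		if best == nil || p.Score > best.Score {
			best = p
		}
	}
	if best == nil {
		return "" // no connected peer available
	}
	if cur == nil {
		return best.ID // current peer is gone: fail over to the best candidate
	}
	if best.Score > cur.Score+hysteresisMargin {
		return best.ID // challenger is clearly better: switch
	}
	return cur.ID // difference is within the margin: stay put, no flap
}

func main() {
	peers := []Peer{
		{ID: "rp-a", Connected: true, Score: 0.80},
		{ID: "rp-b", Connected: true, Score: 0.85},
	}
	// rp-b is only slightly better, so the route stays on rp-a.
	fmt.Println(selectPeer("rp-a", peers))
	// If rp-a disconnects, the route fails over to rp-b.
	peers[0].Connected = false
	fmt.Println(selectPeer("rp-a", peers))
}
```

Run as-is, this prints `rp-a` and then `rp-b`. The margin trades a little responsiveness to genuinely better peers for stability when two peers in the same location score almost identically.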

netbird_output.log netbird_status_output.txt

briemann, Jun 18 '24 15:06