netbird icon indicating copy to clipboard operation
netbird copied to clipboard

`ip route` conflicts and race conditions when resolving best routes in `clientNetwork.getBestRouteFromStatuses()`

Open nazarewk opened this issue 1 year ago • 4 comments

Describe the problem

I have noticed the client-side route selection feature so during today's Netbird workshop for our team I have enabled both routes clashing on 10.4/16 and later today teammates started reporting problems with accessing the primary system so I switched the secondary off and dug into the code.

First thing I have noticed using ip route metric was superseded by:

The method is scoped to *clientNetwork, which I guess does not consider a possibility of another NetworkID providing the same CIDR (10.4/16 in our case) resulting in a conflict/race condition on the created ip route entry.

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

Either of:

  • only the best/highest metric route across all networks is selected
  • create the ip route with the metric argument (seems to be netlink.Route.Priority) which would properly prioritize routes on the ip route level even in case of a clash
    • watch our not to deregister the wrong route during updates

Are you using NetBird Cloud?

Yes

NetBird version

0.27.7

NetBird status -d output:

If applicable, add the `netbird status -d' command output.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

nazarewk avatar May 28 '24 12:05 nazarewk

I've noticed that I was connected to the primary network and the ip route entry disappeared completely after i disabled the secondary network in the web ui.

I had to do netbird down && netbird up for the client to re-register the primary route.

nazarewk avatar May 28 '24 13:05 nazarewk

a single clientNetwork seems to represent a very specific <network-id>-<network-range> grouping:

  1. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L46-L46
  2. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/route/hauniqueid.go#L8-L10

making it impossible to know about other input.NetID providing the same input.Network.String() on completely different site/datacenter

nazarewk avatar May 28 '24 13:05 nazarewk

theoretically using input.Network.String() instead of HAUniqueID here could help with my issue: https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L243-L251 but I am not sure about the wider consequences

nazarewk avatar May 28 '24 13:05 nazarewk

I've noticed that I was connected to the primary network and the ip route entry disappeared completely after i disabled the secondary network in the web ui.

I had to do netbird down && netbird up for the client to re-register the primary route.

@mlsmaycon noted that this part of the issue (removing routes still in use) will be fixed by https://github.com/netbirdio/netbird/pull/1943 , since it is a big refactor of (previously) linked code in this issue it will be worth analyzing and revisiting again after #1943 is merged

nazarewk avatar May 28 '24 14:05 nazarewk