netbird
netbird copied to clipboard
`ip route` conflicts and race conditions when resolving best routes in `clientNetwork.getBestRouteFromStatuses()`
Describe the problem
I have noticed the client-side route selection feature so during today's Netbird workshop for our team I have enabled both routes clashing on 10.4/16 and later today teammates started reporting problems with accessing the primary system so I switched the secondary off and dug into the code.
First thing I have noticed using ip route metric was superseded by:
- best peer selection algorithm, which takes
Route.Metricas the most significant scoring, - using dedicated routing table
7120akanetbird(the one I discovered in https://github.com/netbirdio/netbird/issues/2023 ) - not providing a
metricparameter to createdip routeat all
The method is scoped to *clientNetwork, which I guess does not consider a possibility of another NetworkID providing the same CIDR (10.4/16 in our case) resulting in a conflict/race condition on the created ip route entry.
To Reproduce
Steps to reproduce the behavior:
- Go to '...'
- Click on '....'
- Scroll down to '....'
- See error
Expected behavior
Either of:
- only the best/highest metric route across all networks is selected
- create the
ip routewith themetricargument (seems to benetlink.Route.Priority) which would properly prioritize routes on theip routelevel even in case of a clash- watch our not to deregister the wrong route during updates
Are you using NetBird Cloud?
Yes
NetBird version
0.27.7
NetBird status -d output:
If applicable, add the `netbird status -d' command output.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.
I've noticed that I was connected to the primary network and the ip route entry disappeared completely after i disabled the secondary network in the web ui.
I had to do netbird down && netbird up for the client to re-register the primary route.
a single clientNetwork seems to represent a very specific <network-id>-<network-range> grouping:
- https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L46-L46
- https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/route/hauniqueid.go#L8-L10
making it impossible to know about other input.NetID providing the same input.Network.String() on completely different site/datacenter
theoretically using input.Network.String() instead of HAUniqueID here could help with my issue:
https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L243-L251
but I am not sure about the wider consequences
I've noticed that I was connected to the
primarynetwork and theip routeentry disappeared completely after i disabled thesecondarynetwork in the web ui.I had to do
netbird down && netbird upfor the client to re-register theprimaryroute.
@mlsmaycon noted that this part of the issue (removing routes still in use) will be fixed by https://github.com/netbirdio/netbird/pull/1943 , since it is a big refactor of (previously) linked code in this issue it will be worth analyzing and revisiting again after #1943 is merged