[client] Add support for retaining existing AllowedIPs when no alternative paths are available

Open gamerslouis opened this issue 7 months ago • 3 comments

We are using Netbird to manage our WireGuard network (without relay nodes). Our topology is relatively stable and does not change frequently.

We have analyzed potential network instability issues that may occur when either the ICE connection drops or the management API becomes temporarily unavailable. As noted in this comment, in such cases, network routes (e.g., 10.0.0.0/24) are removed from AllowedIPs, effectively cutting off connectivity.

In our scenario, we prefer:

  • When ICE fails or the management service is temporarily down, the existing network connectivity should remain unaffected. The system should not proactively remove AllowedIPs, especially when no alternative paths are available.
  • Even if the management service remains operational, the temporary unavailability of routes during ICE reconnection is still unacceptable, as it causes unnecessary and avoidable disruptions.

We propose adding an option to enable a failsafe routing mode, where:

  • Route manager only updates AllowedIPs when a valid, reachable path is available.
  • If no valid path is detected, the current AllowedIPs are kept unchanged.
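The decision rule can be sketched roughly as follows. This is a minimal illustration only, not the actual route manager code: `nextAllowedIPs` and its parameters are hypothetical names, and prefixes are plain strings for brevity.

```go
package main

import "fmt"

// nextAllowedIPs is a hypothetical helper (not part of NetBird's API).
// current is the prefix set installed on the peer right now.
// candidate is the prefix set of a newly selected reachable path;
// it is empty when no valid path was detected.
// sticky mirrors the proposed failsafe option.
func nextAllowedIPs(current, candidate []string, sticky bool) []string {
	if len(candidate) > 0 {
		// A valid, reachable path exists: update as usual.
		return candidate
	}
	if sticky {
		// No valid path: keep the current AllowedIPs unchanged.
		return current
	}
	// Behavior without the option: the routes are removed.
	return nil
}

func main() {
	current := []string{"10.0.0.0/24"}

	// ICE dropped and no alternative path is available.
	fmt.Println("sticky:", nextAllowedIPs(current, nil, true))
	fmt.Println("default:", nextAllowedIPs(current, nil, false))
}
```

With the option enabled, a transient failure leaves `10.0.0.0/24` in place, while the default branch shows the removal we want to avoid.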

This behavior would help prevent unnecessary network disconnections caused by transient ICE or management issues.

Example environment variable:

NB_ROUTE_STICKY_ON_FAILURE=true
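One possible way for the client to read this variable is sketched below. Only the variable name comes from this PR; the `stickyOnFailure` helper and the choice to treat unset or unparsable values as disabled are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envRouteStickyOnFailure is the variable name proposed in this PR.
const envRouteStickyOnFailure = "NB_ROUTE_STICKY_ON_FAILURE"

// stickyOnFailure is a hypothetical helper. It takes the lookup function
// as a parameter so it can be exercised without touching the process
// environment. Unset or unparsable values are treated as disabled
// (an assumption, chosen so that the existing behavior stays the default).
func stickyOnFailure(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(envRouteStickyOnFailure)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return enabled
}

func main() {
	fmt.Println("sticky routes enabled:", stickyOnFailure(os.LookupEnv))
}
```

`strconv.ParseBool` accepts values such as `true`, `1`, `false`, and `0`, so `NB_ROUTE_STICKY_ON_FAILURE=1` would also enable the mode in this sketch.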

Issue ticket number and link

Stack

Checklist

  • [ ] Is it a bug fix
  • [ ] Is a typo/documentation fix
  • [x] Is a feature enhancement
  • [ ] It is a refactor
  • [ ] Created tests that fail without the change (if possible)
  • [ ] Extended the README / documentation, if necessary

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

gamerslouis avatar Jul 30 '25 03:07 gamerslouis

#4228 might break our changes.

Currently, ICE disconnection does not remove the endpoint, so we only retain the routes. If #4228 is merged, we should also disable endpoint removal.

gamerslouis avatar Jul 30 '25 08:07 gamerslouis

Hi @pappz @lixmal,

Since #4228 has been merged, I will update this PR to also skip endpoint removal when NB_ROUTE_STICKY_ON_FAILURE is enabled.

At the same time, I’d like to confirm whether this feature aligns with NetBird’s overall direction, and whether you see any concerns with this approach.

In our environment, maintaining stable connectivity during temporary management API outages is essential. When the management service is unavailable, clients cannot re-establish ICE connections, and under our deployment conditions such ICE disconnections can last for several hours (as also described in the related issue). Preserving the existing AllowedIPs and endpoints is therefore critical to avoid unnecessary disruptions.

Thanks, and I appreciate any guidance or suggestions you can provide.

gamerslouis avatar Dec 11 '25 09:12 gamerslouis