[client] Add support for retaining existing AllowedIPs when no alternative paths are available
We use NetBird to manage our WireGuard network (without relay nodes). Our topology is relatively stable and rarely changes.
We have analyzed potential network instability issues that may occur when either the ICE connection drops or the management API becomes temporarily unavailable. As noted in this comment, in such cases, network routes (e.g., 10.0.0.0/24) are removed from AllowedIPs, effectively cutting off connectivity.
In our scenario, we prefer:
- When ICE fails or the management service is temporarily down, the existing network connectivity should remain unaffected. The system should not proactively remove AllowedIPs, especially when no alternative paths are available.
- Even if the management service remains operational, the temporary unavailability of routes during ICE reconnection is still unacceptable, as it causes unnecessary and avoidable disruptions.
We propose adding an option to enable a failsafe routing mode, where:
- The route manager updates AllowedIPs only when a valid, reachable path is available.
- If no valid path is detected, the current AllowedIPs are kept unchanged.
This behavior would help prevent unnecessary network disconnections caused by transient ICE or management issues.
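The failsafe rule described above can be sketched roughly as follows. This is a minimal illustration and not the actual route manager code: `candidate`, `routeAction`, and `decideRouteAction` are hypothetical names, and "reachable" stands in for whatever liveness signal the client really uses.

```go
package main

import "fmt"

// candidate is a hypothetical snapshot of one peer that could carry a route.
type candidate struct {
	Name      string
	Reachable bool
}

// routeAction is a hypothetical description of what to do with AllowedIPs.
type routeAction int

const (
	actionUpdate routeAction = iota // point the route at a reachable peer
	actionKeep                      // leave the current AllowedIPs untouched
	actionRemove                    // drop the route (non-sticky behavior)
)

// decideRouteAction picks the first reachable candidate. If none is
// reachable, it keeps the current AllowedIPs when sticky is true and
// removes the route otherwise.
func decideRouteAction(candidates []candidate, sticky bool) (routeAction, string) {
	for _, c := range candidates {
		if c.Reachable {
			return actionUpdate, c.Name
		}
	}
	if sticky {
		return actionKeep, ""
	}
	return actionRemove, ""
}

func main() {
	// Assumed scenario: ICE is down for every peer that serves 10.0.0.0/24.
	peers := []candidate{
		{Name: "peer-a", Reachable: false},
		{Name: "peer-b", Reachable: false},
	}
	act, _ := decideRouteAction(peers, true)
	fmt.Println("keeps AllowedIPs:", act == actionKeep)
}
```

With the sticky flag off, the same input yields `actionRemove`, which corresponds to the behavior described in the problem statement.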
Example environment variable:
NB_ROUTE_STICKY_ON_FAILURE=true
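One way the flag could be read is sketched below, assuming the feature stays off unless the variable is set to a value that Go's `strconv.ParseBool` accepts as true. The helper names `parseSticky` and `stickyOnFailure` are hypothetical and not taken from the NetBird code base.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envStickyOnFailure is the variable name proposed in this PR.
const envStickyOnFailure = "NB_ROUTE_STICKY_ON_FAILURE"

// parseSticky treats unset or unparsable values as false so that the
// default behavior stays unchanged.
func parseSticky(raw string) bool {
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return v
}

// stickyOnFailure reads the flag from the process environment.
func stickyOnFailure() bool {
	return parseSticky(os.Getenv(envStickyOnFailure))
}

func main() {
	fmt.Println("sticky routes enabled:", stickyOnFailure())
}
```

Defaulting to false on a parse error keeps a typo in the variable from silently changing routing behavior.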
Issue ticket number and link
Stack
Checklist
- [ ] Is it a bug fix
- [ ] Is a typo/documentation fix
- [x] Is a feature enhancement
- [ ] It is a refactor
- [ ] Created tests that fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
#4228 might break our changes.
Currently, ICE disconnection does not remove the endpoint, so we only retain the routes. If #4228 is merged, we should also disable endpoint removal.
Hi @pappz @lixmal,
Since #4228 has been merged, I will update this PR to also skip endpoint removal when NB_ROUTE_STICKY_ON_FAILURE is enabled.
At the same time, I’d like to confirm whether this feature aligns with NetBird’s overall direction, and whether you see any concerns with this approach.
In our environment, maintaining stable connectivity during temporary management API outages is essential. When the management service is unavailable, clients are unable to re-establish ICE connections, and such ICE disconnections can naturally occur for several hours under our deployment conditions (as also described in the related issue). Because of this, preserving the existing AllowedIPs and endpoints becomes critical to avoid unnecessary disruptions.
Thanks, and I appreciate any guidance or suggestions you can provide.