anycast_healthchecker
Feature request: withdraw advertisements on shutdown
Is there a way to tell anycast-healthchecker to withdraw all announcements on a clean shutdown? Similar to purge_ip_prefixes but on exit?
My scenario is that I want to be able to perform some maintenance without interrupting any service too much. Routers may take some time to converge after a route is withdrawn. So if I shut down a healthchecked service, it takes a few seconds before the healthchecker notices that the service is unavailable, and then some more time until traffic no longer hits the system.
For a smooth transition my approach is to first withdraw all the routes on a system before shutting down any service.
Doing so by shutting down anycast-healthchecker looks the cleanest to me. Everything else I can think of would mean messing with the healthchecker and would probably result in it trying to fix the configuration.
For your use-case the easiest and fastest way is to stop the bird daemon. It will yield what you want. Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.
My NOC people get a little twitchy when BGP sessions are down, so I'd avoid taking them down for most of my use scenarios.
Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.
Yes, but that may cause service interruption as described above. Routes may not have converged into the routers' ASICs and traffic may still hit the machine while no service is up for responding.
From my point of view, an additional parameter on the checks would do the trick. Probably on_exit, similar to on_disabled.
on_exit => "withdraw" -> disable ip_prefix on exit. This requires iterating over all checks in the shutdown method.
If you don't see any problem with this I'll try to put it into code (albeit python not really being my native language) and start a PR.
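To make the proposal concrete, here is a minimal sketch of what such an on_exit hook could look like. All names in it (`ServiceCheck`, `prefixes_after_shutdown`) and the accepted values are hypothetical and not taken from the anycast-healthchecker code base; it only illustrates iterating over all checks at shutdown and dropping the prefixes of those that ask for a withdrawal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCheck:
    """Hypothetical stand-in for one configured service check."""
    name: str
    ip_prefix: str
    on_exit: str = "none"  # assumed values: "none" or "withdraw"


def prefixes_after_shutdown(checks, advertised):
    """Return the prefixes that should stay advertised after a clean exit.

    Prefixes of checks with on_exit == "withdraw" are removed; all other
    prefixes are left untouched.
    """
    to_withdraw = {c.ip_prefix for c in checks if c.on_exit == "withdraw"}
    return set(advertised) - to_withdraw


checks = [
    ServiceCheck("dns", "10.0.0.1/32", on_exit="withdraw"),
    ServiceCheck("ntp", "10.0.0.2/32"),  # keeps the default "none"
]
remaining = prefixes_after_shutdown(checks, {"10.0.0.1/32", "10.0.0.2/32"})
print(sorted(remaining))  # ['10.0.0.2/32']
```

In the daemon itself the resulting set would presumably still have to be written to the Bird configuration and Bird reconfigured, which this sketch leaves out.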
My NOC people get a little twitchy when BGP sessions are down, so I'd avoid taking them down for most of my use scenarios.
I have never had a problem with this approach. If the NOC has issues when a BGP session is terminated, then something is wrong: terminating a BGP session is a normal operational task and it shouldn't cause trouble, only an alert.
Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.
Yes, but that may cause service interruption as described above. Routes may not have converged into the routers' ASICs and traffic may still hit the machine while no service is up for responding.
You can avoid this scenario with correct systemd ordering for the Bird systemd service. I have had Bird configured to start last on boot and to stop first on shutdown to avoid the scenario you describe.
From my point of view, an additional parameter on the checks would do the trick. Probably on_exit, similar to on_disabled.
on_exit => "withdraw" -> disable ip_prefix on exit. This requires iterating over all checks in the shutdown method.
If you don't see any problem with this I'll try to put it into code (albeit python not really being my native language) and start a PR.
Having an on_exit parameter per service check makes sense. It should have a default value of none, which does nothing.
I will try to cook something this weekend, let's see if I manage to find time for it.
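A default of none could be applied at the point where a check's settings are read. The following is a minimal sketch using the standard-library `configparser`; the section names, the helper `read_on_exit`, and the set of accepted values are assumptions for illustration, not the project's actual option handling.

```python
import configparser

VALID_ON_EXIT = {"none", "withdraw"}  # assumed set of accepted values


def read_on_exit(config, section):
    """Return the on_exit setting of a check, defaulting to "none"."""
    value = config.get(section, "on_exit", fallback="none").strip().lower()
    if value not in VALID_ON_EXIT:
        raise ValueError(f"invalid on_exit value for {section}: {value!r}")
    return value


config = configparser.ConfigParser()
config.read_string("""
[dns.example]
ip_prefix = 10.0.0.1/32
on_exit = withdraw

[ntp.example]
ip_prefix = 10.0.0.2/32
""")

print(read_on_exit(config, "dns.example"))  # withdraw
print(read_on_exit(config, "ntp.example"))  # none
```

With a fallback like this, existing check files that never mention on_exit keep their current behaviour.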
I have never had a problem with this approach. If the NOC has issues when a BGP session is terminated, then something is wrong: terminating a BGP session is a normal operational task and it shouldn't cause trouble, only an alert.
You are right, but the alert is something I'd like to avoid for most use cases.
Having an on_exit parameter per service check makes sense. It should have a default value of none, which does nothing.
I will try to cook something this weekend, let's see if I manage to find time for it.
I cobbled together a pull request, but my Python is far from good. It's mostly copy/paste from your existing code with some Stack Overflow sprinkled over it. It works, but it's probably not very clean Python. Feel free to adjust my code where there are more elegant ways.