consul-alerts
Consul-alerts fails if Consul cluster is unavailable on startup
If I specify a --consul-addr pointing to a Consul node that is down, I get:
```
INFO[0000] Cluster has no leader or is unreacheable. Get http://10.0.42.1:8500/v1/status/leader?dc=dc1: dial tcp 10.0.42.1:8500: connection refused
```
... and consul-alerts dies. At the very least this ought to be documented behaviour, but I don't think it's very nice behaviour even if documented.
I'd propose the following:
- Allow "bootstrapping" the notifier configuration via a JSON config file or command-line options, so a basic configuration is available even if Consul can't be reached.
- Allow specifying multiple Consul nodes on the command line and try each before bailing out.
I agree that this should be documented behaviour. Though I am not sure it requires remediation.
Even if we loaded a default config, consul-alerts depends on Consul at runtime for storage and leader election, so it would also need a dummy alternative for all of that.
I think it is better to run multiple consul-alerts instances: when a consul/consul-alerts pair fails, the remaining instances just elect the next leader and carry on.