flux-classic
Make agents and listeners resilient to etcd being unavailable
Currently, the various listeners (agent, balagent, balancer) crash if they cannot establish a connection to etcd, or if they lose one. This causes problems both when starting the infrastructure and when making changes, since startup is very order-dependent and the failures are not very obvious.
It would be better if they at least retried for a while and otherwise kept operating. For the listeners, this isn't so bad, since they are just reacting to events. For the agent, it might mean deferring reconciliation until etcd becomes available.
@dpw Do you consider this fixed?
Shrug. Do you?