
Make agents and listeners resilient to etcd being unavailable

Open squaremo opened this issue 9 years ago • 2 comments

At the moment, the various listeners (agent, balagent, balancer) crash if they cannot establish a connection to etcd, or if they lose one. This causes problems both when starting the infrastructure and when making changes, since everything is very order-dependent and failures are not very obvious.

It would be better if they at least retried for a while and, failing that, continued operating. For the listeners, this isn't so bad, since they are just reacting to things. For the agent, it might mean waiting to do a reconciliation until etcd becomes available.
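To make the "retry for a while, then carry on" idea concrete, here is a minimal sketch in Python. It is not taken from the flux codebase: `StoreUnavailable`, `connect_with_retry`, and the `connect` callable are hypothetical names, and the backoff values are arbitrary assumptions. It only shows the shape of the behaviour: retry with exponential backoff up to a deadline, then return `None` so the caller can keep running and try again later instead of crashing.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StoreUnavailable(Exception):
    """Hypothetical error meaning the store (e.g. etcd) cannot be reached."""


def connect_with_retry(
    connect: Callable[[], T],
    *,
    max_wait: float = 30.0,       # assumed total time to keep retrying, in seconds
    initial_delay: float = 0.1,   # assumed first backoff delay
    max_delay: float = 5.0,       # assumed cap on a single backoff delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call connect() until it succeeds or max_wait seconds have passed.

    Returns the connection on success, or None if the deadline passes,
    leaving it to the caller to continue without the store.
    """
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        try:
            return connect()
        except StoreUnavailable:
            remaining = deadline - clock()
            if remaining <= 0:
                return None  # give up for now, but do not crash
            # Never sleep past the deadline; double the delay up to the cap.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)
```

A listener could call this at startup and again whenever its connection drops. An agent could treat a `None` result as "postpone reconciliation" and retry on a timer, rather than exiting.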

squaremo, Jan 08 '16 11:01

@dpw Do you consider this fixed?

squaremo, Apr 29 '16 09:04

Shrug. Do you?

dpw, Apr 29 '16 09:04