swarmkit
"Dispatcher has Stopped" when some packet loss occurs
Is there a way to start up the Dispatcher again, other than restarting all the managers then all the workers?
If there is intermittent packet loss in one of our datacenters, the entire Swarm cluster seems to end up in a broken state with the dispatcher offline. I'm a little tired of having to restart every node each time this happens.
Turns out the solution is to restart all the manager nodes, leaving the leader for last.
If you restart the leader partway through the process, all the nodes start reporting an error along the lines of "no installed keys can decrypt this message", in which case ALL nodes (workers and managers) must be rebooted.
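For anyone scripting this, here is a rough sketch of the ordering described above: every non-leader manager first, the leader last. It only computes the order; how you actually restart each node is up to you. The function name `restart_order` and the sample hostnames are made up for illustration. The (hostname, status) pairs are assumed to come from something like `docker node ls --filter role=manager --format "{{.Hostname}} {{.ManagerStatus}}"`, where the leader's status reads `Leader`.

```python
from typing import Iterable, List, Tuple


def restart_order(managers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return manager hostnames with non-leaders first and the leader last.

    `managers` is an iterable of (hostname, manager_status) pairs.
    This helper is hypothetical; it is not part of Docker or swarmkit.
    """
    followers: List[str] = []
    leaders: List[str] = []
    for hostname, status in managers:
        # Assumption: the current leader is reported with status "Leader".
        if status.strip().lower() == "leader":
            leaders.append(hostname)
        else:
            followers.append(hostname)
    # The leader goes at the end so it is restarted last.
    return followers + leaders


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        ("mgr-a", "Reachable"),
        ("mgr-b", "Leader"),
        ("mgr-c", "Reachable"),
    ]
    for host in restart_order(sample):
        print(host)
```

Note that leadership can change while managers are being restarted, so it is probably safer to re-check which node is the leader before each restart rather than relying on a single snapshot.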