
"Dispatcher has Stopped" when some packet loss occurs

Open: lukeescude opened this issue 4 years ago • 1 comment

Is there a way to start up the Dispatcher again, other than restarting all the managers then all the workers?

If there is intermittent packet loss in one of our datacenters, the entire Swarm cluster seems to end up in a broken state with the dispatcher offline. I'm a little tired of having to restart every node each time this happens.

lukeescude, May 26 '20 15:05

Turns out the solution is to restart all the manager nodes one at a time, restarting the leader last.

If you restart the leader in the middle of the process, all the nodes start reporting a "no installed keys can decrypt this message" kind of error, in which case ALL nodes (workers and managers) must be rebooted.
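For anyone scripting this, here is a minimal sketch of just the ordering step: given each node's hostname and manager status, it returns the managers with the leader at the end. The hostnames are made up, and the status strings ("Leader", "Reachable", empty for workers) are assumed to match what the MANAGER STATUS column of `docker node ls` shows. It does not restart anything itself; how you restart the daemon on each host is up to you.

```python
from typing import List, Tuple


def restart_order(nodes: List[Tuple[str, str]]) -> List[str]:
    """Return manager hostnames in restart order: non-leaders first, leader last.

    Each entry is (hostname, manager_status). An empty status is treated
    as a worker and skipped (assumption: workers do not need a restart
    as long as the leader is restarted last).
    """
    followers = []
    leaders = []
    for hostname, status in nodes:
        status = status.strip().lower()
        if not status:
            continue  # worker node, not part of the manager restart
        if status == "leader":
            leaders.append(hostname)
        else:
            followers.append(hostname)
    # The leader goes last so it is never restarted mid-process.
    return followers + leaders


if __name__ == "__main__":
    # Hypothetical cluster, for illustration only.
    cluster = [
        ("mgr-a", "Reachable"),
        ("mgr-b", "Leader"),
        ("wrk-1", ""),
        ("mgr-c", "Reachable"),
    ]
    for host in restart_order(cluster):
        print(host)
```

Since leadership can change while managers are being restarted, it is probably worth re-checking which node is the leader before each restart rather than trusting a list computed once at the start.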

lukeescude, Oct 26 '20 09:10