for-aws icon indicating copy to clipboard operation
for-aws copied to clipboard

Swarm broken - "context deadline exceeded"

Open OmpahDev opened this issue 8 years ago • 2 comments

My swarm is completely broken, it happened randomly, and I can't get it to get healthy again.

When I first started noticing that my swarm wasn't working, it showed that one of my managers was down. From then on everything I tried (deploying anything, etc.) failed. Googling the problem told me that this was because my swarm had lost quorum and the way to fix it was to run docker swarm init --force-new-cluster on my leader. However, this command fails.. it hangs for a while and then says Error response from daemon: context deadline exceeded. I get the exact same error when I try to run a docker swarm leave --force as well.

How do I fix this and get my swarm back to healthy? I AM NOT going to delete the CloudFormation stack and re-create, that's WAY too much work, so any solution MUST not involve doing this, or replacing any of my AWS infrastructure in any way.

OmpahDev avatar Oct 10 '17 13:10 OmpahDev

Seems like this is similar to: https://github.com/docker/swarmkit/issues/1340

FrenchBen avatar Oct 10 '17 23:10 FrenchBen

I had a similar issue (not on AWS though). From a healthy manager, I demoted the node being Down and then promoted it again. That solved my issue.

danieljuhl avatar Jun 12 '18 11:06 danieljuhl