swarmsible

When upgrading cluster nodes, determine the current leader and upgrade it last

Open s4ke opened this issue 1 year ago • 0 comments

There seems to be an edge case that can occur when the current leader is the first node upgraded to a newer version. At least, an incident that occurred today suggests this.

While upgrading a three-node cluster from 24.0.0 to 24.0.2, upgrading the leader left the cluster in a state where two members had the same name, according to the output of docker node ls on the leader at that time.

[image: grafik]

When we then tried to demote the leader, both the leader and the node with the duplicated name were demoted, even though we demoted by ID rather than by name. This caused the swarm to lose quorum.

To prevent issues like this in the future (in the hope that a proper upgrade order is enough to avoid them), we should determine the current leader of the swarm and upgrade that node last, instead of abusing one of the managers as the "main manager".
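The proposed ordering could be sketched roughly as follows. This is only an illustration under stated assumptions, not how swarmsible does or will implement it: the helper names (parse_nodes, upgrade_order, current_nodes) are hypothetical, and it assumes the script runs on a manager where docker node ls --format '{{json .}}' prints one JSON object per node with the ID and ManagerStatus fields. Nodes are tracked by ID rather than by name, since names were duplicated in the incident above.

```python
import json
import subprocess


def parse_nodes(output):
    """Parse `docker node ls --format '{{json .}}'` output (one JSON object per line)."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def upgrade_order(nodes):
    """Return node IDs in upgrade order: all non-leaders first, the leader last.

    Hypothetical helper. Non-leader nodes keep the order in which they were listed.
    """
    leaders = [n for n in nodes if n.get("ManagerStatus") == "Leader"]
    if len(leaders) != 1:
        # Refuse to guess if the swarm reports no leader (or more than one).
        raise RuntimeError(f"expected exactly one leader, found {len(leaders)}")
    others = [n for n in nodes if n.get("ManagerStatus") != "Leader"]
    return [n["ID"] for n in others] + [leaders[0]["ID"]]


def current_nodes():
    """Query the local Docker CLI; must be run on a swarm manager."""
    result = subprocess.run(
        ["docker", "node", "ls", "--format", "{{json .}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_nodes(result.stdout)
```

A call such as upgrade_order(current_nodes()) would yield the IDs to upgrade in sequence. Leadership can move while managers restart, so it is probably safer to re-run the query before each node than to compute the order once up front.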

s4ke, Jun 19 '23 21:06