swarmkit
[Feature Request] Add option to auto-rm departed nodes
Currently, a worker node that has voluntarily left a swarm will be marked as DOWN. This is documented here:

"The node will still appear in the node list, and marked as down. It no longer affects swarm operation, but a long list of down nodes can clutter the node list. To remove an inactive node from the list, use the node rm command."

https://docs.docker.com/engine/reference/commandline/swarm_leave/#extended-description
This is a request to add an option in swarmkit that would auto-remove nodes that leave a swarm in an orderly fashion with docker swarm leave. (Also fine if that's the default behavior, if appropriate.)
@friism can you explain why you need this?
This seems like it should be default. What is the current procedure for removing a node in this manner?
@stevvooe instinctively I agree, but I also understand why it might be confusing for a user looking at a manager that worker nodes can just up and leave with no immediate trace in docker node ls.
Currently, in the manually managed case, a user would:
- attempt to drain the worker node (probably optional)
- docker swarm leave on the worker node
- on a manager, docker node rm to evict the now-DOWN node
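Those steps can be sketched as a small script. Everything here is illustrative: worker-1 is a hypothetical node name, evict_node is our own helper (not a docker command), and DOCKER defaults to echo so the sketch only prints what it would run.

```shell
# Sketch of the manual eviction flow. evict_node is a hypothetical helper;
# DOCKER defaults to echo so this only prints the commands it would run --
# set DOCKER=docker on a real manager to execute them.
DOCKER="${DOCKER:-echo}"

evict_node() {
  node="$1"
  # On a manager: stop scheduling new tasks onto the node (probably optional).
  $DOCKER node update --availability drain "$node"
  # ...then, on the worker itself, run: docker swarm leave
  # Back on a manager: evict the now-DOWN node from the node list.
  $DOCKER node rm "$node"
}

evict_node "worker-1"
```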
In the automated case, this is where things get annoying. E.g., assume an AWS instance receives a termination notice for itself (for whatever reason): ideally it would just be able to leave the swarm without leaving a DOWN tombstone in docker node ls, at least as an option (although again, I agree that it might even be an OK default). Currently, any automation or management software running on top of swarm also has to docker node rm the departed node to remove it from the node list.
@friism I am starting to recall why we didn't have this in the first place. Moving a node status is a cluster management operation that cannot be performed from the node that initiates a graceful leave.
@stevvooe sorry if this is naive, but why can't the node send a kthxbye message to the leader and then exit gracefully?
@friism There were some complexities around resolving that state correctly, but I don't remember the details. It may have been a concern about a malicious node poisoning other nodes.
There's also an earlier discussion from last year on this topic here: https://github.com/moby/moby/issues/24088
/cc @diogomonica @tiborvass
What I see people doing in the wild, in cases like an AWS "worker ASG", is setting a daily cron on a manager to remove any worker nodes marked down 😂. That is obviously not the goal of the existing two-step process, and it creates edge cases, such as removing nodes that are only offline temporarily.
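That cron workaround can be sketched roughly as below. down_node_ids is our own helper, not a docker command; the only docker features assumed are node ls's Go-template --format support and node rm.

```shell
# Sketch of the cron-based cleanup. down_node_ids reads "ID STATUS" pairs
# on stdin and prints the IDs of nodes whose status is Down.
down_node_ids() {
  awk '$2 == "Down" { print $1 }'
}

# A daily cron entry on a manager would then look something like:
#   docker node ls --format '{{.ID}} {{.Status}}' | down_node_ids | xargs -r docker node rm
# Caveat, as noted above: this also removes nodes that are only down temporarily.
```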
If an authenticated worker were to request removal as part of its swarm leave process, how would that risk poisoning other nodes @stevvooe, just curious?
they end up setting a daily cron on manager to remove any worker nodes marked down 😂
Perhaps having some way for nodes to indicate that they explicitly left the swarm, plus a docker node prune command, would help for those setups. (I do see it not entirely matching the original design of managers being in control over nodes / nodes leaving.)
From a UX perspective, my ideal would be to have a docker node create command (which would, e.g., integrate with InfraKit) to create and add nodes to the swarm, allowing scaling of the cluster (even scriptably). But that's probably a bit out of scope for this discussion.
I like the idea of maybe an additional Availability option of removed or left. I guess the benefit of that over docker swarm leave also causing an automatic docker node rm is that you keep the history in the node list.
Also, for those of us wanting this all to be auditable, the Availability actions (active/pause/drain) already produce docker events when changed, so an additional option kicked off by docker swarm leave would presumably also show up in events.
For node list cleanup, the docker node prune idea is 🤘.
Do we have something similar to docker node prune? Maybe a filter for left nodes on node ls?
Can we please have this feature, or the ability to talk to the cluster as a manager without having to join? It's a major problem for large enterprise systems: this methodology of joining as a manager is not scalable, because a server can only be a manager of one cluster at a time. That might sound logical if you are doing system admin and can technically only do one thing at a time, but what if I want to compare two clusters' settings? What if I want to build a deployment pipeline? My deployment nodes can only deploy to one cluster at a time? Really? The thinking around joining as a manager needs to be rethought if you want enterprises to adopt Docker Swarm.
+1