Question: Temporarily taking a node out of the cluster
I have the following setup: multiple servers and multiple nodes that are tagged as "worker" and execute jobs.
From time to time, I want to take one of the workers out of the cluster so that jobs are temporarily not scheduled on it, and then bring it back later.
Currently, it seems that the only way to do that is to stop the dkron service.
I explored the options:
- using /leave -> but then there is no /join to come back
- using /tags -> it doesn't seem possible to modify node tags at runtime; also, adding multiple tags like "worker", "active" would not work, since scheduling takes any of the tag combinations.
Any suggestions?
Hi @alexef, I'm trying to understand your motivation: why do you need Dkron to keep running while the node is taken out of the cluster?
In my use case, I have long running machines that have many services running, dkron being only one of them.
I configured the other services to stop accepting requests when Consul goes into maintenance mode. When Consul comes back, they automatically start receiving traffic again. I imagine the same behaviour for the dkron agent: pause getting tasks, and then automatically resume.
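To illustrate the detection side of this, here is a minimal Python sketch, assuming the local Consul agent is reachable at `http://127.0.0.1:8500` and that node maintenance mode shows up as a check with the ID `_node_maintenance` in `GET /v1/agent/checks`. The function names, and the idea of gating job acceptance on the result, are hypothetical; Dkron itself does not expose such a hook as far as this thread shows.

```python
import json
import urllib.request

# Check ID that Consul registers on the agent while node maintenance is enabled.
NODE_MAINTENANCE_CHECK_ID = "_node_maintenance"


def in_maintenance(checks: dict) -> bool:
    """Return True if the agent's checks include the node-maintenance check.

    `checks` is the parsed JSON body of GET /v1/agent/checks,
    which is a mapping keyed by check ID.
    """
    return NODE_MAINTENANCE_CHECK_ID in checks


def fetch_checks(consul_addr: str = "http://127.0.0.1:8500") -> dict:
    """Fetch the local agent's checks. The address is an assumption."""
    url = f"{consul_addr}/v1/agent/checks"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def should_accept_jobs(checks: dict) -> bool:
    """Hypothetical gate: accept jobs only while Consul is not in maintenance."""
    return not in_maintenance(checks)
```

A service would call `should_accept_jobs(fetch_checks())` periodically and pause or resume work based on the result.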
@alexef I'm also encountering this issue. I cannot find a way to drain a Dkron node of running jobs before stopping it for maintenance.
This question really extends to how to add and remove Dkron tags dynamically, so that tags can control whether a node is eligible for scheduling. See #791.
Like #808, this should have been fixed by #983.