Nomad server leave automatically but not permanently
Proposal
Have Nomad servers (and potentially clients?) notify the leader whenever they interrupt/terminate. Something like leave_on_interrupt but then less permanent.
For follower-servers, we have to wait a while for the leader to pick up on the follower being offline.
Currently I'm not sure whether leader-servers already do this, but I'd imagine something like nomad operator step-down (doesn't exist in Nomad, but does in Vault) would be useful to have Nomad do on shutdown.
In cases where we know the server will be offline for a while (e.g. reboot), there's no need to wait for timeouts/heartbeats.
Use-cases
- Checking server-health more reliably (e.g. when deciding "Can I, a follower node, reboot right now?")
Attempted Solutions
- Waiting about 5-10 seconds to see Alive change to Left.
Hi @EtienneBruines! I just want to make sure I understand the scenario and goal here. In the leave_on_interrupt case today, the reason why it's expected to be permanent is because leaving causes a memberlist and Raft reconfiguration. That's costly, so we recommend against it if you know that server is coming back anyways. (Ex. you don't have an immutable infrastructure thing going where the server is entirely replaced in order to change its config.)
In your scenario, you've got a follower that's restarting or rebooting (for upgrades or whatever) and will come back. What does "leaving" mean here if not the same memberlist and Raft reconfiguration? Or in other words, sending a signal to the leader isn't hard, but what is it that you want the leader to do with this information? If you're looking to coordinate between servers to orchestrate configuration changes across the cluster (ex. host reboots for kernel upgrades), you'd probably be better off querying the host directly via the Agent Health endpoints. That way you can detect a crashed host too.
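For illustration, here is a rough sketch of what polling the Agent Health endpoint from a script could look like. It assumes the agent's HTTP API is reachable at a base URL such as http://10.0.0.11:4646 (a made-up address) and that the response carries a "server" object with an "ok" field. The helper names fetch_health and server_ok are hypothetical, not part of any Nomad tooling.

```python
import json
import urllib.request


def server_ok(health: dict) -> bool:
    """Return True if a health payload reports the server component as ok."""
    server = health.get("server") or {}
    return bool(server.get("ok"))


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """Fetch /v1/agent/health from one agent.

    Returns an empty dict when the host is unreachable, times out,
    answers with an HTTP error, or returns something that is not JSON.
    That way a crashed host simply shows up as "not ok".
    """
    url = f"{base_url}/v1/agent/health?type=server"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        return {}


if __name__ == "__main__":
    # Hypothetical peer addresses; replace with your own servers.
    for peer in ["http://10.0.0.11:4646", "http://10.0.0.12:4646"]:
        print(peer, "ok" if server_ok(fetch_health(peer)) else "NOT ok")
```

Treating every failure mode as "not ok" is a deliberate simplification: for a reboot decision, an unreachable peer and an unhealthy peer probably lead to the same answer.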
Thank you for your reply, @tgross!
If you're looking to coordinate between servers to orchestrate configuration changes across the cluster (ex. host reboots for kernel upgrades), you'd probably be better off querying the host directly via the Agent Health endpoints. That way you can detect a crashed host too.
That is not a bad idea!
Or in other words, sending a signal to the leader isn't hard, but what is it that you want the leader to do with this information?
If I execute nomad server members (or its -json equivalent), it not only lists the servers but also a status (e.g. Alive or Left). Having it report a status of Offline or Lost (not quite sure what expected values are) would not be a bad thing. But I don't know whether it just always reports Alive whenever it's in the memberlist?
But I don't know whether it just always reports Alive whenever it's in the memberlist?
The possible statuses you can see there are:
- alive for a server that's joined the memberlist and making serf heartbeats
- leaving for a server that's started to leave but this hasn't been broadcasted to everyone yet (this state should be brief)
- left for a server that's left the cluster
- failed for a server that's missed heartbeats
- none for a server that's showing up in the memberlist but hasn't actually joined (I'm honestly not sure without some more research how you'd ever get into this state).
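As a rough sketch of how those statuses could feed the "can I reboot right now?" decision: the snippet below assumes the member list has already been loaded as a list of dicts with "Name" and "Status" keys (the exact shape of the -json output is an assumption here). The function name safe_to_reboot and the server names are made up for the example.

```python
def safe_to_reboot(members, me):
    """Conservative check: every other server that has not left must be alive.

    members: list of dicts with "Name" and "Status" keys (assumed shape).
    me: the name of the server that wants to reboot.
    """
    peers = [
        m for m in members
        if m["Name"] != me and m["Status"] != "left"  # departed servers don't count
    ]
    # leaving, failed and none all block the reboot.
    return all(m["Status"] == "alive" for m in peers)


members = [
    {"Name": "server-1.global", "Status": "alive"},
    {"Name": "server-2.global", "Status": "alive"},
    {"Name": "server-3.global", "Status": "failed"},
]
print(safe_to_reboot(members, "server-1.global"))  # False: server-3 is failed
```

Note that a peer that just went down still shows as alive until it has missed its heartbeats, which is the gap this issue is about.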
The nomad server members command is surfacing the information from serf directly without transforming it, so this feature would effectively be an overlay on that information so that we're changing "alive" to "offline" (or something like that) temporarily for the operator but not doing anything with that information in Nomad itself. Or maybe we don't even change the serf.Member.Status but add tags to the serf.Member data, that way we're using the existing broadcast of memberlist data without messing with the existing API.
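To make the tag idea a bit more concrete, here is a purely hypothetical sketch of the overlay. Nothing like this exists in Nomad today: the tag name "maintenance", the value "true", and the function display_status are all invented for illustration. The only assumptions taken from serf are that a member has a status and a string-to-string map of tags.

```python
def display_status(member, tag="maintenance"):
    """Return the status to show the operator.

    The underlying serf status is left untouched. A member that is still
    "alive" according to serf but carries the (hypothetical) maintenance
    tag is shown as "offline" instead.
    """
    status = member.get("Status", "none")
    tags = member.get("Tags") or {}
    if status == "alive" and tags.get(tag) == "true":
        return "offline"
    return status


rebooting = {"Name": "server-2.global", "Status": "alive",
             "Tags": {"maintenance": "true"}}
print(display_status(rebooting))  # offline
```

Because the overlay only changes what is displayed, anything that reads the raw member status would keep seeing the unmodified serf value.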
This is an interesting idea... maybe there's other metadata we could allow operators to stick on there too. I'll mark this for roadmapping and further thinking.