Stop system jobs after the client has been disconnected from the server for some time
Proposal
Implement stop_after_client_disconnect for system jobs.
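As a rough sketch of what is being requested, the job specification below sets the existing group-level stop_after_client_disconnect parameter on a system job. The job, group, and task names, the image, and the "2m" duration are hypothetical placeholders, and per this issue the setting does not currently take effect for system jobs.

```hcl
# Hypothetical job spec illustrating the requested behavior.
job "health-responder" {
  datacenters = ["dc1"] # placeholder datacenter name
  type        = "system"

  group "responder" {
    # Existing group-level parameter; this issue asks for it to be
    # honored for system jobs as well. The duration is an example value.
    stop_after_client_disconnect = "2m"

    task "responder" {
      driver = "docker"

      config {
        # Placeholder image name, not a real artifact.
        image = "example/health-responder:latest"
      }
    }
  }
}
```

With this in place, the intent is that a client which has lost contact with the servers for longer than the configured duration would stop the allocation on its own, so the responder goes away and an external health probe starts failing.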
Use-cases
I use Nomad to start a health check responder agent on a set of Azure VMSS clients, so that Azure can detect whether the Nomad client has started and the driver is working. But when the Nomad client later fails to connect to the server, there is no way to kill the health check responder agent automatically, so the auto-recovery mechanism cannot kick in.
Attempted Solutions
Check for Nomad client errors in the health check program. However, the client might not expose an HTTP API, which makes this impossible to implement.
Possibly related: https://github.com/hashicorp/nomad/pull/19886
Hi @Jamesits and thanks for raising this feature request. We think this makes sense and would be a good thing to support. I'll get it added to our backlog.