nomad
Purging a parameterized job does not purge or unlink children jobs
Nomad version
0.10.2
Operating system and Environment details
MacOS, dev agent
Issue
When a parameterized job is purged, all of its child jobs keep their parent reference to the now-purged job. This leaves broken references, which complicates any kind of job traversal.
Reproduction steps
- Run any ol' parameterized job.
- Dispatch some instances (children) of the parameterized job
- Purge the parameterized job (nomad stop -purge my-parameterized-job)
- Observe that the child job is still there in the CLI and API responses.
ID Type Priority Status
geocoder batch/parameterized 50 running
geocoder/dispatch-1579658083-5adaf751 batch 50 dead
becomes
ID Type Priority Status
geocoder/dispatch-1579658083-5adaf751 batch 50 dead
with an API response including
"ID": "geocoder/dispatch-1579658083-5adaf751",
"ParentID": "geocoder",
"Name": "geocoder/dispatch-1579658083-5adaf751",
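To make the broken reference concrete, here is a rough sketch of how the orphaned children could be listed. The function name find_orphans is made up for illustration, and it assumes its input has already been reduced to one "ID ParentID" pair per line (for example from the job list API response); it is not something Nomad ships.

```shell
#!/bin/bash
# find_orphans (hypothetical helper): reads "ID ParentID" pairs on stdin,
# one job per line, with an empty second field for jobs that have no parent.
# Prints the IDs of jobs whose ParentID names a job that is not in the list.
find_orphans() {
  awk '
    { ids[$1] = 1; parent[$1] = $2 }      # remember every job and its parent
    END {
      for (id in parent) {
        p = parent[id]
        if (p != "" && !(p in ids)) print id   # parent is set but missing
      }
    }
  ' | sort
}

# Assumed usage (requires curl and jq, neither is part of this sketch):
#   curl -s "$NOMAD_ADDR/v1/jobs" | jq -r '.[] | "\(.ID) \(.ParentID)"' | find_orphans
```

With the job list from this report, the dispatched child would be printed because its ParentID "geocoder" no longer resolves to a job.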
What was expected
One of two things should have happened.
1. The child job should have also been purged
Since the job was already in a terminal state, this would have been the same effect as a GC and it would have kept the job graph tidy.
This gets more complicated when there are running instances of the parameterized job, but hey, purge means purge, right?
2. The child job should have been unreferenced from the parent
As part of the purge, the children of a job can be walked and unlinked from the parent. This is just a change in metadata. Child jobs are still just jobs as far as the scheduler is concerned, but in this way, the job graph isn't left in a broken state.
Was this forgotten? Are there any changes or plans for this bug? It honestly looks kind of embarrassing to suddenly see over 2000 dead jobs and have no way to remove them...
Edit: For anyone who might face the same problem and doesn't want to purge every job by hand, you should be able to purge all of them with this small script:
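#!/bin/bash
# Purge every job whose ID starts with the prefix given as the first argument.
nomad status | awk -v prefix="$1" 'index($1, prefix) == 1 { print $1 }' | while read -r job
do
  nomad stop -purge "$job"
done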
Save it as a file (for example purge-periodic-jobs.sh), make it executable, and pass the name of the parent job as the first argument.
Example: ./purge-periodic-jobs.sh name-of-the-batch-job-to-purge
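If some dispatched instances may still be running, a more cautious variant is to select only the dead children first and review the list before purging anything. The sketch below assumes the column layout shown in the status output above (ID, Type, Priority, Status) and the "/dispatch-" naming of child jobs; dead_children is a hypothetical helper name.

```shell
#!/bin/bash
# dead_children PARENT (hypothetical helper): reads `nomad status`-style rows
# on stdin and prints the IDs of dispatched children of PARENT whose Status
# column (assumed to be the 4th field) is "dead".
dead_children() {
  awk -v prefix="$1/dispatch-" \
    'index($1, prefix) == 1 && $4 == "dead" { print $1 }'
}

# Assumed usage: review the list first, then purge.
#   nomad status | dead_children geocoder
#   nomad status | dead_children geocoder | while read -r job; do
#     nomad stop -purge "$job"
#   done
```

Matching on "parent/dispatch-" rather than just the parent name avoids picking up unrelated jobs that happen to share the same prefix.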
I'm still seeing this in Nomad 1.2.8+ent. We use namespaced deploys to allow for review-app style testing and thus we have a ton of child jobs that we need to purge now.
Is it possible to prevent the purge of the job?