Task getting killed with OOM error is marked as complete
Nomad version
Nomad v1.5.2
Operating system and Environment details
Running on AWS.
Issue
We run various batch jobs on Nomad, which runs on EC2 instances. We are now connecting Airflow to Nomad, so we don't want Nomad to handle restarts and reschedules. For that to work, we need to know accurately whether a job completed or failed.
This mostly works, but when a task is killed with an OOM error, Nomad marks the job as complete.
Expected Result
- If a job fails due to Nomad killing it, it should not be marked as complete.
- Alternatively, how do we determine whether it was killed due to OOM?
- Also, even though we have the reschedule and restart blocks set to 0 attempts, Nomad is still trying to run the job again.
reschedule {
  attempts  = 0
  unlimited = false
}
restart {
  attempts = 0
  mode     = "fail"
}
Actual Result
Nomad marks the job as complete and restarts the job.
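For the OOM question, one possible workaround on the Airflow side is to inspect the allocation's task events instead of relying on the job status. The following is a minimal sketch, not a confirmed fix. It assumes the allocation JSON returned by `GET /v1/allocation/<alloc_id>` contains a `TaskStates` map whose `Terminated` events carry `Details["oom_killed"] == "true"` when the task driver detected an OOM kill. Whether that flag is set depends on the driver, so this should be verified against real output. The function names `oom_killed_tasks` and `fetch_allocation` are hypothetical helpers, not part of any Nomad client library.

```python
from __future__ import annotations

import json
import urllib.request


def oom_killed_tasks(allocation: dict) -> list[str]:
    """Return the names of tasks whose events report an OOM kill.

    `allocation` is the decoded JSON body of a Nomad allocation.
    """
    killed = []
    task_states = allocation.get("TaskStates") or {}
    for task_name, state in task_states.items():
        for event in state.get("Events") or []:
            details = event.get("Details") or {}
            # Assumption: the driver records the OOM kill as the string
            # "true" under Details["oom_killed"] on the Terminated event.
            if event.get("Type") == "Terminated" and details.get("oom_killed") == "true":
                killed.append(task_name)
                break  # one OOM event is enough to flag this task
    return killed


def fetch_allocation(nomad_addr: str, alloc_id: str) -> dict:
    """Fetch one allocation from the Nomad HTTP API (no ACL token handling)."""
    url = f"{nomad_addr}/v1/allocation/{alloc_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

An Airflow task could call `fetch_allocation(...)` for each allocation of the job and treat the run as failed when `oom_killed_tasks(...)` returns a non-empty list, regardless of the status Nomad reports for the job.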