levant
Checking batch job status fails
I have a batch job that performs a one-time, short-running task. A successful deployment looks like this:
2022-06-29T16:00:17Z |INFO| levant/deploy: triggering a deployment job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/deploy: evaluation e9d76b4c-8f4b-68e5-05e3-eee20a82d225 finished successfully job_id=some_nomad_job_name
2022-06-29T16:00:18Z |DEBU| levant/job_status_checker: running job status checker for job job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/job_status_checker: job has status running job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/job_status_checker: task command in allocation 124b605d-518e-6292-5cd3-8decc4d033ec now in pending state job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/job_status_checker: task command in allocation 124b605d-518e-6292-5cd3-8decc4d033ec now in running state job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/job_status_checker: all allocations in deployment of job are running job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/deploy: job deployment successful job_id=some_nomad_job_name
Today I got an error:
2022-07-06T14:57:01Z |INFO| levant/deploy: triggering a deployment job_id=some_nomad_job_name
2022-07-06T14:57:03Z |INFO| levant/deploy: evaluation ffa905f9-e937-e178-2e1a-d2b3d18ed8a8 finished successfully job_id=some_nomad_job_name
2022-07-06T14:57:03Z |DEBU| levant/job_status_checker: running job status checker for job job_id=some_nomad_job_name
2022-07-06T14:57:07Z |ERRO| levant/job_status_checker: job has status dead job_id=some_nomad_job_name
2022-07-06T14:57:07Z |ERRO| levant/deploy: job deployment failed job_id=some_nomad_job_name
In the successful deployment, the time between "levant/job_status_checker: running job status checker for job" and the first status line is 0 seconds; in the failed one it is 4 seconds. During those 4 seconds my job finished successfully and got the status 'dead', but Levant treats 'dead' as a failure, so it exits with a non-zero code and fails the CI pipeline.
As far as I can see, Levant has some trouble communicating with Nomad and takes too long to get the job status. Is it possible to disable the job check? Asynchronous checking of short-lived tasks can fail unexpectedly.
I have the same problem: Levant marks the deployment as failed because it checks the job status, which can be pending, running, or dead. That status can't tell us whether the container (or whatever else ran) exited successfully.
hi,
Same issue. I have a one-shot container that creates files and then exits with code 0, but the pipeline is marked as failed:
2023-01-18T14:55:03Z |INFO| levant/job_status_checker: task django-collectstatic in allocation dcfac9d2-9a14-f493-bd02-34af173724e3 now in dead state job_id=backoffice_gunicorn
2023-01-18T14:55:04Z |INFO| levant/job_status_checker: task django in allocation dcfac9d2-9a14-f493-bd02-34af173724e3 now in running state job_id=backoffice_gunicorn
2023-01-18T14:55:04Z |INFO| levant/job_status_checker: task nginx in allocation dcfac9d2-9a14-f493-bd02-34af173724e3 now in running state job_id=backoffice_gunicorn
2023-01-18T14:55:04Z |ERRO| levant/deploy: job deployment failed job_id=backoffice_gunicorn
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
cu denny
You can check the status of the allocation via the CLI. That works as a workaround until this is fixed.
Via Levant or via the Nomad CLI? Can you give me an example? It sounds to me like I would then need to add an exit 0 and check the state in a separate task.
# Collect the allocation IDs of the job (assumes the newest allocation is listed first).
IDs=($(nomad job allocs -namespace "ns_name" -t '{{ range . }}{{ .ID }} {{ end }}' "job_name"))
lastID="${IDs[0]}"
# Read the client status of that allocation (pending, running, complete, failed, ...).
status=$(nomad alloc status -namespace "ns_name" -short -t '{{ .ClientStatus }}' "$lastID")
if [[ "$status" != "complete" ]]; then
  echo "Job failed, check the error in the logs: $NOMAD_ADDR/ui/allocations/$lastID/job_name-task/logs"
  exit 1
else
  echo "Job successfully finished"
fi
I also left out waiting while the job is still running; just add a while loop that polls the status before checking for "complete".
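A rough sketch of that while loop, in case it helps. The function name wait_for_alloc, the timeout and the polling interval are my own assumptions and not something Levant or Nomad provides; the nomad alloc status call is the same one used in the script above.

```shell
#!/usr/bin/env bash
# wait_for_alloc is a hypothetical helper: it polls an allocation until it
# leaves the pending/running states or the timeout (in seconds) is reached.
# Usage: wait_for_alloc <alloc_id> <namespace> [timeout] [interval]
wait_for_alloc() {
  local alloc_id="$1" namespace="$2" timeout="${3:-300}" interval="${4:-2}"
  local waited=0 status=""
  while (( waited < timeout )); do
    # Same status query as in the script above.
    status=$(nomad alloc status -namespace "$namespace" -short \
      -t '{{ .ClientStatus }}' "$alloc_id")
    case "$status" in
      pending|running)
        sleep "$interval"
        waited=$(( waited + interval ))
        ;;
      *)
        # Any other status (complete, failed, lost, empty on error) ends the wait.
        break
        ;;
    esac
  done
  # Print the last status seen and succeed only if the allocation completed.
  echo "$status"
  [[ "$status" == "complete" ]]
}
```

In the script above this would replace the single status check, for example: wait_for_alloc "$lastID" "ns_name" || exit 1. If the allocation is still pending or running when the timeout expires, the function returns non-zero as well, so pick a timeout longer than the task normally runs.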