k8s-wait-for
Wait for job does not work as expected
I was expecting this app to wait until a job completed successfully, but it only waited for the job to be ready. Am I misunderstanding something?
This is a portion of my deployment resource and I have verified that my job runs to completion and exits with a status code of 0.
initContainers:
  - name: data-migration-init
    image: 'groundnuty/k8s-wait-for:v1.7'
    args:
      - job
      - my-data-migration-job
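To illustrate what I expected: behaviour roughly equivalent to this sketch, which polls the Job's status until it reports a successful completion (just an illustration using kubectl's jsonpath output; I don't know whether wait_for.sh actually works this way internally):

# Sketch of the behaviour I expected from the init container:
# block until the Job reports at least one succeeded pod.
until [ "$(kubectl get job my-data-migration-job -o jsonpath='{.status.succeeded}')" = "1" ]; do
  echo "waiting for job/my-data-migration-job to complete..."
  sleep 5
done

# or, equivalently, let kubectl do the polling:
kubectl wait --for=condition=complete --timeout=600s job/my-data-migration-job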
I most definitely use it to wait for a job to complete. Example:
- name: wait-for-onezone
  image: {{ .Values.wait_for.image }}
  imagePullPolicy: {{ template "imagePullPolicy" dict "root" . "context" .Values.wait_for }}
  args:
    - "job"
    - "{{ template "onezone_name" . }}-ready-check"
Please try image groundnuty/k8s-wait-for:v1.5.1. I have not upgraded my production envs to the newest image. Maybe some bug got into it...
Will do. Thanks for the quick response.
Version 1.5.1 works as expected.
I'm not in production yet so I'm willing to help isolate the issue. I'll try a 1.6 version tomorrow and let you know the results.
Had a hunch and it was right: here's a diff of kubectl describe job <> output between kubectl v1.24.0 and v1.25.2:
< Start Time: Wed, 21 Sep 2022 11:03:23 +0200
< Pods Statuses: 1 Active / 0 Succeeded / 0 Failed
---
> Start Time: Wed, 21 Sep 2022 09:03:23 +0000
> Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
They changed Running to Active... not sure how it could break the code yet, since it uses regexps that should be ok with that...
Version 1.6 does not work.
I diff'd wait_for.sh and don't see anything that would change its behavior.
v1.5.1 uses kubectl 1.21.0 and v1.6 uses kubectl 1.24.0 so there is probably a change there.
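If anyone wants to double-check which kubectl a given image tag bundles, something like this should do it (assuming kubectl is available on the image's PATH under that name):

# Print the client version of the kubectl shipped in each tag.
docker run --rm --entrypoint kubectl groundnuty/k8s-wait-for:v1.5.1 version --client
docker run --rm --entrypoint kubectl groundnuty/k8s-wait-for:v1.6 version --client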
noroot-v1.7 running on K8S 1.25 has the same issue and doesn't wait for the job to be successful.
Switched to v1.5.1 and it works as expected. Would be nice to be running the noroot version :)
Got hit by this as well; switched to v1.5.1 and it works as expected now.
Also got hit by this in v1.7. Is someone working on a fix?
I found the problem. It turns out the regexp was indeed broken, after k8s changed this:
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
to this:
Pods Statuses: 1 Active (0 Ready) / 0 Succeeded / 0 Failed
The change is connected to the JobReadyPods feature gate which, as far as I can find, was introduced in k8s v1.23. It adds Ready info to JobStatus.
As far as I understand, Ready should always be <= Active, since Active counts pods that have been scheduled but are not yet Succeeded/Failed, while Ready just gives extra info on which of them are actually running right now.
Furthermore, it seems that v1.7 should work with k8s clusters < v1.23.
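For illustration, a grep along these lines would tolerate both the old and the new form of the Pods Statuses line (just a sketch of the idea, not the actual change made to wait_for.sh; the job name is only an example):

# Matches "0 Running / 1 Succeeded / 0 Failed" as well as
# "1 Active (0 Ready) / 0 Succeeded / 0 Failed".
kubectl describe job my-data-migration-job \
  | grep -E 'Pods Statuses:[[:space:]]+[0-9]+ (Running|Active)( \([0-9]+ Ready\))? / [0-9]+ Succeeded / [0-9]+ Failed'

Reading the status via kubectl get job <name> -o jsonpath='{.status.succeeded}' instead of parsing describe output would also sidestep this kind of formatting change entirely.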
@fdutton, @anleib, @DARB-CCM-S-20, @stephenpope: could you possibly share which k8s version you experienced your problems on? That way we can be sure that my conclusions here are correct.
@groundnuty Great work! 1.24 for me. I've internalized v1.7 for now and changed to v1.21 which is working fine.
I am on 1.24 K8s as well
I am on 1.24 K8s as well
Running v1.24.14 and ended up having to use v1.5.1; newer versions just completed immediately.