
KubernetesJobTask fails because waiting state reason is 'PodInitializing'

Open arturb90 opened this issue 3 years ago • 0 comments

I've been trying to use the Kubernetes Job wrapper, but the task fails even though the Job itself runs fine once it has been spun up.

Runtime error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/luigi/worker.py", line 193, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.10/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/usr/local/lib/python3.10/site-packages/luigi/contrib/kubernetes.py", line 391, in run
    self.__track_job()
  File "/usr/local/lib/python3.10/site-packages/luigi/contrib/kubernetes.py", line 224, in __track_job
    while not self.__verify_job_has_started():
  File "/usr/local/lib/python3.10/site-packages/luigi/contrib/kubernetes.py", line 304, in __verify_job_has_started
    assert wr == 'ContainerCreating', "Pod %s %s. Logs: `kubectl logs pod/%s`" % (
AssertionError: Pod fetch-20220720152731-be35e0c148e747aa-wjprm PodInitializing. Logs: `kubectl logs pod/fetch-20220720152731-be35e0c148e747aa-wjprm`

It looks like PodInitializing is one of the reasons a pod can be in the waiting state, although I could not find any documentation stating so explicitly (the closest I found is this: https://github.com/kubernetes/kube-state-metrics/blob/4090e8b7aa39afcfe4d5e62d3f3c7262e09409b9/docs/pod-metrics.md). However, the only waiting reason the check treats as a non-failure state is ContainerCreating:

https://github.com/spotify/luigi/blob/afa6ba30b1acd45eba2a273f20a0e81f6e8da48b/luigi/contrib/kubernetes.py#L304
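The change could be as small as accepting both waiting reasons instead of one. Below is a minimal sketch of that idea, not luigi's actual code: `STARTUP_WAITING_REASONS` and `check_waiting_reason` are hypothetical names, and the sketch assumes that PodInitializing should be treated the same way as ContainerCreating.

```python
# Hypothetical sketch, not the code in luigi/contrib/kubernetes.py.
# Assumption: both reasons below mean "the pod is still starting up".
STARTUP_WAITING_REASONS = frozenset({"ContainerCreating", "PodInitializing"})


def check_waiting_reason(pod_name, reason):
    """Raise AssertionError unless the waiting reason is a normal startup state."""
    if reason not in STARTUP_WAITING_REASONS:
        # Same message format as the assertion shown in the traceback.
        raise AssertionError(
            "Pod %s %s. Logs: `kubectl logs pod/%s`" % (pod_name, reason, pod_name)
        )


# Both startup reasons pass silently; any other reason still raises.
check_waiting_reason("example-pod", "ContainerCreating")
check_waiting_reason("example-pod", "PodInitializing")
```

With a check like this, a pod that is still running its init containers would keep being polled instead of failing the task, while other waiting reasons would still surface as errors.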

Seems like an easy fix, I'd be happy to do it.

arturb90, Jul 21 '22 08:07