Timeout if pods are stuck in ContainerCreating

Open viniciusgama opened this issue 5 years ago • 3 comments

In my current project we have some pods that run as tasks and can take up to 2 hours to complete, so we had to set the timeout to 2 hours. Sometimes pods get stuck in ContainerCreating, in which case we only find out after the full timeout period has elapsed. Is there a way to make kubernetes-deploy time out if pods stay in this state for more than a few minutes? That would give faster feedback, which I believe is desirable.

viniciusgama avatar Oct 31 '19 12:10 viniciusgama

Hi @viniciusgama, can you tell us a bit more about why the tasks need to be deployed as pods and not jobs? We've got logic in the job class that sounds like it handles what you'd want: https://github.com/Shopify/kubernetes-deploy/blob/master/lib/krane/kubernetes_resource/job.rb#L7

dturn avatar Oct 31 '19 16:10 dturn

Thanks for your quick reply @dturn.

I said jobs but I actually meant to say tasks. In my team we are doing as suggested here. In our case we run database migrations and copy assets before we can roll out the application itself, but the issue is not restricted to these tasks; it can be any sort of pod, really.

I don't think the link you posted will help me right now. Do you know of any other way this can be achieved, or would we have to implement something?
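If something does have to be implemented, a minimal sketch of the check could look like the following. It is not part of kubernetes-deploy; the function names `parse_k8s_time` and `stuck_creating` are hypothetical. It assumes pod objects shaped like the `items` of `kubectl get pods -o json`, and it uses the pod's age since `metadata.creationTimestamp` as an approximation of how long it has been in ContainerCreating (that age also includes scheduling time).

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. "2019-10-31T12:10:00Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_creating(pods, max_wait, now=None):
    """Return names of pods that have a container waiting with reason
    ContainerCreating and that were created more than max_wait ago.

    Hypothetical helper: `pods` is a list of pod dicts as found under
    "items" in `kubectl get pods -o json`.
    """
    now = now or datetime.now(timezone.utc)
    stuck = []
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses", [])
        creating = any(
            s.get("state", {}).get("waiting", {}).get("reason") == "ContainerCreating"
            for s in statuses
        )
        # Approximation: pod age, not time spent in ContainerCreating itself.
        age = now - parse_k8s_time(pod["metadata"]["creationTimestamp"])
        if creating and age > max_wait:
            stuck.append(pod["metadata"]["name"])
    return stuck
```

A wrapper script could poll this every so often with something like `stuck_creating(json.loads(kubectl_output)["items"], timedelta(minutes=5))` and abort the deploy when the list is non-empty.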

viniciusgama avatar Oct 31 '19 19:10 viniciusgama

Internally, we run long-lived db migrations out of band using a job. The template resource looks something like:

apiVersion: batch/v1
kind: Job
metadata:
  name: long-db-migrate
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 172800 # Allow running for 48 hours
  template:
    metadata:
      name: long-db-migrate
    spec:
      restartPolicy: Never

Would this approach work for you?

dturn avatar Nov 04 '19 16:11 dturn