Re-run initContainers in a Deployment when containers exit on error
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened: A container in a Deployment exits on error; the container is restarted without the initContainer being re-run first.
What you expected to happen: When a container in a Deployment exits on error, the initContainer is re-run before the container is restarted.
How to reproduce it (as minimally and precisely as possible):
Sample spec:
```yaml
kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
  name: "test"
  labels:
    name: "test"
spec:
  replicas: 1
  selector:
    matchLabels:
      name: "test"
  template:
    metadata:
      name: "test"
      labels:
        name: "test"
    spec:
      initContainers:
      - name: sleep
        image: debian:stretch
        imagePullPolicy: IfNotPresent
        command:
        - sleep
        - 1s
      containers:
      - name: test
        image: debian:stretch
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - exit 1
```
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"db809c0eb7d33fac8f54d8735211f2f3a8fc4214", GitTreeState:"clean", BuildDate:"2017-09-11T19:46:47Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"db809c0eb7d33fac8f54d8735211f2f3a8fc4214", GitTreeState:"clean", BuildDate:"2017-09-11T19:46:47Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
```
- OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
- Kernel (e.g. `uname -a`): Linux aleinung 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
Implementation Context:
I have an initContainer that waits for a service running in Kubernetes to detect its existence via pod annotations and send it an HTTP request, upon which it writes the received value to disk. On startup, the main container reads this value and "unwraps" it via another service, then stores the unwrapped value in memory.
The value written to disk by the initContainer is a one-time-read value: once it is used, it expires. The problem is that if the main container ever restarts due to a fatal error, it loses the unwrapped value and on startup tries to unwrap the expired value again, leading to an infinite crash loop until I manually delete the pod; then a new pod is created, the initContainer runs, and all is well again.
I would like a feature that restarts the entire pod on container error so that this workflow can function properly.
/sig node
Good catch. When an init container is used to acquire a certificate or token, the main container may remove the file after reading it into its cache. The container may then restart repeatedly after a single panic.
@aisengard I think the use-case you are talking about can be simulated by sharing volume between container and init-container, isn't it?
I have updated the config to include volumeMount that is shared between container and init-container.
```yaml
kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
  name: "test"
  labels:
    name: "test"
spec:
  replicas: 1
  selector:
    matchLabels:
      name: "test"
  template:
    metadata:
      name: "test"
      labels:
        name: "test"
    spec:
      initContainers:
      - name: sleep
        image: debian:stretch
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - 'echo "create by init-container" > /dir/file'
        volumeMounts:
        - mountPath: /dir
          name: shared
      containers:
      - name: test
        image: debian:stretch
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - "cat /dir/file && sleep 99999s"
        volumeMounts:
        - mountPath: /dir
          name: shared
      volumes:
      - name: shared
        emptyDir: {}
```
Running it:
```console
$ k create -f file.yml
deployment "test" created
$ k get pods
NAME                    READY     STATUS    RESTARTS   AGE
test-3165636750-b497p   1/1       Running   0          4s
$ k logs test-3165636750-b497p
create by init-container
```
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
/cc @Random-Liu @yujuhong
/remove-lifecycle stale
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#pod-restart-reasons
> All containers in a Pod are terminated while restartPolicy is set to Always, forcing a restart, and the Init Container completion record has been lost due to garbage collection.

In practice, however, when all containers are terminated the pod does not restart and the init containers do not rerun.
How about introducing a new RestartPolicy value such as AlwaysPod which means always restart the pod whenever any container dies?
I.e. whenever any (non-init) container of the pod dies, the remaining healthy containers should be terminated and the pod restarted (on the same node), starting with the init containers.
This approach can cover one of the common/simple use-case for init-containers to wait for dependencies or some action required before any of the pod's containers (re)start.
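For illustration only, a pod spec under such a proposal might look like this. `AlwaysPod` is a hypothetical value sketched from the comment above; it does not exist in any Kubernetes API version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  # Hypothetical proposed value: when any container dies, terminate the
  # remaining containers and restart the whole pod, init containers first.
  restartPolicy: AlwaysPod
  initContainers:
  - name: fetch-token
    image: debian:stretch
    command: ["sh", "-c", "echo one-time-token > /dir/token"]
    volumeMounts:
    - mountPath: /dir
      name: shared
  containers:
  - name: main
    image: debian:stretch
    command: ["sh", "-c", "cat /dir/token && rm /dir/token && sleep 3600"]
    volumeMounts:
    - mountPath: /dir
      name: shared
  volumes:
  - name: shared
    emptyDir: {}
```

With today's `Always` policy, a crash of `main` after it has removed the token leads to the crash loop described in this issue; under the proposed policy the init container would run again first.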
An AlwaysPod or perhaps RestartPod RestartPolicy would also be very useful for this use-case: I'm using initContainers to get a job from a work-queue and then sequentially run a series of containers to process it. Once the job is finished I'd like the pod to restart and thus wait for the next job. The Job resource doesn't seem to support indefinite completions, and I don't want the overhead of something like brigade or argo for what should be a pretty simple use-case.
The AlwaysPod RestartPolicy could also be used to make stateful apps/services more self-managed, using init containers and sidecar containers for simpler management tasks such as backups, while keeping controllers/operators for more complicated operations.
Is there any current workaround to restart pod when container restarts?
> All containers in a Pod are terminated while restartPolicy is set to Always, forcing a restart, and the Init Container completion record has been lost due to garbage collection.

I read that in the docs and see it quoted again here. Has anyone confirmed a way to get into a state in which pods are restarted when a container restarts, as the documentation describes?
@majgis AFAIK, the only possible workaround is to bake some coordination between the pod's containers into the containers themselves.
I am working on a PR for implementing the AlwaysPod restartPolicy which will address this problem of restarting pod on container failure. I am planning to raise the PR next week.
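Until something like that exists, the coordination workaround mentioned above can be sketched as a wrapper entrypoint for the main container that redoes the init step on every (re)start. This is only a sketch: the file path and the `fetch_secret` step are stand-ins for whatever the real init container does.

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper: since Kubernetes will not rerun
# initContainers when only the main container restarts, redo the init
# step here before handing off to the real command.
set -eu

SECRET_FILE="${SECRET_FILE:-./secret}"

fetch_secret() {
  # Stand-in for the real init logic (e.g. an HTTP call to a secret service).
  echo "fresh-secret-$(date +%s)" > "$SECRET_FILE"
}

# If a previous run consumed the one-time value, produce a new one.
if [ ! -s "$SECRET_FILE" ]; then
  fetch_secret
fi

# Hand off to the container's real command, e.g. `wrapper.sh my-app --flag`.
exec "$@"
```

The obvious downside is that the fetch logic now lives in the main image instead of a dedicated init container.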
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@aisengard did you ever find a solution to this? We hit exactly the same issue today. We have an initContainer that reads some secret data from Vault and writes it to an emptyDir volume shared between the initContainer and the first container in the pod. The first container reads this file when executing its command and then deletes it, so that no one can enter the pod and read the file. But if the container restarts, the initContainer isn't run again, so the file doesn't exist.
/reopen
@adamzr: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is there any update or a different issue opened for this?
Any idea how to solve this issue? It shuts down my production backend system.
The only solution I have now is to "manually" (via a cron task) delete the pod when it reaches the CrashLoopBackOff state.
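That cron workaround can be scripted roughly as follows. This is a sketch: the awk column positions assume default `kubectl get pods` output, and the kubectl calls only run when `RUN_KUBECTL=1`, so the filter itself can be tried without a cluster.

```shell
#!/bin/sh
# Sketch of the cron workaround: delete pods stuck in CrashLoopBackOff.

# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Expects `kubectl get pods --no-headers` output (NAME READY STATUS RESTARTS AGE).
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Guarded so the filter above can be exercised without a cluster.
if [ "${RUN_KUBECTL:-0}" = "1" ]; then
  kubectl get pods --no-headers \
    | crashlooping_pods \
    | xargs -r -n1 kubectl delete pod
fi
```

Note that `xargs -r` (skip running on empty input) is a GNU/BSD extension, and deleting the pod this way loses the backoff pause, as discussed below in this thread.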
/reopen
Still an issue I think.
@jsravn: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Still an issue I think.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
> Any idea how to solve this issue? It shuts down my production backend system. The only solution which I have now is to "manually" (via CRON task) delete pod if it reach CrashLoopBackOff state.
@petrknap another idea, while also a hack, is to use one of the various k8s client API libraries to watch for this condition and delete the pod upon detection.
A few downsides to this, amongst I'm sure many others, are that if the new pod is scheduled on another node the image has to be re-pulled, you lose the 'backoff' logic and its associated pausing, and any metrics & alerts related to crash looping likely need adjusting. In my case I'm handling the latter two concerns with another layer of hacks inside the "crash loop watcher", and would certainly much rather have native handling of this.
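A minimal sketch of such a watcher in shell rather than a client API library; the threshold and the sleep are illustrative stand-ins for the lost backoff logic, not a real replacement for it:

```shell
#!/bin/sh
# Sketch of a "crash loop watcher": delete a pod once it has crash-looped
# past a threshold, with a crude pause in place of the lost backoff.

THRESHOLD="${THRESHOLD:-5}"

# Decide whether a pod's (status, restart count) warrants deletion.
should_delete() {
  [ "$1" = "CrashLoopBackOff" ] && [ "$2" -ge "$THRESHOLD" ]
}

# Guarded so the policy function can be exercised without a cluster.
if [ "${RUN_WATCH:-0}" = "1" ]; then
  # Default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE
  kubectl get pods --watch --no-headers |
  while read -r name ready status restarts age; do
    if should_delete "$status" "$restarts"; then
      sleep 30   # crude pause instead of the real backoff
      kubectl delete pod "$name"
    fi
  done
fi
```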
This would be very useful for us as well.
We have a daemonset which creates a gRPC socket on the node. This is created with root group ownership. Then we have a second daemonset which runs and accesses this socket to capture events logged by the first daemonset and export them to prometheus. Because this socket is owned by root, the "exporter" container requires root.
To work around the root requirement, we set up an init container in the exporter daemonset manifest to change the socket permissions on the host volume. When the container in the first daemonset is killed (OOM), the group ownership of the socket reverts back to root. The exporter container then crashloops.
At this point we expected the exporter daemonset to re-run the initContainer first to fix the socket permissions, but it does not, so the exporter continues to crashloop.
Restarting the exporter daemonset or deleting the exporter pods seems to work but it would be great if the whole pod would restart and run the init container again.
> Good catch, When use init container to acquire certificate or token, containers may remove it after read to cache. Then containers may re-run repeatedly after once panic.
Exactly my use case. It would be really nice if there was a solution for this.
bump
bump
I ran into a similar issue. We are running Bitnami Redis on our Kubernetes cluster as a StatefulSet (Helm chart: redis-16.8.10, app: 6.2.7), and each pod has the following components:
- an init container that sets the permissions on the PV
- the redis container
- the sentinel container
- the metrics container

After a long running time, the redis container sometimes fails with the error "Can't open the append-only file: Read-only file system" and is restarted continuously, while the other pods run as expected.
The only way out of this situation is to kill the existing pod, which forces the creation of a new one. But Kubernetes doesn't kill the pod; it only restarts the problematic container.
Can we have a feature that executes the init containers every time another container is restarted, or that kills the pod and waits for another pod to spin up?
/reopen
This is such unexpected behaviour for a long-time k8s user. The proposed solution of adding another RestartPolicy value that forces all containers to restart seems reasonable - any thoughts from sig-node?