Re-run initContainers in a Deployment when containers exit on error
I'm copying this issue from https://github.com/kubernetes/kubernetes/issues/52345 because it seems that this is the appropriate repo for it.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened: A container in a Deployment exits on error and is restarted without the initContainer being re-run first.
What you expected to happen: When a container in a Deployment exits on error, the initContainer is re-run before the container is restarted.
How to reproduce it (as minimally and precisely as possible):
Sample spec:
kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
name: "test"
labels:
name: "test"
spec:
replicas: 1
selector:
matchLabels:
name: "test"
template:
metadata:
name: "test"
labels:
name: "test"
spec:
initContainers:
- name: sleep
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- sleep
- 1s
containers:
- name: test
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- exit 1
Implementation Context:
I have an initContainer that waits for a service running in Kubernetes to detect its existence via pod annotations and send it an HTTP request containing a value, which the initContainer then writes to disk. On startup, the main container reads this value and "unwraps" it via another service, then stores the unwrapped value in memory.
The value the initContainer writes to disk can only be read once: as soon as it is used, it expires. The problem is that if the main container ever restarts due to a fatal error, it loses the unwrapped value it held in memory and, on startup, tries to unwrap the now-expired value again. This leads to an infinite crash loop until I manually delete the pod, at which point a new pod is created, the initContainer runs, and all is well again.
I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
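For concreteness, here is a minimal sketch of this setup, assuming a shared emptyDir volume; the image names, annotation, and paths are placeholders, not my real spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unwrap-example                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: unwrap-example
  template:
    metadata:
      labels:
        app: unwrap-example
      annotations:
        token-service/request: "true"   # hypothetical annotation the external service watches for
    spec:
      volumes:
        - name: token
          emptyDir: {}                  # shared scratch space; lives as long as the pod does
      initContainers:
        # Hypothetical image: waits for the external service to deliver the
        # one-time value, writes it to the shared volume, then exits.
        - name: wait-for-token
          image: example/token-waiter
          volumeMounts:
            - name: token
              mountPath: /var/run/token
      containers:
        # Hypothetical image: reads /var/run/token/wrapped on startup, unwraps
        # it via another service, and keeps the result only in memory. If this
        # container crashes, the kubelet restarts just this container; the
        # initContainer does not run again and the value on disk is already spent.
        - name: app
          image: example/app
          volumeMounts:
            - name: token
              mountPath: /var/run/token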
Enhancement Description
- One-line enhancement description (can be used as a release note):
- Kubernetes Enhancement Proposal:
- Discussion Link:
- Primary contact (assignee):
- Responsible SIGs:
- Enhancement target (which target equals to which milestone):
  - Alpha release target (x.y):
  - Beta release target (x.y):
  - Stable release target (x.y):
- [ ] Alpha
  - [ ] KEP (k/enhancements) update PR(s):
  - [ ] Code (k/k) update PR(s):
  - [ ] Docs (k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
/sig node
This is a challenging use-case. How do you trigger this if your app has 2 containers? What if one of them is a sidecar that you (the pod author) don't really know about or control?
It seems to me that initContainer (as defined today) is a poor fit here - your app could either do this itself at startup, or you could wrap it in another tool/script that does the unwrap and then starts your app. That answer is, itself, somewhat unsatisfying, because it means you can't decouple those ideas, or those container images, or credentials/permissions.
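A rough sketch of that wrapper approach (hypothetical image, endpoints, and paths; it assumes curl is available in the image). The point is only that the fetch-and-unwrap step lives in the container's own command, so it runs again on every container restart:
containers:
  - name: app
    image: example/app   # hypothetical image with curl installed
    command:
      - /bin/sh
      - -c
      # Placeholder URLs: request a fresh one-time value, unwrap it, then
      # exec the real entrypoint so it becomes PID 1.
      - |
        curl -sf -o /tmp/wrapped http://token-service.internal/request &&
        curl -sf --data-binary @/tmp/wrapped -o /tmp/unwrapped http://unwrap-service.internal/unwrap &&
        exec /app/server --token-file=/tmp/unwrapped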
@SergeyKanzhelev since "keystone" came up in the sidecar discussion too - this is what I really meant when we started the idea. It doesn't mean "this is an app" vs "this is a sidecar" - it means "if this one goes down, everything goes down". Most pods would not use this feature at all, but those who need it KNOW they need it.
@jpbetz since you're looking at the lifecycle stuff, too.
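Purely to illustrate the "keystone" semantics above (the marker below is hypothetical; no such field exists in the Kubernetes API today):
containers:
  - name: app
    image: example/app          # hypothetical image
    keystone: true              # hypothetical field: if this container exits,
                                # tear down and replace the whole pod, so
                                # initContainers run again
  - name: log-shipper
    image: example/log-shipper  # a sidecar the pod author may not control;
                                # not marked, so it is handled by the normal
                                # per-container restart behavior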
@thockin What you term “keystone” containers, I've heard named “essential” (eg in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions)
EVERYONE should know by now NOT to let me name things :)
> I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
This is the direction I started thinking when I saw this issue. I agree with @thockin that the initContainers are a poor fit. initContainers are containers that initialize the pod and they do exactly that.
Say it was possible to define a Deployment with a restartPolicy=Never pod (today it can only be Always). That would make the desired pod lifecycle clear for this "initContainer initializes a one-time read value" case: if the main container fails, terminate the pod and create a new one to replace it. But it would have the major downside of requiring a new pod to be scheduled each time the main container failed. That's probably not what most people would want?
One alternative would be a sidecar that can produce a "one-time read value". Each time the main container starts, it retrieves a new "one-time read value" from the sidecar. It would then be possible to have a simple process in the main container that retrieves the "one-time read value", writes it to the appropriate location on disk and then starts the main process for the container.
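A minimal sketch of that alternative, assuming a hypothetical sidecar image that hands out a fresh one-time value over localhost on each request (names, port, and paths are placeholders):
containers:
  - name: token-sidecar
    image: example/token-sidecar   # hypothetical image serving fresh one-time values on :8080
    ports:
      - containerPort: 8080
  - name: app
    image: example/app             # hypothetical image with curl installed
    command:
      - /bin/sh
      - -c
      # Runs on every restart of this container, so each restart fetches a
      # brand-new value from the sidecar. URLs and paths are placeholders.
      - |
        curl -sf -o /tmp/one-time-value http://localhost:8080/value &&
        exec /app/server --value-file=/tmp/one-time-value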
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@Ugzuzg do you plan to work on this for 1.28? I see you removed the stale lifecycle.
Wondering if this can make it into 1.29?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale