Re-run initContainers in a Deployment when containers exit on error
I'm copying this issue from https://github.com/kubernetes/kubernetes/issues/52345 because it seems that this is the appropriate repo for it.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened: A container in a Deployment exits on error and is restarted without the initContainer being re-run first.
What you expected to happen: When a container in a Deployment exits on error, the initContainer is re-run before the container is restarted.
How to reproduce it (as minimally and precisely as possible):
Sample spec:
kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
name: "test"
labels:
name: "test"
spec:
replicas: 1
selector:
matchLabels:
name: "test"
template:
metadata:
name: "test"
labels:
name: "test"
spec:
initContainers:
- name: sleep
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- sleep
- 1s
containers:
- name: test
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- exit 1
Implementation Context:
I have an initContainer that waits for a service running in Kubernetes to detect its existence via pod annotations and send it an HTTP request containing a value, which the initContainer then writes to disk. On startup, the main container reads this value and "unwraps" it via another service, then stores the unwrapped value in memory.
The value the initContainer writes to disk can only be read once: as soon as it is used, it expires. The problem is that if the main container ever restarts due to a fatal error, it loses the unwrapped value it held in memory and, on startup, tries to unwrap the now-expired value again. This leads to an infinite crash loop until I manually delete the pod, at which point a new pod is created, the initContainer runs, and all is well again.
I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
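For concreteness, here is a minimal sketch of this setup, assuming a shared emptyDir volume; the image names, annotation, and paths are placeholders, not my real spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unwrap-example                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: unwrap-example
  template:
    metadata:
      labels:
        app: unwrap-example
      annotations:
        token-service/request: "true"   # hypothetical annotation the external service watches for
    spec:
      volumes:
        - name: token
          emptyDir: {}                  # shared scratch space; lives as long as the pod does
      initContainers:
        # Hypothetical image: waits for the external service to deliver the
        # one-time value, writes it to the shared volume, then exits.
        - name: wait-for-token
          image: example/token-waiter
          volumeMounts:
            - name: token
              mountPath: /var/run/token
      containers:
        # Hypothetical image: reads /var/run/token/wrapped on startup, unwraps
        # it via another service, and keeps the result only in memory. If this
        # container crashes, the kubelet restarts just this container; the
        # initContainer does not run again and the value on disk is already spent.
        - name: app
          image: example/app
          volumeMounts:
            - name: token
              mountPath: /var/run/token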
Enhancement Description
- One-line enhancement description (can be used as a release note):
- Kubernetes Enhancement Proposal:
- Discussion Link:
- Primary contact (assignee):
- Responsible SIGs:
- Enhancement target (which target equals to which milestone):
  - Alpha release target (x.y):
  - Beta release target (x.y):
  - Stable release target (x.y):
- [ ] Alpha
  - [ ] KEP (k/enhancements) update PR(s):
  - [ ] Code (k/k) update PR(s):
  - [ ] Docs (k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
/sig node
This is a challenging use-case. How do you trigger this if your app has 2 containers? What if one of them is a sidecar that you (the pod author) don't really know about or control?
It seems to me that initContainer (as defined today) is a poor fit here - your app could either do this itself at startup, or you could wrap it in another tool/script that does the unwrap and then starts your app. That answer is, itself, somewhat unsatisfying, because it means you can't decouple those ideas, or those container images, or credentials/permissions.
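A rough sketch of that wrapper approach (hypothetical image, endpoints, and paths; it assumes curl is available in the image). The point is only that the fetch-and-unwrap step lives in the container's own command, so it runs again on every container restart:
containers:
  - name: app
    image: example/app   # hypothetical image with curl installed
    command:
      - /bin/sh
      - -c
      # Placeholder URLs: request a fresh one-time value, unwrap it, then
      # exec the real entrypoint so it becomes PID 1.
      - |
        curl -sf -o /tmp/wrapped http://token-service.internal/request &&
        curl -sf --data-binary @/tmp/wrapped -o /tmp/unwrapped http://unwrap-service.internal/unwrap &&
        exec /app/server --token-file=/tmp/unwrapped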
@SergeyKanzhelev since "keystone" came up in the sidecar discussion too - this is what I really meant when we started the idea. It doesn't mean "this is an app" vs "this is a sidecar" - it means "if this one goes down, everything goes down". Most pods would not use this feature at all, but those who need it KNOW they need it.
@jpbetz since you're looking at the lifecycle stuff, too.
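Purely to illustrate the "keystone" semantics above (the marker below is hypothetical; no such field exists in the Kubernetes API today):
containers:
  - name: app
    image: example/app          # hypothetical image
    keystone: true              # hypothetical field: if this container exits,
                                # tear down and replace the whole pod, so
                                # initContainers run again
  - name: log-shipper
    image: example/log-shipper  # a sidecar the pod author may not control;
                                # not marked, so it is handled by the normal
                                # per-container restart behavior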
@thockin What you term “keystone” containers, I've heard named “essential” (eg in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions)
EVERYONE should know by now NOT to let me name things :)
> I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
This is the direction I started thinking when I saw this issue. I agree with @thockin that the initContainers are a poor fit. initContainers are containers that initialize the pod and they do exactly that.
Say it was possible to define a Deployment with a restartPolicy=Never pod (today it can only be Always). That would make the desired pod lifecycle clear for this "initContainer initializes a one-time read value" case: if the main container fails, terminate the pod and create a new one to replace it. But it would have the major downside of requiring a new pod to be scheduled each time the main container failed. That's probably not what most people would want?
One alternative would be a sidecar that can produce a "one-time read value". Each time the main container starts, it retrieves a new "one-time read value" from the sidecar. It would then be possible to have a simple process in the main container that retrieves the "one-time read value", writes it to the appropriate location on disk and then starts the main process for the container.
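A minimal sketch of that alternative, assuming a hypothetical sidecar image that hands out a fresh one-time value over localhost on each request (names, port, and paths are placeholders):
containers:
  - name: token-sidecar
    image: example/token-sidecar   # hypothetical image serving fresh one-time values on :8080
    ports:
      - containerPort: 8080
  - name: app
    image: example/app             # hypothetical image with curl installed
    command:
      - /bin/sh
      - -c
      # Runs on every restart of this container, so each restart fetches a
      # brand-new value from the sidecar. URLs and paths are placeholders.
      - |
        curl -sf -o /tmp/one-time-value http://localhost:8080/value &&
        exec /app/server --value-file=/tmp/one-time-value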
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@Ugzuzg do you plan to work on this for 1.28? I see you removed the stale lifecycle.
Wondering if this can make it into 1.29?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale