Sidecar Containers
Enhancement Description
- One-line enhancement description (can be used as a release note): Kubernetes introduces built-in support for the sidecar container pattern.
- Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/753-sidecar-containers/README.md
- Discussion Link: Sidecar working group and KEP draft
- Primary contact (assignee): @SergeyKanzhelev
- Responsible SIGs: node
- Enhancement target (which target equals to which milestone):
- Alpha release target (x.y): 1.28
- Beta release target (x.y): 1.29
- Stable release target (x.y): 1.33
- [x] Alpha
  - [x] KEP (k/enhancements) update PR(s):
    - https://github.com/kubernetes/enhancements/pull/3761
    - https://github.com/kubernetes/enhancements/pull/3968
  - [x] Code (k/k) update PR(s):
    - https://github.com/kubernetes/kubernetes/pull/116429
  - [x] Docs (k/website) update PR(s):
    - https://github.com/kubernetes/website/pull/41802
- [x] Beta
  - [x] KEP (k/enhancements) update PR(s):
    - https://github.com/kubernetes/enhancements/pull/4255
  - [x] Code (k/k) update PR(s):
    - https://github.com/kubernetes/kubernetes/pull/120620
    - https://github.com/kubernetes/kubernetes/pull/121579
    - https://github.com/kubernetes/kubernetes/pull/121307
  - [x] Docs (k/website) update(s):
    - https://github.com/kubernetes/website/pull/43471
/sig node
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
@enisoc @dchen1107 @fejta @thockin @kow3ns @derekwaynecarr, opened this tracking issue so that we can discuss.
/assign
@derekwaynecarr I've done some scoping of the kubelet changes required, ahead of next week's sig-node meeting. I believe changes are only needed in the kuberuntime package, specifically kuberuntime_manager.go and kuberuntime_container.go.
In kuberuntime_manager.go you could modify computePodActions to implement the shutdown triggering (kill sidecars when all non-sidecars have permanently exited) and to start the sidecars up first.
In kuberuntime_container.go you could modify killContainersWithSyncResult to terminate the sidecars last and to send them the preStop hooks (the preStop hooks bit was somewhat debatable; it wasn't settled whether that should be done or not. @thockin had a good point about why you might not want to encourage that behaviour, see comment).
Let me know if you want me to investigate any further.
@kow3ns The discussion would make more sense to me if we could define a full description of the container sequence in the Pod spec (sig-apps), and how to handle that sequence in the kubelet for start, restart, and cascading considerations (sig-node). Let's use the Feb 5 sig-node meeting to gather more input.
cc @Joseph-Irving
The proposal says that sidecars only run after the init containers have run. But what if the use case requires the sidecar to run while, or before, the init containers run? For example, if you'd like to route the pod's traffic through a proxy running as a sidecar (as in Istio), you probably want that proxy to be in place while the init containers run, in case an init container itself makes network calls.
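To make that use case concrete, here is a hedged sketch (container and image names are placeholders, not from the proposal) of a pod where an init container needs the proxy sidecar to already be running:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-migration            # hypothetical example
spec:
  initContainers:
  - name: db-migration                # makes network calls to the database service
    image: example.com/db-migrate:1.0
  containers:
  - name: app
    image: example.com/app:1.0
  - name: istio-proxy                 # under the proposal, this only starts after init containers
    image: istio/proxyv2:1.0          # finish, so the migration's traffic cannot go through it
```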
@luksa I think there's the possibility of looking at having sidecars that run in the init phase at some point, but currently the proposal is not going to cover that use case. There is currently no way to have concurrent containers running in the init phase, so that would potentially be a much larger/messier change than what is being suggested here.
Update on this KEP: I've spoken to both @derekwaynecarr and @dchen1107 from sig-node about this and they did not express any major concerns about the proposal. I will raise a PR to the KEP adding some initial notes around implementation details and clarifying a few points that came up during the discussion.
We still need to agree on the API. It seems there is consensus that a simple way of marking containers as sidecars is preferred over more in-depth ordering flags. Having a bool is somewhat limiting though, so perhaps something more along the lines of containerLifecycle: Sidecar would be preferable, so that we have the option of expanding it in the future.
@Joseph-Irving Actually, neither the boolean nor the containerLifecycle: Sidecar are appropriate for proper future extensibility. Instead, containerLifecycle should be an object, just like deployment.spec.strategy, with type: Sidecar. This would allow us to then introduce additional fields. For the "sidecar for the whole lifetime of the pod" solution, it would be expressed along these lines:
```yaml
containerLifecycle:
  type: Sidecar
  sidecar:
    scope: CompletePodLifetime
```

as opposed to

```yaml
containerLifecycle:
  type: Sidecar
  sidecar:
    scope: AfterInit
```
Please forgive my bad naming - I hope the names convey the idea.
But there is one problem with the approach where we introduce containerLifecycle to pod.spec.containers. Namely, it's wrong to have sidecars that run in parallel with init containers specified under pod.spec.containers. So if you really want to be able to extend this to init containers eventually, you should find an alternative solution, one that would allow you to mark containers as sidecars at a higher level, i.e. not under pod.spec.containers or pod.spec.initContainers, but under something like pod.spec.sidecarContainers, which I believe you already discussed but dismissed. The init containers problem definitely calls for a solution along these lines.
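For discussion only, a hypothetical sketch of the shape described above; pod.spec.sidecarContainers does not exist in the API and was dismissed in the proposal:

```yaml
apiVersion: v1
kind: Pod
spec:
  initContainers:
  - name: init-db
    image: example.com/init:1.0
  sidecarContainers:                  # hypothetical top-level list, a sibling of containers/initContainers
  - name: istio-proxy
    image: istio/proxyv2:1.0
  containers:
  - name: app
    image: example.com/app:1.0
```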
@luksa You could also solve the init problem by just allowing an init container to be marked as a sidecar and have that run alongside the init containers. As I understand it, the problem is that init containers sometimes need sidecars, which is different from needing a container that runs for the entire lifetime of the pod.
The problem with pod.spec.sidecarContainers is that it's a far more complex change: tooling would need to be updated and the kubelet would require a lot of modification to support another set of containers. The current proposal is far more modest; it only builds on what's already there.
@Joseph-Irving We could work with that yes. It's not ideal for the sidecar to shut down after the init containers run and then have the same sidecar start up again, but it's better than not having that option. The bigger problem is that older Kubelets wouldn't handle init-sidecar containers properly (as is the case with main-sidecar containers).
I'd just like you to keep init-sidecars in mind when finalizing the proposal. In essence, you're introducing the concept of "sidecar" into k8s (previously, we basically only had a set of containers that were all equal). Now you're introducing actual sidecars, so IMHO, you really should think this out thoroughly and not dismiss a very important sidecar use-case.
I'd be happy to help with implementing this. Without it, Istio can't provide its features to init containers (actually, in a properly secured Kubernetes cluster running Istio, init containers completely lose the ability to talk to any service).
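For reference, the KEP linked at the top of this issue eventually went in a direction close to the idea of marking an init container as a sidecar: an init container gets a container-level restartPolicy: Always, so it starts in init order but keeps running alongside the main containers. A minimal sketch (image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
spec:
  initContainers:
  - name: istio-proxy
    image: istio/proxyv2:1.0
    restartPolicy: Always             # marks this init container as a sidecar; it keeps running
  - name: db-migration                # regular init container; the proxy is already up when it runs
    image: example.com/db-migrate:1.0
  containers:
  - name: app
    image: example.com/app:1.0
```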
In relation to the implementation discussion in https://github.com/kubernetes/enhancements/pull/841, I've opened a WIP PR containing a basic PoC for this proposal https://github.com/kubernetes/kubernetes/pull/75099. It's just a first draft and obviously not perfect but the basic functionality works and gives you an idea of the amount of change required.
cc @enisoc
I put together a short video just showing how the PoC currently behaves https://youtu.be/4hC8t6_8bTs. Seeing it in action can be better than reading about it. Disclaimer: I'm not a pro youtuber.
I've opened two new PRs:
- https://github.com/kubernetes/enhancements/pull/918 is about graduation criteria, version skew, and upgrade/downgrade
- https://github.com/kubernetes/enhancements/pull/919 is to decide on the API implementation
Any thoughts or suggestions will be much appreciated.
@Joseph-Irving Sorry if I'm commenting late in the design process, but I have a potential use case for sidecar containers which may not be supported in the current design proposal. I just wanted to raise it for consideration. The gist is that I have a scenario where on pod termination, 1 sidecar should be terminated before the main container, while another sidecar should be terminated after the main container.
A concrete example might be a pod with a Django app container, a consul sidecar for service registration, and a pgbouncer sidecar for managing connections to the database. When the pod is terminated, I'd like the consul sidecar to be stopped first (so no more traffic is routed to the pod), then the app container (ideally after a short grace period), and then the pgbouncer sidecar. The current proposal looks great for handling the app <-> pgbouncer container dependency, but doesn't seem expressive enough to capture the case where I'd like to tear down a sidecar before the primary container.
@currankaushik, in the scenario you described you could potentially use a pre-stop hook to tell the consul container to prepare for shutdown and stop routing requests to you (assuming it can support something like that). Pre-stop hooks will be sent to sidecars before the termination of containers begins.
The motivation for this was so that proxy sidecars like istio could enter a state where they're not routing traffic to you but are still allowing traffic out while your application finishes up and shuts down.
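To illustrate, a hedged sketch of a preStop hook on a consul-style sidecar; the deregister command, names, and images are hypothetical and depend on how the sidecar is set up:

```yaml
containers:
- name: app
  image: example.com/django-app:1.0
- name: consul-agent
  image: consul:1.4
  lifecycle:
    preStop:
      exec:
        # Hypothetical: deregister the service so no new traffic is routed
        # to this pod while the app finishes in-flight work.
        command: ["consul", "services", "deregister", "-id=app"]
- name: pgbouncer
  image: example.com/pgbouncer:1.0
```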
Sounds good, thanks @Joseph-Irving. So just to confirm my understanding at a high level: pre-stop hooks will be sent to sidecars first, followed by pre-stop hooks to the non-sidecars, SIGTERM to non-sidecars, and then (after all non-sidecars have exited) SIGTERM to sidecars? The design proposal (https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/sidecarcontainers.md) seems to imply this but also says:
> PreStop Hooks will be sent to sidecars and containers at the same time.
@currankaushik yeah what you described is the intended behaviour.
That line you quoted needs rewording. I had some misconceptions about how the prestop hooks were sent to the containers when I wrote that. Thanks for pointing it out.
@Joseph-Irving is this feature targeting alpha inclusion for 1.15?
@kacole2 yeah that is the plan, assuming we can get the KEP to implementable in time for enhancement freeze (April 30th). Once the API has been finalised (https://github.com/kubernetes/enhancements/pull/919) and the test plan agreed (https://github.com/kubernetes/enhancements/pull/951), I think we should be all set.
/milestone v1.15
/stage alpha
@Joseph-Irving Kubernetes 1.15 Enhancement Freeze is 4/30/2019. To be included in the Kubernetes 1.15 milestone, KEPs are required to be in an "Implementable" state with proper test plans and graduation criteria. Please submit any PRs needed to make this KEP adhere to inclusion criteria. If this will slip from the 1.15 milestone, please let us know so we can make appropriate tracking changes.
@mrbobbytables unfortunately the PRs opened to get this to an implementable state have not had much movement on them so I think we will need to delay this until 1.16.
No worries. Thanks for being so quick to respond and letting us know!
/milestone clear
Please keep in mind, this KEP is very important for Istio!
It's a show-stopper for all projects using service frameworks with coordinated bootstrap/shutdown (Akka Cluster, Lagom, etc.) together with the Istio service mesh.
cc @jroper
@Joseph-Irving sorry about the late comment, but I don't see the following in the design doc, and I was wondering what the intended behavior is:
- If a sidecar fails, do we always restart it while the main container has not finished (disregarding the pod's restartPolicy)? This would be useful since sidecars typically act as proxies, load balancers, or housekeeping helpers, and it doesn't matter if one fails a couple of times as long as the main container can continue to work.
- Also, when computing the pod phase, if all main containers succeeded and a sidecar failed (which is very common, since if the sidecar does not catch SIGTERM the exit code will be something like 143), is the pod phase still "Succeeded"?
@zhan849 currently sidecar containers obey the pod restart policy and are counted when computing pod phases such as Succeeded.
We did debate this quite a bit earlier in the process but the general feeling was that we should diverge from a normal container as little as possible, only doing so if it enables the described use cases.
In regards to pod phase: I would argue that all applications running in Kubernetes should handle SIGTERM (especially sidecars), but sometimes you also want to know whether your sidecars exited in a bad way, and that should be reflected in the pod phase; hiding that info could lead to confusion.
For restart policy, it only seems like that would be an issue if the restart policy is Never and your sidecar is prone to crashing. I'm not sure the complication of diverging from the pod restart policy is worth it, especially as some people may want their sidecars to obey the pod restart policy.
Both of these things are just in line with what a normal container does and what currently happens. Changing them didn't seem to be required to achieve the goals listed in the KEP.
If you have some widespread use cases showing why changing them is needed to achieve those goals, that would be useful, as it makes it easier to justify a more complicated change to the code base.
@Joseph-Irving we have some simpler sidecar implementations that have been running internally for some immediate needs (we did not contribute them since this is already in progress in the community), and here is what we learned.
Regarding pod phase:
- Container exit status is already reflected in `pod.status.containerStatuses`, so we don't lose the information. Also, since a big use case for sidecars is in Jobs (or any run-to-finish pods, such as those in Kubeflow), meaningful workloads apply only to the main container, and if the pod phase is marked Failed due to a sidecar failure, this will result in unnecessary retries and lead to other misleading consequences such as Job failures, etc.
- Although it is ideal for sidecars to handle SIGTERM, in production there can be plenty of sidecars that are simply built on open-source software and do not handle SIGTERM nicely, including `kube-proxy`, `postfix`, `rsyslogd`, and many others (and even if SIGTERM is handled, for the non-catchable SIGKILL the exit code will certainly not be 0).
Regarding restart policy (it may be arguable, but having sidecars strictly obey restartPolicy is not realistic in production):
- Forcing the sidecar to restart while main containers are still running by setting "OnFailure" is not an option, as this would also restart failed main containers and is confusing in combination with the Job-level retry limit.
- Main containers usually already have plenty of retry logic for sidecar unavailability, written before the community had sidecar support with explicit container start order. Such historical error handling is not easy to change given its scope. Not restarting the sidecar will cause main containers to hang and retry.
- Propagating failures to upper-level controllers will trigger chains of reconciliation and a lot of API calls, so unnecessary escalation of errors can make Kubernetes less scalable. A more specific example: if a Job's main containers are still running and a sidecar fails, restarting the sidecar costs just one PATCH pod status operation and at most one event-related API call. But failing the pod entirely results in reconciliation of the Job, and of higher-level controllers such as CronJob and other CRDs, and there could be many more API calls.
I'd also like to see whether other people have run into similar issues (/cc @kow3ns).
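To make the scenario concrete, here is a hedged sketch of the kind of Job these concerns apply to (names and images are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-with-sidecar            # hypothetical example
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never            # the pod-level policy the sidecars currently also obey
      containers:
      - name: main                    # run-to-completion workload
        image: example.com/batch-task:1.0
      - name: log-shipper             # sidecar; a non-zero exit here fails the pod and triggers Job retries
        image: example.com/log-shipper:1.0
```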
Would this change incorporate the behavior desired in https://github.com/kubernetes/community/pull/2342, such that there'd be a way to configure the entire pod (or just the non-sidecar container) to restart if a sidecar fails?
@JacobHenner there are currently no plans to implement that kind of mechanism in this KEP. We did discuss incorporating it, but it doesn't really have much dependency on this KEP and could be developed independently, so it seems better suited to its own KEP.
@Joseph-Irving just to share our implementation that addresses the above-mentioned pitfalls, for your reference (https://github.com/zhan849/kubernetes/commits/kubelet-sidecar). Since our goal is to wait for official support, we tried to keep the change as local as possible in this commit.
So for a Job with restartPolicy == Never, 1 main container, 1 bad sidecar that constantly crashes, and 1 good sidecar that keeps running, the pod status will look like this after the main container quits with the above implementation:
```yaml
containerStatuses:
- containerID: xxxxx
  image: xxxxx
  imageID: xxxxx
  lastState: {}
  name: main
  ready: false
  restartCount: 0
  state:
    terminated:
      containerID: xxxxx
      exitCode: 0
      finishedAt: "2019-05-24T17:59:53Z"
      reason: Completed
      startedAt: "2019-05-24T17:59:43Z"
- containerID: xxxxx
  image: xxxxxx
  imageID: xxxxx
  lastState: {}
  name: sidecar-bad
  ready: false
  restartCount: 1
  state:
    terminated:
      containerID: xxxxx
      exitCode: 1
      finishedAt: "2019-05-24T17:59:46Z"
      reason: Error
      startedAt: "2019-05-24T17:59:45Z"
- containerID: xxxxx
  image: xxxxxxx
  imageID: xxxxx
  lastState: {}
  name: sidecar-healthy
  ready: false
  restartCount: 0
  state:
    terminated:
      containerID: xxxxx
      exitCode: 137
      finishedAt: "2019-05-24T18:00:24Z"
      reason: Error
      startedAt: "2019-05-24T17:59:44Z"
hostIP: 10.3.23.230
phase: Succeeded
podIP: 192.168.1.85
qosClass: BestEffort
startTime: "2019-05-24T17:59:41Z"
```