kubernetes icon indicating copy to clipboard operation
kubernetes copied to clipboard

Idea: Pod-level probes or exclude some containers from pod readiness

Open thockin opened this issue 1 year ago • 19 comments

I swear we have discussed this before but I cannot find it.

Today, probes are per-container. This makes sense in a lot of ways - if the specific container is failing liveness, you usually want to restart that specific container. Also, we know that a large majority of pods run with a single container, so this has rarely been a major issue.

That said, I think there are cases where it's imperfect. This came up in a user issue a few weeks ago and I meant to ping the old issue (which I cannot find), so I am opening this to discuss.

The specific case in question was a pod with multiple containers - one main app container and a small number of background-helper containers (think logs/metrics). The user had configured readiness probe for the "main" app and it was stable, but one of the background helpers was crashy. It had triggered crashloop-backoff and was therefore not-ready. This makes the whole pod not-ready. This was surprising to the user. When I looked at it from their POV, I kind of agreed. Why isn't there a way to express "It's OK for the helper to crash, as long as my main app is still serving"?

So the idea here is pod-level probes. At least ReadinessProbe makes sense, and I think one could argue about Liveness. Startup is somewhere between. Obviously (to me, anyway?) exec probes present a problem, but network probes seem OK.

An alternate approach could be a way toi express "this container should not be considered for pod readiness" or something.

I open this for discussion.

/cc @tallclair @SergeyKanzhelev @tssurya @danwinship @aojea @lauralorenz

thockin avatar Sep 10 '24 19:09 thockin

  • Older issue: https://github.com/kubernetes/kubernetes/issues/95193
  • Recent Slack question: https://kubernetes.slack.com/archives/C06JCTLFPFX/p1718971935169169

We definitively discussed it for sidecars and decided that we want readiness to be combined from sidecars and regular containers. This makes sense for sidecars like istio:

https://github.com/kubernetes/enhancements/blob/dbb9ce341d2098c70df1f42ea115a316b4e63a21/keps/sig-node/753-sidecar-containers/README.md?plain=1#L416

However we discussed this as a feature we can enable via some sort of configuration: https://docs.google.com/document/d/1E1guvFJ5KBQIGcjCrQqFywU9_cBQHRtHvjuqcVbCXvU/edit#bookmark=id.5tnohbsioarq

SergeyKanzhelev avatar Sep 10 '24 19:09 SergeyKanzhelev

/kind feature /triage accepted

There may be different ways to design it. But there might be interesting assumptions and behaviors related to it, so needs to be designed carefully. We discussed some complications at the Sidecar meeting I referred above

SergeyKanzhelev avatar Sep 10 '24 19:09 SergeyKanzhelev

So, If we have

Pod
- c1
   probe1
-c2
   probe2
- c3
  probe3

now we have probeResult = probe1 && probe2 && probe 3

and user may want to say?

probeResult = probe1 || (probe2 && probe3) EDIT: probeResult = probe1 && probe2 (not probe3)

We have something that can reuse with the PodReadinessGate functionality, WDYT about

apiVersion: v1
kind: Pod
metadata:
  name: multiprobe
spec:
   readinessGates:
    - containerName: "c1"
    - containerName: "c2"
  containers:
  - name: c1
    readinessProbe:
  - name: c2
    readinessProbe:
  - name: c2
    readinessProbe:

aojea avatar Sep 10 '24 21:09 aojea

my reading of it is that semantic desired is mostly "ignore one of the container's readiness", not the ||

SergeyKanzhelev avatar Sep 10 '24 21:09 SergeyKanzhelev

Thanks for the links Sergey!

@aojea, readiness gates are "in addition to" the "container is running". While we could make it arbitrarilty complicated (e.g. a CEL expression which gets a map[string]bool representing each container's readiness), we can also keep it dead simple. Either pod probe is mutually exclusive with container probes or add a field to each container which indicates whether it counts for pod readiness. I am sure other options can be found :)

thockin avatar Sep 10 '24 21:09 thockin

One simple solution that works today is to point sidecar's probe to the main container instead of the sidecar. This way the readiness state will be replicated. The only problem is that we mark container as not ready WHILE restarting.

So introducing a new probe type that simply returns true even when the container is down may be an alternative.

SergeyKanzhelev avatar Sep 10 '24 22:09 SergeyKanzhelev

introducing a new probe type that simply returns true even when the container is down may be an alternative.

That would represent a change in the probe logic, wouldn't it? Today all probes are only considered if the container is up, I thought?

thockin avatar Sep 10 '24 22:09 thockin

introducing a new probe type that simply returns true even when the container is down may be an alternative.

That would represent a change in the probe logic, wouldn't it? Today all probes are only considered if the container is up, I thought?

Yes. I mostly asking about desired semantics though. If probe type true is a desired API, it may be reasonably straightforward to implement.

SergeyKanzhelev avatar Sep 10 '24 22:09 SergeyKanzhelev

That would satisfy, but IMO is more obtuse than something like participateInPodReadiness: false

thockin avatar Sep 10 '24 23:09 thockin

That would satisfy, but IMO is more obtuse than something like participateInPodReadiness: false

Do you think that container status can be Not Ready and Pod status is ready? Or this flag will override any configured readiness Probe? Or it will not allow to configure a readiness probe? I suggested the new probe type to eliminate these types of questions

SergeyKanzhelev avatar Sep 10 '24 23:09 SergeyKanzhelev

Fair questions. I didn't mean to turn this issue into the kep, but off the top of my head he feels like this flag is very explicit. All it means is that this container's readiness does not contribute to the pods readiness. Or something like that. I think the pods phone probes don't need to change semantics at all. The individual container's readiness isn't super useful or interesting, but I don't see why we need to prevent it.

thockin avatar Sep 11 '24 00:09 thockin

I'm not sure if this is possible in a backwards-compatible way with sidecars already in beta, but I like this UX: sidecars are only included in pod readiness if they have a readiness probe. I suppose we might want a noop readiness probe type if there are cases where the user only cares that the container is running.

tallclair avatar Sep 19 '24 20:09 tallclair

I suppose we might want a noop readiness probe type if there are cases where the user only cares that the container is running.

I think the core new concept here is not "always true" probe, but the ready status that ignores the container being alive. And it may be applied to both - regular containers and sidecars. I don't think it is a feature request that is unique to sidecars.

possible in a backwards-compatible way with sidecars already in beta,

The reason I think backward compatibility may be affected is that the new field may not be read by every existing consumer. Not sure how much we consider the current statement as an guaranteed API:

A Pod is considered ready when all of its containers are ready.

Changing this seems to be a breaking change, even if the new field is introduced to ignore it. Unless this new field simply overrides the Ready value by setting it to true unconditionally.

SergeyKanzhelev avatar Sep 19 '24 21:09 SergeyKanzhelev

Since the sidecar readiness probe cannot affect the pod ready status, simply remove the sidecar readiness probe and only keep the main container readiness probe.

songminglong avatar Oct 24 '24 08:10 songminglong

Since the sidecar readiness probe cannot affect the pod ready status, simply remove the sidecar readiness probe and only keep the main container readiness probe.

The issue is that the Pod will be marked Not Ready WHILE sidecar is being restarted. I think the ask here is to IGNORE sidecar ready state even WHILE it is restarting

SergeyKanzhelev avatar Oct 24 '24 18:10 SergeyKanzhelev

Exactly. The sidecar's upness or downness has no real bearing on whether this Hod is viable as a back end for the service

thockin avatar Oct 26 '24 00:10 thockin

Any updates here?

I just tried the sidecar containers and this is so so disappointing

Can't understand how and where it can be useful. I tried to move the non critical containers to the sidecar containers in order to prevent the situation when the container is failed and makes the entire pod as non ready which stops the traffic to be forwarded to the pod.

But unfortunately even if I remove completely readiness and liveness probes completely and the sidecar is crash or being restarted, the pod is not ready and...removed from the endpoint. Does not make sense that one non-critical container can cause the outage in the main container.

alexku7 avatar Jun 18 '25 15:06 alexku7

The documentation is also very confusing The doc says that "If a readinessProbe is specified for this init container, its result will be used to determine the ready state of the Pod."

Which can be understood mistakenly that if I don't specify readinessProbe so the sidecar container doesn't affect the readiness status of the main container and the pod but this is not true! Failed sidecar container does affect the pod and marks it us not ready and this is bad.

alexku7 avatar Jun 18 '25 15:06 alexku7

Also looking forward to this. There doesn't seem to be any alternative way to ignore a sidecar's error/crash when determining a Pod's readiness. An opt-out method like the participateInPodReadiness: false mentioned above would be a safe bet.

shervinkh avatar Jun 18 '25 21:06 shervinkh