Support `exec` readinessProbes for sidecar serving containers

Open flomedja opened this issue 1 year ago • 2 comments

Knative version v1.15.0

Expected Behavior

Knative Serving should support `exec` readiness probes on sidecar containers.

By sidecar containers, I mean the containers that do not define a port.

Actual Behavior

The Knative Serving reconciler raises an error because exec readiness probes are not supported on sidecar containers. This happens because applyReadinessProbeDefaults is used to compute the probes for both the main and the sidecar containers. The function needs a port to create a TCP probe in the queue container, and that TCP probe is required for exec readiness probes on serving containers.
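To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not the actual Knative code: the `Probe` and `Container` types and the `queue_probe_for` function are hypothetical names introduced only for illustration, and the error text is shortened from the event shown in the reproduction steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Probe:
    kind: str                   # "exec", "tcp", or "http"
    port: Optional[int] = None  # an exec probe carries no port of its own


@dataclass
class Container:
    name: str
    ports: List[int] = field(default_factory=list)
    readiness_probe: Optional[Probe] = None


def queue_probe_for(container: Container) -> Optional[Probe]:
    """Hypothetical model of deriving the probe run by the queue container."""
    probe = container.readiness_probe
    if probe is None:
        return None
    if probe.kind != "exec":
        return probe
    # The queue container cannot run exec, so this model substitutes a TCP
    # probe. That substitution needs a port from the container definition.
    if not container.ports:
        raise ValueError(
            "readiness probe does not define probe port on container: "
            + container.name
        )
    return Probe(kind="tcp", port=container.ports[0])


app = Container("app", ports=[8080], readiness_probe=Probe("exec"))
sidecar = Container("sidecar", readiness_probe=Probe("exec"))

print(queue_probe_for(app))      # the main container has a port, so this works
try:
    queue_probe_for(sidecar)     # the sidecar has no port, so this fails
except ValueError as err:
    print(err)
```

In this model the main container works only because it declares a port; the sidecar has the same exec probe but nothing to build the TCP probe from.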

Question: Why do we need a TCP probe on the queue container for the exec probes of the serving containers?

Reflection: If port definitions were allowed for sidecar containers, this issue would be trivial to fix. I see that an issue is open for that, but it has been on ice for a while.

Contribution: I can help fix the issue once we decide on a common implementation.

Steps to Reproduce the Problem

  1. Create a Knative Service with this YAML:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    # Add necessary annotations
    serving.knative.dev/istio-gateways: knative-serving/knative-external-ingress-gateway # Change If you don't use istio gateway
  name: helloworld-tcp-error
  namespace: default # Customize your namespace
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/max-scale: "1"
        autoscaling.knative.dev/min-scale: "1"
    spec:
      containerConcurrency: 0
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          name: app
          env:
            - name: TARGET
              value: "Go Sample v1"
          ports:
            - containerPort: 8080
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - cat /proc/1/status
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - "CAP_SYS_ADMIN"
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
        - name: sidecar
          image: registry.k8s.io/busybox
          args:
            - /bin/sh
            - -c
            - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - cat /proc/1/status
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - "CAP_SYS_ADMIN"
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
      timeoutSeconds: 60
  traffic:
    - latestRevision: true
      percent: 100
  2. Retrieve the events with `kubectl events -n <namespace>`. You will see an error similar to:
5s (x21 over 38m)        Warning   InternalError           Revision/helloworld-tcp-error-00001                            failed to create deployment "helloworld-tcp-error-00001-deployment": failed to make deployment: failed to create PodSpec: failed to create queue-proxy container: sidecar readiness probe does not define probe port on container: sidecar

flomedja avatar Aug 26 '24 22:08 flomedja

@skonto Can I have your input on this, please? Thanks :)

flomedja avatar Sep 11 '24 16:09 flomedja

@ReToCode Can I have your input on this issue, please? Thanks :)

flomedja avatar Sep 26 '24 20:09 flomedja

@skonto @ReToCode hi, gentle ping :)

flomedja avatar Nov 07 '24 14:11 flomedja

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Feb 06 '25 01:02 github-actions[bot]

/remove-lifecycle stale

flomedja avatar Feb 06 '25 17:02 flomedja

Hi @FloMedja,

We need to probe early on the QP side (before the kubelet, to speed up readiness), and by design (there are more details in the doc) we don't allow exec probes because we can't call exec from the queue proxy. There is special treatment for that case.

cc @dprotaso

skonto avatar Feb 07 '25 14:02 skonto

Thanks @skonto. The documentation explains in more detail what I noticed in the code. At the moment, the exec probe on the main container is delegated to Kubernetes. In that case, the QP relies on a TCP probe to the exposed port of the main container. When an exec probe is defined on a sidecar container, it is just ignored.

Would it be a good idea to also delegate the exec probe to Kubernetes when it is defined on sidecar containers? The QP won't have a TCP probe to rely on in that case, but that won't be an issue since no traffic should be forwarded directly to the sidecar container.

flomedja avatar Feb 07 '25 16:02 flomedja

Would it be a good idea to also delegate the exec probe to Kubernetes when it is defined on sidecar containers?

I think this is the path forward. If there are any exec probes in the Pod, we should just defer to Kubernetes to signal the health of that Pod and avoid any optimizations that we have.

We already do this with exec probes on the main container.
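As a rough illustration of that rule (a sketch only, not the eventual implementation), the check could look like the following. The container dictionaries mirror the `readinessProbe` layout of the YAML in the reproduction steps, and the function names `has_exec_readiness_probe` and `defer_to_kubernetes` are hypothetical.

```python
from typing import Any, Dict, List


def has_exec_readiness_probe(container: Dict[str, Any]) -> bool:
    """Return True if the container declares an exec readiness probe."""
    probe = container.get("readinessProbe") or {}
    return probe.get("exec") is not None


def defer_to_kubernetes(containers: List[Dict[str, Any]]) -> bool:
    """Hypothetical rule: any exec probe in the Pod means Kubernetes alone
    signals readiness and the early queue-proxy probing is skipped."""
    return any(has_exec_readiness_probe(c) for c in containers)


# Containers shaped like the ones in the reproduction steps.
exec_probe = {"exec": {"command": ["/bin/sh", "-c", "cat /proc/1/status"]}}
containers = [
    {"name": "app", "ports": [{"containerPort": 8080}], "readinessProbe": exec_probe},
    {"name": "sidecar", "readinessProbe": exec_probe},
]

print(defer_to_kubernetes(containers))
```

Under this rule the sidecar never needs a port, because no TCP probe is derived for it.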

/triage accepted

dprotaso avatar Feb 09 '25 23:02 dprotaso

@dprotaso @skonto Thanks for the answers. I will move forward with the implementation.

flomedja avatar Feb 10 '25 10:02 flomedja

@dprotaso @skonto I have an MR: https://github.com/knative/serving/pull/15773. I will need a member of the Knative organization to review it and mark it as /ok-to-test. Thanks :)

flomedja avatar Feb 12 '25 16:02 flomedja