Add startup probe support to Knative Service

Open ReToCode opened this issue 1 year ago • 4 comments

Fixes #10037

Context

See the feature document: https://docs.google.com/document/d/1TmimPy7qNLtc5IHVFKEme8X-NiIUBtVgR44GlaTqoWs/edit?usp=sharing

Proposed Changes

  • Adds startup probe support to Knative Service
  • Startup probes disable Knative's probe optimisation, just as exec probes do, because the kubelet executes them against the user-container
  • ProgressDeadlineSeconds is dynamically increased to the maximum duration a startup probe could take (worst case), so that the pod is not scaled to zero before the startup probe has succeeded or failed

Release Note

Knative Service now supports setting startup probes in the spec. Please note that this increases the cold-start time of your service (more info in docs).

ReToCode avatar Jun 06 '24 12:06 ReToCode

/assign @skonto
/assign @dprotaso
/assign @izabelacg

More info in the links above.

ReToCode avatar Jun 06 '24 12:06 ReToCode

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.61%. Comparing base (a310476) to head (be536f3). Report is 1 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/activator/net/revision_backends.go | 0.00% | 1 Missing and 1 partial :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15309      +/-   ##
==========================================
+ Coverage   84.56%   84.61%   +0.05%     
==========================================
  Files         219      219              
  Lines       13584    13587       +3     
==========================================
+ Hits        11487    11497      +10     
+ Misses       1727     1724       -3     
+ Partials      370      366       -4     

codecov[bot] avatar Jun 06 '24 12:06 codecov[bot]

@dprotaso gentle ping.

skonto avatar Jun 19 '24 07:06 skonto

/lgtm
/approve

dprotaso avatar Jul 08 '24 17:07 dprotaso

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, ReToCode

knative-prow[bot] avatar Jul 08 '24 17:07 knative-prow[bot]

A follow-up question I have: what happens when a user container restarts and it has a startup probe?

dprotaso avatar Jul 08 '24 17:07 dprotaso

> A follow-up question I have: what happens when a user container restarts and it has a startup probe?

I think K8s runs the startup probe on every container creation (including a restart): https://github.com/kubernetes/kubernetes/issues/102230

ReToCode avatar Jul 09 '24 05:07 ReToCode

> > A follow-up question I have: what happens when a user container restarts and it has a startup probe?
>
> I think K8s runs the startup probe on every container creation (including a restart): kubernetes/kubernetes#102230

Just to confirm: does Knative behave correctly with your changes?

dprotaso avatar Jul 09 '24 18:07 dprotaso

> Just to confirm: does Knative behave correctly with your changes?

I think it looks good: https://github.com/ReToCode/knative-multicontainer-probing/blob/main/TESTING_STARTUP_PROBES.md#testing-startup-probe-with-liveness-failures

# Logs of the user container
user-container Liveness probe called, responding with:  true
user-container Liveness is now:  false
user-container Liveness probe called, responding with:  false
user-container Liveness probe called, responding with:  false
user-container Liveness probe called, responding with:  false
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:18:02.386311375Z","logger":"queueproxy","caller":"sharedmain/handlers.go:109","message":"Attached drain handler from user-container&{GET /wait-for-drain HTTP/1.1 1 1 map[Accept:[*/*] Accept-Encoding:[gzip] User-Agent:[kube-lifecycle/1.28]] {} <nil> 0 ] false 10.244.2.47:8022 map] map] <nil> map] 10.244.2.1:57410 /wait-for-drain <nil> <nil> <nil> 0x4000289cc0 0x400013f260 ] map]}","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
Stream closed EOF for default/runtime-00001-deployment-5d6cf5bbc9-brpdg (user-container) # restarted by K8s

# restarted user-container
user-container Starting server. Listening on port:  8080
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57859614Z","logger":"queueproxy","caller":"sharedmain/main.go:271","message":"Starting queue-proxy","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57867039Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server metrics:9090","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57870039Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server admin:8022","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.578705224Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server main:8012","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
user-container Startup probe called, responding with:  false
user-container Startup probe called, responding with:  false

After the user-container is restarted, the startup probe is executed again.

ReToCode avatar Jul 10 '24 07:07 ReToCode