Fix deployment status propagation when scaling from zero
Fixes #14157
Proposed Changes
- Introduces a new PA condition (`PodAutoscalerConditionScaleTargetScaled`) that detects failures during scaling to zero, covering the K8s gaps where the deployment status is not updated correctly. The condition is set to `false` just before we scale down to zero (before the deployment update happens) and when pods are crashing. We set it back to `true` when we scale from zero and have enough ready pods (a rough sketch of this logic is included after the list).
- Previously, when the deployment was scaled down to zero, the revision ready status would be true (and stay that way). With this patch the pod error is detected and propagated:
Ksvc status:
{
    "lastTransitionTime": "2024-10-04T13:57:35Z",
    "message": "Revision \"revision-failure-00001\" failed with message: Back-off pulling image \"index.docker.io/skonto/revisionfailure@sha256:c7dd34a5919877b89617c3a0df7382e7de0f98318f2c12bf4374bb293f104977\".",
    "reason": "RevisionFailed",
    "status": "False",
    "type": "ConfigurationsReady"
},
Revision:
k get revision
NAME                     CONFIG NAME        GENERATION   READY   REASON             ACTUAL REPLICAS   DESIRED REPLICAS
revision-failure-00001   revision-failure   1            False   ImagePullBackOff   0                 0
PA status:
{
    "lastTransitionTime": "2024-10-04T13:57:35Z",
    "message": "Back-off pulling image \"index.docker.io/skonto/revisionfailure@sha256:c7dd34a5919877b89617c3a0df7382e7de0f98318f2c12bf4374bb293f104977\"",
    "reason": "ImagePullBackOff",
    "status": "False",
    "type": "ScaleTargetScaled"
}
- Updates the PA status propagation logic in the revision reconciler (a second sketch at the end of this description illustrates the idea).
- Extends the resource quota e2e test a bit to show that when the deployment is scaled to zero we still report the error. That is not strictly related to this patch, but it demonstrates that we now cover more scenarios. It would probably be good to add more e2e tests anyway.
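
To make the condition handling above concrete, here is a minimal, self-contained Go sketch of the idea. The `Condition`/`PAStatus` types and the `MarkScaleTargetNotScaled`/`MarkScaleTargetScaled` helpers are simplified stand-ins for illustration, not the actual code in this PR; in the real code this is handled through the PodAutoscaler status condition helpers.

```go
package main

import (
	"fmt"
	"time"
)

// PodAutoscalerConditionScaleTargetScaled is the new condition type;
// everything else below is a simplified stand-in for illustration.
const PodAutoscalerConditionScaleTargetScaled = "ScaleTargetScaled"

// Condition mirrors the shape of a Knative status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// PAStatus is a trimmed-down stand-in for the PodAutoscaler status.
type PAStatus struct {
	Conditions []Condition
}

// setCondition adds or updates a condition by type.
func (s *PAStatus) setCondition(c Condition) {
	c.LastTransitionTime = time.Now()
	for i := range s.Conditions {
		if s.Conditions[i].Type == c.Type {
			s.Conditions[i] = c
			return
		}
	}
	s.Conditions = append(s.Conditions, c)
}

// MarkScaleTargetNotScaled records a scaling failure (e.g. ImagePullBackOff)
// just before the deployment is scaled down to zero, or while pods are crashing.
func (s *PAStatus) MarkScaleTargetNotScaled(reason, message string) {
	s.setCondition(Condition{
		Type:    PodAutoscalerConditionScaleTargetScaled,
		Status:  "False",
		Reason:  reason,
		Message: message,
	})
}

// MarkScaleTargetScaled flips the condition back to True once we scale from
// zero and enough pods are ready.
func (s *PAStatus) MarkScaleTargetScaled() {
	s.setCondition(Condition{
		Type:   PodAutoscalerConditionScaleTargetScaled,
		Status: "True",
	})
}

func main() {
	var pa PAStatus

	// Scale-from-zero attempt fails because the image can no longer be pulled.
	pa.MarkScaleTargetNotScaled("ImagePullBackOff",
		`Back-off pulling image "index.docker.io/skonto/revisionfailure@sha256:<digest>"`) // digest shortened
	fmt.Printf("%+v\n", pa.Conditions[0])

	// Enough pods become ready again: the condition goes back to True.
	pa.MarkScaleTargetScaled()
	fmt.Printf("%+v\n", pa.Conditions[0])
}
```

The important part is the flow: mark the condition `False` before scaling to zero or while pods are crashing, and mark it `True` again once enough pods are ready after scaling from zero.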
The steps to test: start a ksvc, let it scale to zero, then remove the image from your registry or block any access to it (e.g. kill internet connectivity), and then issue a request.
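
For the revision reconciler change, the sketch below shows only the direction of the propagation; the `propagateScaleTargetScaled` helper and the revision-side condition name used here (`ResourcesAvailable`) are assumptions for illustration, not the exact code in this PR.

```go
package main

import "fmt"

// condition is a minimal stand-in for a Knative status condition.
type condition struct {
	Type, Status, Reason, Message string
}

// propagateScaleTargetScaled sketches the reconciler change: if the PA reports
// ScaleTargetScaled=False, copy the reason and message onto a readiness-related
// revision condition instead of leaving the revision Ready=True after scale to
// zero. Which revision condition is used here is an assumption for illustration.
func propagateScaleTargetScaled(paConditions []condition) *condition {
	for _, c := range paConditions {
		if c.Type == "ScaleTargetScaled" && c.Status == "False" {
			return &condition{
				Type:    "ResourcesAvailable",
				Status:  "False",
				Reason:  c.Reason,
				Message: c.Message,
			}
		}
	}
	return nil // nothing to propagate; normal readiness handling applies
}

func main() {
	pa := []condition{{
		Type:    "ScaleTargetScaled",
		Status:  "False",
		Reason:  "ImagePullBackOff",
		Message: `Back-off pulling image "index.docker.io/skonto/revisionfailure@sha256:<digest>"`, // digest shortened
	}}
	if rc := propagateScaleTargetScaled(pa); rc != nil {
		fmt.Printf("revision condition: %+v\n", *rc)
	}
}
```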