
Knative service pods scaling down to zero even if minScale is set to 1

Open lsergio opened this issue 1 month ago • 5 comments


Hi all. I have some Knative Services which are configured to have at least 1 replica:

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
```

However, I see 0 replicas for those services.
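
(For anyone following along: a quick way to confirm the mismatch is to compare the PodAutoscaler's recorded scale against the Deployment's replica count. A minimal sketch, using the resource names that appear later in this issue:)

```sh
# Desired vs. actual scale as recorded on the PodAutoscaler
kubectl -n dvdfyi9127 get podautoscaler \
  deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002 \
  -o jsonpath='{.status.desiredScale}{"/"}{.status.actualScale}{"\n"}'

# Replica count on the underlying Deployment
kubectl -n dvdfyi9127 get deployment \
  deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002-deployment \
  -o jsonpath='{.spec.replicas}{"\n"}'
```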

When I check the autoscaler logs I see messages like:

{"severity":"INFO","timestamp":"2025-11-04T04:01:41.965417619Z","logger":"autoscaler","caller":"kpa/kpa.go:157","message":"SKS should be in Proxy mode: want = -1, ebc = -111, #act's = 3 PA Inactive? = true","commit":"4853ead","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"9b82e383-4a86-4ca1-bd91-7cdea766f54e","knative.dev/key":"dvdfyi9127/deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002"}
{"severity":"INFO","timestamp":"2025-11-04T04:01:41.965447059Z","logger":"autoscaler","caller":"kpa/kpa.go:177","message":"PA scale got=0, want=-1, desiredPods=-1 ebc=-111","commit":"4853ead","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"9b82e383-4a86-4ca1-bd91-7cdea766f54e","knative.dev/key":"dvdfyi9127/deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002"}

where the desiredPods value is -1, and the current value is 0

Is this a known issue?

This is a production environment, so I cannot edit the service for testing. I have not been able to reproduce the issue in dev environments yet.

lsergio · Nov 05 '25 17:11

Using knative-serving 1.18.1

lsergio · Nov 05 '25 17:11

Checking the PA shows:

```yaml
apiVersion: autoscaling.internal.knative.dev/v1alpha1
kind: PodAutoscaler
metadata:
  annotations:
    autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
    autoscaling.knative.dev/metric: concurrency
    autoscaling.knative.dev/minScale: "1"
    serving.knative.dev/creator: system:serviceaccount:dvdfyi9127:camel-k-operator
  creationTimestamp: "2025-10-23T18:03:23Z"
  generation: 3
  labels:
    app: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002
    camel.apache.org/integration: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227
    serving.knative.dev/configuration: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227
    serving.knative.dev/configurationGeneration: "2"
    serving.knative.dev/configurationUID: d118cc7e-b5c9-4c16-aa1b-7e0f79d08ed9
    serving.knative.dev/revision: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002
    serving.knative.dev/revisionUID: 4ba87de6-3993-4e26-ab4a-33d3f7a8124b
    serving.knative.dev/service: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227
    serving.knative.dev/serviceUID: fa5fa918-647f-4ceb-b1f3-66b0057288ad
  name: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002
  namespace: dvdfyi9127
  ownerReferences:
  - apiVersion: serving.knative.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: Revision
    name: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002
    uid: 4ba87de6-3993-4e26-ab4a-33d3f7a8124b
  resourceVersion: "64686405"
  uid: fdd2ae35-842d-4d02-b7e7-14e2a1e13db6
spec:
  protocolType: h2c
  reachability: Reachable
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002-deployment
status:
  actualScale: 0
  conditions:
  - lastTransitionTime: "2025-10-27T18:35:10Z"
    message: The target is not receiving traffic.
    reason: NoTraffic
    status: "False"
    type: Active
  - lastTransitionTime: "2025-10-27T18:35:10Z"
    message: The target is not receiving traffic.
    reason: NoTraffic
    status: "False"
    type: Ready
  - lastTransitionTime: "2025-10-27T18:23:23Z"
    message: K8s Service is not ready
    reason: NotReady
    status: Unknown
    type: SKSReady
  - lastTransitionTime: "2025-10-23T18:04:06Z"
    status: "True"
    type: ScaleTargetInitialized
  desiredScale: 0
  metricsServiceName: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002-private
  observedGeneration: 3
  serviceName: deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002
```

It is scaling down due to NoTraffic, even though minScale is set to 1.

lsergio · Nov 05 '25 18:11

@lsergio Can you test with a newer version of Knative?

I included a fix where revisions (with a min-scale annotation) would scale down prematurely when rolling out a new revision. The fix is in a newer 1.18 patch release. I don't think this will resolve your issue, but it's worth testing with the latest patch.

https://github.com/knative/serving/releases/tag/knative-v1.18.2
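
(For a YAML-based install on a test cluster, picking up the patch would look roughly like this; the asset names below are assumed from the standard release layout:)

```sh
# Apply the 1.18.2 CRDs and core components (test cluster only)
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.2/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.2/serving-core.yaml
```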

What's interesting about your PA is that it's reachable (min-scale is only set if a route is pointing to it).

Can you enable debug logging for the autoscaler and collect some logs there?
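
(One way to do this, assuming the default knative-serving namespace: the config-logging ConfigMap accepts per-component loglevel keys, so a sketch would be:)

```sh
# Bump only the autoscaler's log level to debug
kubectl -n knative-serving patch configmap config-logging \
  --type merge -p '{"data":{"loglevel.autoscaler":"debug"}}'

# Stream the autoscaler logs to a file for later analysis
kubectl -n knative-serving logs deploy/autoscaler -f > autoscaler.log
```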

dprotaso · Nov 05 '25 18:11

Hi @dprotaso. I have changed the log level to debug and attached the logs covering a few minutes of execution.

autoscaler.log

As this is a production environment, I cannot change the Knative version without going through a prior validation process in dev and staging, and I haven't seen the issue in those environments yet.

lsergio · Nov 06 '25 12:11

There's not much info there

```
grep '^{' autoscaler.log | grep deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002 | jq .message

"|OldPods| = 0, |YoungPods| = 0"
"No data to scale on yet"
"|OldPods| = 0, |YoungPods| = 0"
"|OldPods| = 0, |YoungPods| = 0"
"No data to scale on yet"
"|OldPods| = 0, |YoungPods| = 0"
"|OldPods| = 0, |YoungPods| = 0"
"No data to scale on yet"
```

Also, going through the code:

If there are no metrics, we skip ticking the scaler here:
https://github.com/knative/serving/blob/f2775178794b7d42c8f38dd930c58a782a37a3e3/pkg/autoscaler/scaling/autoscaler.go#L171-L178

https://github.com/knative/serving/blob/f2775178794b7d42c8f38dd930c58a782a37a3e3/pkg/autoscaler/scaling/multiscaler.go#L335-L346
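
(A quick way to observe the resulting state from the outside: with zero pods, the revision's private metrics service has no endpoints, so there is no data to scale on, which matches the "No data to scale on yet" logs above. Using the metricsServiceName from the PA earlier in this thread:)

```sh
# With 0 pods this should show no endpoint addresses,
# i.e. nothing for the autoscaler to scrape
kubectl -n dvdfyi9127 get endpoints \
  deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002-private
```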

When scaling decisions are made, it happens here:
https://github.com/knative/serving/blob/f2775178794b7d42c8f38dd930c58a782a37a3e3/pkg/reconciler/autoscaling/kpa/scaler.go#L345

But when the PodAutoscaler's reachability changes, it should inform the controller to reconcile, thus triggering the scale up to minScale = 1:

https://github.com/knative/serving/blob/fd49d7a01333ea73b12e8bc14cdbd1bf8673c546/pkg/reconciler/autoscaling/kpa/controller.go#L93-L96

I'm inclined to think that maybe an informer event was dropped. Thus, if you mutate it, it should trigger a reconciliation and scale. Note that your mutations will go away, since our controller will revert changes.
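
(A minimal sketch of such a mutation, assuming a throwaway annotation is enough to fire an informer event; the annotation key below is made up for illustration:)

```sh
# Touch the PodAutoscaler so the controller reconciles it again;
# the controller may revert this, which is fine for our purposes
kubectl -n dvdfyi9127 annotate podautoscaler \
  deploy-e57863cc-e0ab-46ec-9126-eaf2b9ecf227-00002 \
  debug-poke="$(date +%s)" --overwrite
```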

Do you happen to have any historical logs that might show the last

dprotaso · Nov 18 '25 19:11