PodDisruptionBudget for Knative service pods
In what area(s)?
/area autoscale
Describe the feature
Although Knative autoscaling can maintain a minimum number of replicas per revision, I think this only covers actions that Knative itself controls. If other actors evict Knative service pods, the service may end up with fewer available pods than the configured minimum. One example of such an actor that can disturb Knative's minimum state is the high-performance cluster autoscaler Karpenter, which has a consolidation feature.
The way I'm currently mitigating this problem is by manually creating a PodDisruptionBudget targeting the pods of my Knative service, with the PDB's minAvailable value set to the KSVC's autoscaling.knative.dev/min-scale value, as sketched below.
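A minimal sketch of what this could look like, assuming a Knative Service named my-ksvc in namespace default (placeholders) and relying on the serving.knative.dev/service label that Knative Serving sets on revision pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-ksvc-min-scale    # hypothetical name
  namespace: default
spec:
  minAvailable: 2            # kept manually in sync with autoscaling.knative.dev/min-scale
  selector:
    matchLabels:
      serving.knative.dev/service: my-ksvc   # selects the pods of all revisions of this KSVC

The downside is that this PDB has to be updated by hand whenever min-scale (or the scaling behavior) changes, which is part of the motivation for this issue.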
I was asked by @dprotaso to mention my case in a GitHub issue here, so please let me know what you think.
I think it's a valid point to discuss; we already have PDBs for the important data-path components. But it might need some design around when to create/update/delete the PDBs (min-scale is not always set, and scaling itself is dynamic, including to zero). There is also a certain overhead to adjusting the PDBs on scaling changes, so this is up for discussion.
/triage accepted
I would find PDBs created based on min-scale rather valuable for my workloads in a semi-disruptive environment with cluster upgrades.
I have had experience where rolling nodes during cluster upgrades gets stuck due to the pre-set PDBs on Knative's SPOF components. It would be good to have default values for these that allow at least single-node disruptions during cluster node rolls and upgrades.
As I wrote in the following issue, I think PDB support is really important. Knative currently reports a degraded state quite frequently, whereas alerts should instead be driven by the PDB.
https://github.com/knative/serving/issues/15731#issue-2813112461
Overview
When using Knative's minScale annotation (e.g., autoscaling.knative.dev/minScale: '2'), node rolling updates can temporarily cause the revision's status conditions to report RevisionFailed or RevisionMissing, triggering false alerts in monitoring tools like Argo CD, which incorrectly interpret this as "degradation."
In practice, many operators set minScale: 2 for redundancy, following a policy of "the service level is still met even if 1 Pod fails," which in standard Kubernetes is typically expressed through a PDB (PodDisruptionBudget).
Therefore, we would appreciate it if you could consider and implement a mechanism for Knative to respect PDBs (either by managing/updating PDBs on the Knative Service side or by referencing existing PDBs).
Background
During node rolling updates or temporary cluster resource shortages, the Ready condition may become False, with RevisionFailed or RevisionMissing reported as shown below.
GitOps tools like Argo CD and monitoring solutions treat this situation as an "actual incident" and trigger alerts.
status:
  address:
    url: http://hogehoge.hogehoge.svc.cluster.local
  conditions:
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: 'Revision "hogehoge-00099" failed with message: 0/9 nodes are available:
      1 Insufficient cpu, 1 node(s) didn''t match Pod''s node affinity/selector, 1
      node(s) had untolerated taint {CriticalAddonsOnly: true}, 1 node(s) had untolerated
      taint {karpenter.sh/disrupted: }, 2 node(s) had untolerated taint {component:
      envoy}, 4 Insufficient memory. preemption: 0/9 nodes are available: 4 No preemption
      victims found for incoming pod, 5 Preemption is not helpful for scheduling..'
    reason: RevisionFailed
    status: "False"
    type: ConfigurationsReady
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: Revision "hogehoge-00099" failed to become ready.
    reason: RevisionMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: Revision "hogehoge-00099" failed to become ready.
    reason: RevisionMissing
    status: "False"
    type: RoutesReady
  latestCreatedRevisionName: hogehoge-00099
  latestReadyRevisionName: hogehoge-00099
  observedGeneration: 35
  traffic:
  - latestRevision: true
    percent: 100
    revisionName: hogehoge-00099
  url: http://hogehoge.hogehoge.example.com
However, in redundant configurations (e.g., minScale set to 2), the service level may still be met even if 1 Pod is temporarily unavailable. In standard Kubernetes operations, this availability is managed through a PDB, which defines how many Pod disruptions are acceptable. A sketch of such a policy is shown below.
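For illustration, the conventional way to express "losing one Pod is acceptable" with a PDB might look like the following sketch (names reuse the hogehoge placeholders from the status above; the selector assumes the serving.knative.dev/service label that Knative Serving sets on revision pods):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hogehoge-pdb          # hypothetical name
  namespace: hogehoge
spec:
  maxUnavailable: 1           # one voluntarily disrupted Pod is acceptable
  selector:
    matchLabels:
      serving.knative.dev/service: hogehoge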
Request/Proposal
Add PDB support as a resource that Knative Service creates and manages.
Desired Outcome
- Use the PDB to accept "expected Pod reduction" and avoid treating states that have not compromised the service level as abnormal
- Suppress false alarms and noise alerts from operational tools while maintaining redundancy
Expected Benefits
- Better reflection of true system health status when minScale is configured
- Reduced false alerts, leading to decreased operational burden
- Smooth integration between standard Kubernetes operations (availability management via PDB) and Knative serverless operations
Steps to Reproduce (Example)
- Set autoscaling.knative.dev/minScale: '2' on a Knative Service (see the example manifest below)
- Trigger a rolling update (node upgrade, cluster scale-down, etc.)
- The Knative Revision temporarily becomes Failed or Missing
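A minimal Knative Service manifest for the first step, as a sketch (name, namespace, and image are placeholders; substitute your own workload):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hogehoge
  namespace: hogehoge
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"   # keep at least two replicas
    spec:
      containers:
      - image: ghcr.io/knative/helloworld-go:latest   # placeholder sample image

Draining the node that hosts one of the two Pods (e.g., kubectl drain <node> --ignore-daemonsets) is one way to simulate the rolling-update step.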
Expected Behavior
- If a PDB is defined and the Pod disruptions stay within its limits, the Knative Revision status should not be treated as "Failed"
Additional Notes
- The above is just one example
- We would appreciate your consideration of cases that require controlling the number of disrupted Pods, such as large-scale workloads and high-availability systems