PodDisruptionBudget for Knative service pods
In what area(s)?
/area autoscale
Describe the feature
Although Knative autoscaling can maintain a minimum number of replicas per revision, I think this only covers actions that Knative itself controls. If other actors evict Knative service pods, the service may end up with fewer available pods than the configured minimum. One example of such an actor that can disturb Knative's minimum state is the high-performance cluster autoscaler Karpenter, which has a consolidation feature.
The way I'm currently mitigating this problem is by manually creating a PodDisruptionBudget targeting the pods of my Knative service, with the PDB's minAvailable value set to the KSVC's autoscaling.knative.dev/min-scale value, as sketched below.
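A minimal sketch of what this could look like, assuming a Knative Service named my-ksvc in namespace default (placeholders) and relying on the serving.knative.dev/service label that Knative Serving sets on revision pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-ksvc-min-scale    # hypothetical name
  namespace: default
spec:
  minAvailable: 2            # kept manually in sync with autoscaling.knative.dev/min-scale
  selector:
    matchLabels:
      serving.knative.dev/service: my-ksvc   # selects the pods of all revisions of this KSVC

The downside is that this PDB has to be updated by hand whenever min-scale (or the scaling behavior) changes, which is part of the motivation for this issue.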
I was asked by @dprotaso to mention my case in a GitHub issue here, so please let me know what you think.
I think it's a valid point to discuss; we already have PDBs for the important data-path components. But it might need some design around when to create/update/delete the PDBs (min-scale is not always set, and scaling itself is dynamic, including to zero). There is also a certain overhead to adjusting the PDBs on scaling changes, so this is up for discussion.
/triage accepted
I would find PDBs created based on min-scale rather valuable for my workloads in a semi-disruptive environment with cluster upgrades.
I have had experience where rolling nodes during cluster upgrades gets stuck due to the pre-set PDBs on Knative's SPOF components. It would be good to have default values for these that allow at least single-node disruptions during cluster node rolls and upgrades.
As I wrote in the following issue, I think PDB support is really important. Knative currently reports a degraded state quite frequently, whereas alerts should instead be driven by the PDB.
https://github.com/knative/serving/issues/15731#issue-2813112461
Overview
When using Knative's minScale annotation (e.g., autoscaling.knative.dev/minScale: '2'), node rolling updates can temporarily cause the revision's status conditions to report RevisionFailed or RevisionMissing, triggering false alerts in monitoring tools like Argo CD, which incorrectly interpret this as "degradation."
In practice, many operators set minScale: 2 for redundancy, following a policy of "the service level is still met even if 1 Pod fails," which in standard Kubernetes is typically expressed through a PDB (PodDisruptionBudget).
Therefore, we would appreciate it if you could consider and implement a mechanism for Knative to respect PDBs (either by managing/updating PDBs on the Knative Service side or by referencing existing PDBs).
Background
During node rolling updates or temporary cluster resource shortages, the Ready condition may become False, with RevisionFailed or RevisionMissing reported as shown below.
GitOps tools like Argo CD and monitoring solutions treat this situation as an "actual incident" and trigger alerts.
status:
  address:
    url: http://hogehoge.hogehoge.svc.cluster.local
  conditions:
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: 'Revision "hogehoge-00099" failed with message: 0/9 nodes are available:
      1 Insufficient cpu, 1 node(s) didn''t match Pod''s node affinity/selector, 1
      node(s) had untolerated taint {CriticalAddonsOnly: true}, 1 node(s) had untolerated
      taint {karpenter.sh/disrupted: }, 2 node(s) had untolerated taint {component:
      envoy}, 4 Insufficient memory. preemption: 0/9 nodes are available: 4 No preemption
      victims found for incoming pod, 5 Preemption is not helpful for scheduling..'
    reason: RevisionFailed
    status: "False"
    type: ConfigurationsReady
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: Revision "hogehoge-00099" failed to become ready.
    reason: RevisionMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2025-01-27T13:42:36Z"
    message: Revision "hogehoge-00099" failed to become ready.
    reason: RevisionMissing
    status: "False"
    type: RoutesReady
  latestCreatedRevisionName: hogehoge-00099
  latestReadyRevisionName: hogehoge-00099
  observedGeneration: 35
  traffic:
  - latestRevision: true
    percent: 100
    revisionName: hogehoge-00099
  url: http://hogehoge.hogehoge.example.com
However, in redundant configurations (e.g., minScale set to 2), the service level may still be met even if 1 Pod is temporarily unavailable. In standard Kubernetes operations, this availability is managed through a PDB, which defines how many Pod disruptions are acceptable. A sketch of such a policy is shown below.
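For illustration, the conventional way to express "losing one Pod is acceptable" with a PDB might look like the following sketch (names reuse the hogehoge placeholders from the status above; the selector assumes the serving.knative.dev/service label that Knative Serving sets on revision pods):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hogehoge-pdb          # hypothetical name
  namespace: hogehoge
spec:
  maxUnavailable: 1           # one voluntarily disrupted Pod is acceptable
  selector:
    matchLabels:
      serving.knative.dev/service: hogehoge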
Request/Proposal
Add PDB support as a resource that Knative Service creates and manages.
Desired Outcome
- Use the PDB to accept "expected Pod reduction" and avoid treating states that have not compromised the service level as abnormal
- Suppress false alarms and noise alerts from operational tools while maintaining redundancy
Expected Benefits
- Better reflection of true system health status when minScale is configured
- Reduced false alerts, leading to decreased operational burden
- Smooth integration between standard Kubernetes operations (availability management via PDB) and Knative serverless operations
Steps to Reproduce (Example)
- Set autoscaling.knative.dev/minScale: '2' on a Knative Service (see the example manifest below)
- Trigger a rolling update (node upgrade, cluster scale-down, etc.)
- The Knative Revision temporarily becomes Failed or Missing
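A minimal Knative Service manifest for the first step, as a sketch (name, namespace, and image are placeholders; substitute your own workload):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hogehoge
  namespace: hogehoge
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"   # keep at least two replicas
    spec:
      containers:
      - image: ghcr.io/knative/helloworld-go:latest   # placeholder sample image

Draining the node that hosts one of the two Pods (e.g., kubectl drain <node> --ignore-daemonsets) is one way to simulate the rolling-update step.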
Expected Behavior
- If a PDB is defined and the Pod disruptions stay within its limits, the Knative Revision status should not be treated as "Failed"
Additional Notes
- The above is just one example
- We would appreciate your consideration of cases that require controlling the number of disrupted Pods, such as large-scale workloads and high-availability systems