[SURE-9007] Pod Disruption Budgets

Open manno opened this issue 9 months ago • 1 comments

Fleet should support PriorityClasses (PCs) and PodDisruptionBudgets (PDBs). They should be taken from the existing AgentDeploymentCustomization struct.

see "RFC: Cattle Cluster Agent Priority Class And Pod Disruption Budgets"

manno avatar Apr 01 '25 09:04 manno

@manno does this require UI changes?

kkaempf avatar Apr 07 '25 09:04 kkaempf

Reference implementation in Rancher https://github.com/rancher/rancher/issues/48995

p-se avatar Aug 06 '25 10:08 p-se

QA Template

  1. Setup Fleet on a cluster. Rancher is not required.

  2. Edit the Fleet cluster resource, add spec.agentSchedulingCustomization like so:

    agentSchedulingCustomization:
      priorityClass:
        value: 777
      podDisruptionBudget:
        minAvailable: "1"
    

    This should result in the creation of a PriorityClass named fleet-agent-priority-class and a PodDisruptionBudget with name fleet-agent-pod-disruption-budget.

  3. Make sure the Agent's Deployment was updated and that the pod is running successfully. If a spec.priorityClassName is configured and used in a Deployment before the PriorityClass is actually created, the Pod will hang. This must not happen. Check that the values of fleet-agent-priority-class and fleet-agent-pod-disruption-budget match the specification in the cluster resource.

  4. Check that the selectors of fleet-agent-pod-disruption-budget correctly point to the Fleet agent deployment.

  5. Check that the spec.priorityClassName field of the Fleet agent deployment correctly points to the PriorityClass. The value for spec.Priority in the Fleet agent deployment should reflect the configured value for priorityClass in the Fleet cluster resource.

  6. Delete the agentSchedulingCustomization field and ensure that the PriorityClass and PodDisruptionBudget resources have been removed and that the Deployment of the Fleet agent does not contain a reference to spec.priorityClassName.

    Optionally extend testing to other configurable values of the spec.agentSchedulingCustomization field (podDisruptionBudget.maxUnavailable or priorityClass.preemptionPolicy). Setting both minAvailable and maxUnavailable on a PodDisruptionBudget is supposed to prevent the Fleet agent from being updated, as those values are mutually exclusive and cannot both be set on a PodDisruptionBudget.

Every change in the cluster resource is supposed to redeploy the downstream (or local) agent, with the previous PriorityClass and PodDisruptionBudget being deleted and recreated. If a value is configured for PriorityClass, a reference to the PriorityClass must exist in the Fleet agent Deployment.
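
For anyone scripting the optional check, the mutual-exclusivity rule for minAvailable and maxUnavailable can be sketched roughly as below. This is only an illustration of the rule, not Fleet's actual validation code; the function name `pdb_fields_conflict` and the plain-dict input are assumptions made for the example.

```python
def pdb_fields_conflict(pdb: dict) -> bool:
    """Return True when both mutually exclusive PDB fields are set.

    `pdb` is a plain dict mirroring the podDisruptionBudget block of
    spec.agentSchedulingCustomization (hypothetical input shape).
    An absent key, None, or an empty string counts as "not set".
    """
    def is_set(value) -> bool:
        return value is not None and value != ""

    return is_set(pdb.get("minAvailable")) and is_set(pdb.get("maxUnavailable"))


# Only one field set: no conflict.
only_min = {"minAvailable": "1"}
# Both fields set: this combination is expected to be rejected.
both = {"minAvailable": "1", "maxUnavailable": "1"}
```

With these inputs, `pdb_fields_conflict(only_min)` is False and `pdb_fields_conflict(both)` is True.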

p-se avatar Sep 08 '25 12:09 p-se

Verified in Rancher 2.13.0-alpha2 with Fleet 0.14.0-alpha.3. Overall working OK when testing under normal conditions with a single cluster + 1 downstream cluster.


Tested

2- Checked that priorityClass has value 777 and pdb has minAvailable: "1" with the values below:

agentSchedulingCustomization:
  priorityClass:
    value: 777
  podDisruptionBudget:
    minAvailable: "1"
Image

3- Verified the agent deployment is updated when the values are changed so that priorityClass has value 333 and pdb has minAvailable: "4":

agentSchedulingCustomization:
  priorityClass:
    value: 333
  podDisruptionBudget:
    minAvailable: "4"
Image

4- Verified selectors of fleet-agent-pod-disruption-budget correctly point to fleet agent deployment:

 spec:
   minAvailable: 4
   selector:
     matchLabels:
       app: fleet-agent
Image
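
As a rough sketch of what "the selector points to the deployment" means here: a matchLabels selector matches a pod when every key/value pair in the selector is present in the pod's labels. The helper name `selector_matches` is an assumption for illustration, and the pod labels in the example are hypothetical apart from `app: fleet-agent`, which comes from the output above.

```python
def selector_matches(match_labels: dict, pod_labels: dict) -> bool:
    """Return True if every matchLabels pair is present in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Selector as shown in the PodDisruptionBudget above.
pdb_selector = {"app": "fleet-agent"}

# Hypothetical pod labels; extra labels on the pod do not affect the match.
agent_pod = {"app": "fleet-agent", "pod-template-hash": "abc123"}
other_pod = {"app": "some-other-app"}
```

Here `selector_matches(pdb_selector, agent_pod)` is True and `selector_matches(pdb_selector, other_pod)` is False.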

5- Verified the priorityClassName field of the Fleet agent deployment correctly points to the PriorityClass.

Image

6- Verified that after deleting agentSchedulingCustomization, the pc and pdb resources are removed and the fleet agent deployment no longer has any reference to priorityClassName. Verified the fleet agent gets updated.

https://github.com/user-attachments/assets/82c97eb4-ae4d-4069-9877-9be068d256ce


7- Verified the above points also work in downstream clusters


8- Faulty values end in an error:

  agentSchedulingCustomization:
    priorityClass:
      value: 21
    podDisruptionBudget:
      minAvailable: "b"
Image

However, the UI failed to display an explanation of the error, which can be found in the status of the YAML:

Image

I will open a separate issue for this.
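
For reference, a minimal sketch of the kind of format check that separates the values used in this test ("1", "4") from the faulty one ("b"). It assumes the field accepts a non-negative integer, a string of digits, or a percentage string such as "50%"; that rule and the name `looks_like_valid_pdb_value` are assumptions for illustration, not Fleet's actual validation.

```python
import re

# Digits, optionally followed by a percent sign (assumed accepted format).
_INT_OR_PERCENT = re.compile(r"[0-9]+%?")


def looks_like_valid_pdb_value(value) -> bool:
    """Return True for a non-negative int, a digit string, or a percentage string."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        return False
    if isinstance(value, int):
        return value >= 0
    if isinstance(value, str):
        return _INT_OR_PERCENT.fullmatch(value) is not None
    return False
```

Under these assumptions, "1", "4" and "50%" pass the check, while "b" does not.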

mmartin24 avatar Oct 09 '25 11:10 mmartin24