
PDB Example Only Doing One of Its Checks

kylegoch opened this issue 2 years ago · 3 comments

I was super excited to find this template in the library as it solves an immediate use case I have: https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/general/poddisruptionbudget/template.yaml

However, on testing it with Gatekeeper 3.7.1 and Kubernetes 1.21.12, I can only trigger the first condition, never the second one, where .spec.minAvailable: 3 equals spec.replicas: 3.

Below are my files. The Constraint and Template are straight out of the repo, with an added exclusion for the kube-system namespace.

The test deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-disallowed
  namespace: kagopatest
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
      example: disallowed-deployment
  template:
    metadata:
      labels:
        app: nginx
        example: disallowed-deployment
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

This PDB is rejected as expected due to this check: https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/general/poddisruptionbudget/template.yaml#L34-L44

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-disallowed
  namespace: kagopatest
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx

This PDB is not rejected, although I expected it to be rejected by this check: https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/general/poddisruptionbudget/template.yaml#L46-L56

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-disallowed
  namespace: kagopatest
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: nginx

kylegoch avatar Jul 08 '22 14:07 kylegoch

@chewong can you PTAL

ritazh avatar Jul 08 '22 15:07 ritazh

That's the intended behavior. If you try deploying the PDB with minAvailable: 3 first, and then your deployment, Gatekeeper should deny the admission of the deployment.

In https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/general/poddisruptionbudget/template.yaml#L46-L56, input.review.object is the pod-generating resource (e.g. Deployment) so this code path only gets triggered when you deploy a deployment.

You can check out https://github.com/open-policy-agent/gatekeeper-library/pull/177#discussion_r805103931 for the design decision. Tl;dr - it's better to treat pod-generating resources as the object of interest so we can be more memory-efficient when caching PDBs.
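To make the direction of the check concrete, here is a small Python sketch of the logic described above (not the actual Rego from the template, just an illustration): the admission check runs against the pod-generating resource, comparing its replica count to any matching PDBs held in Gatekeeper's cache.

```python
def pdb_selector_matches(pdb, workload_labels):
    """True if every matchLabels entry in the PDB also appears in the workload's labels."""
    return all(workload_labels.get(k) == v
               for k, v in pdb["spec"]["selector"]["matchLabels"].items())

def violates_min_available(workload_replicas, cached_pdbs, workload_labels):
    """Deny the *workload* (Deployment, StatefulSet, ...) when a matching cached
    PDB has minAvailable >= replicas, since no voluntary disruption would then
    be allowed. The PDB itself is admitted without this check."""
    return any(
        pdb_selector_matches(pdb, workload_labels)
        and pdb["spec"].get("minAvailable", -1) >= workload_replicas
        for pdb in cached_pdbs
    )

cached_pdbs = [{"spec": {"minAvailable": 3,
                         "selector": {"matchLabels": {"app": "nginx"}}}}]
print(violates_min_available(3, cached_pdbs, {"app": "nginx"}))  # True: 3 >= 3
print(violates_min_available(4, cached_pdbs, {"app": "nginx"}))  # False: 3 < 4
```

This is why the order of operations matters in the report above: the PDB created after the Deployment is never evaluated against this branch.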

chewong avatar Jul 11 '22 17:07 chewong

If the deployment already exists in an environment, is there any way to enforce the policy (e.g. reject updates to the deployment)? I had the same incorrect assumption that applying a PDB matching a deployment would violate the constraint. Now that I know that's not the case, I tried recreating the deployment, but the constraint still wasn't hit.

Here is the PDB, with minAvailable: 3:

  spec:
    minAvailable: 3
    selector:
      matchLabels:
        alertmanager: alertmanager-main
  status:
    conditions:
    - lastTransitionTime: "2022-08-12T20:10:55Z"
      message: ""
      observedGeneration: 1
      reason: InsufficientPods
      status: "False"
      type: DisruptionAllowed
    currentHealthy: 3
    desiredHealthy: 3
    disruptionsAllowed: 0
    expectedPods: 3
    observedGeneration: 1

Now I'll delete the StatefulSet that the selector matches. Based on my understanding, recreating it should violate the constraint, because sts.spec.replicas is not greater than pdb.spec.minAvailable, but that does not end up happening.

$ oc get sts alertmanager-alertmanager-main
NAME                             READY   AGE
alertmanager-alertmanager-main   3/3     13m

$ oc get pods -l alertmanager=alertmanager-main
NAME                               READY   STATUS    RESTARTS   AGE
alertmanager-alertmanager-main-0   2/2     Running   0          12m
alertmanager-alertmanager-main-1   2/2     Running   0          12m
alertmanager-alertmanager-main-2   2/2     Running   0          12m

$ oc get sts alertmanager-alertmanager-main -o yaml > /tmp/am-sts.yaml

$ oc delete sts alertmanager-alertmanager-main
statefulset.apps "alertmanager-alertmanager-main" deleted

$ oc get pods -l alertmanager=alertmanager-main
No resources found in test namespace.

$ oc create -f /tmp/am-sts.yaml
statefulset.apps/alertmanager-alertmanager-main created

$ oc get pdb
NAME               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
alertmanager-pdb   3               N/A               0                     52m

The other rule does work, though: a PDB with pdb.spec.maxUnavailable: 0 is rejected.

$ oc create -f /tmp/pdb
Error from server (Forbidden): error when creating "/tmp/pdb": admission webhook "validation.gatekeeper.sh" denied the request: [pod-distruption-budget] PodDisruptionBudget <alertmanager-pdb> has maxUnavailable of 0, only positive integers are allowed for maxUnavailable

Edit:

If anyone used kustomize to install, make sure to install sync.yaml (or edit your existing one and append the objects to sync). I hadn't realized it was required, but I understand now that the inventory isn't populated for these reference-based policies unless the objects are kept in cache. The policy is working after syncing the PDBs into the cache.
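For anyone hitting the same problem, syncing PDBs into Gatekeeper's cache is configured with the Config resource; the library's sync.yaml does roughly the following (a sketch, adjust to your installation):

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system   # assumes the default install namespace
spec:
  sync:
    syncOnly:
      # Replicate PodDisruptionBudgets into OPA's cache so the template
      # can look them up via data.inventory when a workload is admitted.
      - group: "policy"
        version: "v1"
        kind: "PodDisruptionBudget"
```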

Another minor issue: the policy checks that ALL of the labels from .spec.selector.matchLabels in the PDB are equal to the controller's selector, but a PDB can be configured with only a subset of those labels. It might be better if the policy checked that all the labels in the PDB are set and equal to those in the controller's selector, without caring about any extraneous labels in the workload's matchLabels selector.
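The difference can be sketched in Python (illustrative only, not the library's code): the behavior described above effectively requires the two label sets to be equal, while a subset check would still match a PDB that selects on fewer labels than the workload.

```python
def exact_match(pdb_labels, workload_labels):
    # Behavior as described above: the full label sets must be equal,
    # so a PDB selecting on a subset of the workload's labels never matches.
    return pdb_labels == workload_labels

def subset_match(pdb_labels, workload_labels):
    # Suggested behavior: every label the PDB selects on must be present
    # and equal in the workload's selector; extra workload labels are ignored.
    return all(workload_labels.get(k) == v for k, v in pdb_labels.items())

workload = {"app": "nginx", "example": "disallowed-deployment"}
pdb = {"app": "nginx"}  # PDB selects on a subset of the workload's labels

print(exact_match(pdb, workload))   # False: PDB is skipped today
print(subset_match(pdb, workload))  # True: PDB would be considered
```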

ctrought avatar Aug 12 '22 20:08 ctrought

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 31 '23 23:01 stale[bot]