gatekeeper safe to evict emptyDir local storage to unblock the cluster downscaling.

Describe the solution you'd like: Pods that use local storage should carry the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true", because pods with emptyDir volumes block cluster downscaling.

Anything else you would like to add: Pods with local storage volumes:

volumes:
- emptyDir: {}
  name: tmp-volume
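To make the request concrete, here is a minimal sketch of a pod template that combines the annotation with the emptyDir volume. Only the annotation key and the volume definition come from this issue; the surrounding structure is a generic Deployment template fragment, not Gatekeeper's actual manifest.

```yaml
# Illustrative pod template fragment (not the real Gatekeeper manifest).
template:
  metadata:
    annotations:
      # Tells the cluster autoscaler this pod may be evicted even though
      # it uses local storage.
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  spec:
    volumes:
    - emptyDir: {}
      name: tmp-volume
```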

Environment: PROD

Gatekeeper version: 3.8
Kubernetes version: (use kubectl version): Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14-gke.700", GitCommit:"1781919224b267c523fd76047cebf7b14c6aa1d9", GitTreeState:"clean", BuildDate:"2022-06-28T09:30:29Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Sep 27 '22 07:09 ZiaUrRehman-GBI

hey @ZiaUrRehman-GBI thanks for opening this issue. I'm going to spend some time looking into this and follow up here when I know more.

Oct 10 '22 18:10 acpana

hey @ZiaUrRehman-GBI I had a look but I couldn't repro :/ . I see kubectl scale work as expected on the g8r pods as defined in the latest config under deploy/.

$ kubectl scale deployment/gatekeeper-controller-manager --replicas 10 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-controller-manager --replicas 1 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-audit --replicas 10 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-audit --replicas 1 -n gatekeeper-system
...

Let me ask you a couple of questions.

Apologies in advance if you already communicated more details in another channel. Please bear with me.

Tell us more about your environment

Pods

How many pods are you running? How many as audit vs controller-manager?
How many pods are you trying to scale down/up? Are you using the plain k8s autoscaler or relying on a cloud provider to use it?
How many pods do you see not scaled?
- For those pods that don't get scaled, is there anything interesting in the logs?

Volumes

At a high level, what's on those volumes around the time of the scaling?

Oct 11 '22 02:10 acpana

Hey @acpana, maybe I didn't explain this properly. I don't mean scaling the OPA Gatekeeper deployments. I was talking about GKE cluster scale-down being blocked by the OPA pods that use local storage. On the GKE side this was fixed after the 1.22.x release, so feel free to close the issue. On other providers like AWS and AKS the problem still exists, so users there have to either add this annotation or set skip-nodes-with-local-storage to false on the cluster autoscaler.
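For reference, a sketch of the second option, assuming a self-managed cluster-autoscaler Deployment. In the upstream cluster autoscaler the flag is spelled `--skip-nodes-with-local-storage`; the container name and the cloud provider argument below are placeholders, and managed offerings may expose this setting differently.

```yaml
# Fragment of a cluster-autoscaler container spec (illustrative).
containers:
- name: cluster-autoscaler
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws                     # placeholder provider
  # Allow scale-down of nodes that run pods with local storage
  # (emptyDir / hostPath).
  - --skip-nodes-with-local-storage=false
```

Note that this changes behavior for every pod with local storage in the cluster, whereas the annotation opts in only the pods that carry it.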

Oct 11 '22 02:10 ZiaUrRehman-GBI

+1 @ZiaUrRehman-GBI Thanks for reporting the issue. Would you like to open a PR to add the annotation for the audit pod?

Oct 11 '22 06:10 ritazh

Sure I will open. 👍

Oct 11 '22 06:10 ZiaUrRehman-GBI

@ZiaUrRehman-GBI we already have podAnnotations value in the chart, would that work? if so, sounds like we might want to document this in https://open-policy-agent.github.io/gatekeeper/website/docs/vendor-specific?

Oct 12 '22 23:10 sozercan

Thanks @sozercan! You can search for podAnnotations in the chart readme https://github.com/open-policy-agent/gatekeeper/tree/master/charts/gatekeeper
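As a sketch of what that could look like, assuming the chart's `podAnnotations` value is passed through to the pod template as a plain key/value map (check the chart README for which pods it applies to and what its default is, since overriding it may replace default annotations):

```yaml
# values.yaml override for the Gatekeeper Helm chart (sketch).
podAnnotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

The value must stay quoted so that it is rendered as the string "true" rather than a YAML boolean.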

Oct 13 '22 00:10 ritazh

Thanks, doc will help a lot

Oct 13 '22 02:10 ZiaUrRehman-GBI

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

Dec 12 '22 12:12 stale[bot]
