
safe to evict emptyDir local storage to unblock the cluster downscaling.

Open ZiaUrRehman-GBI opened this issue 2 years ago • 9 comments

Describe the solution you'd like: Pods that use local storage should carry the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true", because emptyDir volumes will otherwise block cluster downscaling.

Anything else you would like to add: Pods with a local storage volume, for example:

volumes:
- emptyDir: {}
  name: tmp-volume
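As a rough illustration of the request, the annotation would sit in the pod template metadata next to the emptyDir volume. This is only a sketch of a generic Deployment fragment, not the actual Gatekeeper manifest; the surrounding structure is assumed, and only the annotation key and the volume come from the report above.

```yaml
# Illustrative Deployment fragment (assumed layout, not the Gatekeeper manifest).
spec:
  template:
    metadata:
      annotations:
        # Marks the pod as evictable so the cluster autoscaler can drain its node.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      volumes:
      # The local-storage volume that otherwise blocks scale-down.
      - emptyDir: {}
        name: tmp-volume
```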

Environment: PROD

  • Gatekeeper version: 3.8
  • Kubernetes version: (use kubectl version): Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14-gke.700", GitCommit:"1781919224b267c523fd76047cebf7b14c6aa1d9", GitTreeState:"clean", BuildDate:"2022-06-28T09:30:29Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

ZiaUrRehman-GBI avatar Sep 27 '22 07:09 ZiaUrRehman-GBI

hey @ZiaUrRehman-GBI thanks for opening this issue. I'm going to spend some time looking into this and follow up here when I know more.

acpana avatar Oct 10 '22 18:10 acpana

hey @ZiaUrRehman-GBI I had a look but I couldn't repro :/. I see kubectl scale working as expected on the Gatekeeper (g8r) pods as defined in the latest config under deploy/.

$ kubectl scale deployment/gatekeeper-controller-manager --replicas 10 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-controller-manager --replicas 1 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-audit --replicas 10 -n gatekeeper-system
...
$ kubectl scale deployment/gatekeeper-audit --replicas 1 -n gatekeeper-system
...


Let me ask you a couple of questions.

Apologies in advance if you already communicated more details in another channel. Please bear with me.

Tell us more about your environment

Pods
  • How many pods are you running? How many as audit vs controller-manager?
  • How many pods are you trying to scale down/up? Are you using the plain k8s autoscaler or relying on a cloud provider's autoscaler?
  • How many pods do you see not scaled?
    • For those pods that don't get scaled, is there anything interesting in the logs?
Volumes
  • At a high level, what's on those volumes around the time of the scaling?

acpana avatar Oct 11 '22 02:10 acpana

Hey @acpana, maybe I didn't explain this properly. I don't mean scaling the OPA Gatekeeper deployments. I was talking about GKE cluster scale-down being blocked by the OPA pods that use local storage. On the GKE side this was fixed after the 1.22.x release, so feel free to close the issue. On other providers such as AWS and AKS the problem still exists, so users there have to either add this annotation or set the cluster autoscaler's skip-nodes-with-local-storage option to false.
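For the second option, a hedged sketch of where the flag would go on a self-managed cluster autoscaler. The container name and command layout below are assumptions about a typical deployment; managed offerings may expose the same setting through their own configuration instead.

```yaml
# Illustrative fragment of a self-managed cluster-autoscaler Deployment (layout assumed).
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        # Allow scale-down of nodes running pods with local storage such as emptyDir.
        - --skip-nodes-with-local-storage=false
```

Note that this flag applies cluster-wide, whereas the safe-to-evict annotation opts in individual pods.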

ZiaUrRehman-GBI avatar Oct 11 '22 02:10 ZiaUrRehman-GBI

+1 @ZiaUrRehman-GBI Thanks for reporting the issue. Would you like to open a PR to add the annotation for the audit pod?

ritazh avatar Oct 11 '22 06:10 ritazh

Sure, I will open one. 👍

ZiaUrRehman-GBI avatar Oct 11 '22 06:10 ZiaUrRehman-GBI

@ZiaUrRehman-GBI we already have a podAnnotations value in the chart, would that work? If so, sounds like we might want to document this in https://open-policy-agent.github.io/gatekeeper/website/docs/vendor-specific?

sozercan avatar Oct 12 '22 23:10 sozercan

Thanks @sozercan! You can search for podAnnotations in the chart readme https://github.com/open-policy-agent/gatekeeper/tree/master/charts/gatekeeper
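A minimal sketch of what that could look like in a Helm values override, assuming podAnnotations takes a plain key/value map as the chart readme describes. Overriding the value may replace any default annotations the chart ships, so check the readme for the chart version in use.

```yaml
# values override for the gatekeeper chart (sketch; verify against the chart readme).
podAnnotations:
  # Lets the cluster autoscaler evict the pod despite its emptyDir volume.
  cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```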

ritazh avatar Oct 13 '22 00:10 ritazh

Thanks, doc will help a lot

ZiaUrRehman-GBI avatar Oct 13 '22 02:10 ZiaUrRehman-GBI

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 12 '22 12:12 stale[bot]