
Cannot run in GKE Autopilot cluster

Open thecodeassassin opened this issue 4 years ago • 5 comments

What happened?

We cannot install the operator successfully in our GKE Autopilot cluster because of the following error:

cannot list resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope: GKEAutopilot authz: cluster scoped resource "mutatingwebhookconfigurations/" is managed and access is denied

Did you expect to see something different?

A working Prometheus

How to reproduce it (as minimally and precisely as possible):

  • Create a GKE Autopilot cluster
  • Install the latest version of kube-prometheus
  • Inspect kube-state-metrics pod logs

Environment

  • Prometheus Operator version:
quay.io/prometheus-operator/prometheus-operator:v0.47.0
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15", GitCommit:"73dd5c840662bb066a146d0871216333181f4b64", GitTreeState:"clean", BuildDate:"2021-01-13T13:22:41Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.16-gke.502", GitCommit:"a2a88ab32201dca596d0cdb116bbba3f765ebd36", GitTreeState:"clean", BuildDate:"2021-03-08T22:06:24Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Terraform; also tried with the CLI and UI (there is not much you can configure)

cannot list resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope: GKEAutopilot authz: cluster scoped resource "mutatingwebhookconfigurations/" is managed and access is denied

thecodeassassin avatar Apr 23 '21 11:04 thecodeassassin

I got past this by setting kube-state-metrics.collectors.mutatingwebhookconfigurations to false in my kube-prometheus Helm chart values. It corresponds to this configuration value in kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/blob/master/charts/kube-state-metrics/values.yaml#L140
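
Written out as a values fragment, that setting might look like the sketch below. It assumes kube-state-metrics is pulled in as a subchart under the kube-state-metrics key and that collectors is a map of booleans, as the dotted path suggests. Chart versions may lay this key out differently, so check the values.yaml of the version you run.

```yaml
# Sketch only: key layout assumed from the dotted path above.
kube-state-metrics:
  collectors:
    # Stop kube-state-metrics from listing the resource Autopilot denies.
    mutatingwebhookconfigurations: false
```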

On a side note: you might need to disable some more stuff to get everything (sort of) working. I'm still messing around with it, but my current config has almost everything disabled to avoid Autopilot issues:

prometheusOperator:
  tls:
    enabled: false
  admissionWebhooks:
    enabled: false
    patch:
      enabled: false
      
coreDns:
  enabled: false
kubeControllerManager:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
kubeScheduler:
  enabled: false
nodeExporter:
  enabled: false

AlexVanderbist avatar May 03 '21 13:05 AlexVanderbist

Is this happening during the application of ClusterRole? Specifically this one: https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kube-state-metrics-clusterRole.yaml ?
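
For context, the rule in that ClusterRole that grants this access presumably looks something like the fragment below. It is written as an illustration of the RBAC rule shape, not copied from the manifest, so verify it against the linked file.

```yaml
# Illustrative ClusterRole rule (assumed, not copied from the manifest).
# kube-state-metrics needs list/watch on webhook configurations to export
# metrics for them; Autopilot denies access to this managed resource.
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  verbs:
  - list
  - watch
```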

paulfantom avatar May 03 '21 15:05 paulfantom

@AlexVanderbist It's worth noting that some services, like the metrics services and kube-dns, are already deployed by default on Autopilot. Setting enabled: false strips out both the Service and the ServiceMonitor, but you can still create the ServiceMonitor yourself.

As for the other services being disabled, it's not even possible to know whether they exist (etcd, for example), while some are fully managed by Autopilot.

Your configuration should work for application monitoring. Anything related to node metrics will probably be hard to control under Autopilot.

guitcastro avatar Apr 19 '22 02:04 guitcastro

Any update on this?

bhack avatar Jun 19 '23 11:06 bhack

Any update on this?

We ended up moving away from GKE Autopilot because of these and other limitations. Not really useful help, but it is what it is.

thecodeassassin avatar Jun 19 '23 11:06 thecodeassassin