
Controller has error `cannot update resource "podmonitorings" in API group "monitoring.googleapis.com"`

Open zchenyu opened this issue 11 months ago • 5 comments

This is from a 0.14.0 install:

{"level":"error","ts":"2025-01-24T01:28:15Z","msg":"Reconciler error","controller":"collector-config","controllerGroup":"monitoring.googleapis.com","controllerKind":"OperatorConfig","OperatorConfig":{"name":"config","namespace":"gmp-public"},"namespace":"gmp-public","name":"config","reconcileID":"e1f9ab14-ab0c-4e6c-ba3a-0de29c79145b","error":"ensure collector config: podmonitorings.monitoring.googleapis.com \"direct-gateway\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"default\"\npodmonitorings.monitoring.googleapis.com \"gateway\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"default\"\npodmonitorings.monitoring.googleapis.com \"envoy-lb-eds\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"kube-system\"\npodmonitorings.monitoring.googleapis.com \"ratelimit-manager\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"kube-system\"\npodmonitorings.monitoring.googleapis.com \"xxx\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"xxx\"\npodmonitorings.monitoring.googleapis.com \"audio\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"default\"\npodmonitorings.monitoring.googleapis.com \"privatelink-gateway\" is forbidden: User \"system:serviceaccount:gmp-system:operator\" cannot update resource \"podmonitorings\" in API group \"monitoring.googleapis.com\" in the namespace \"default\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

zchenyu, Jan 24 '25 01:01

Adding podmonitorings to the rule with the "get", "patch", "update" verbs in the gmp-system:operator ClusterRole fixes it:

---
# Source: operator/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gmp-system:operator
rules:
# Resources controlled by the operator.
- resources:
  - clusterpodmonitorings
  - clusterrules
  - globalrules
  - clusternodemonitorings
  - podmonitorings
  - rules
  apiGroups: ["monitoring.googleapis.com"]
  verbs: ["get", "list", "watch"]
- resources:
  - clusterpodmonitorings/status
  - clusterrules/status
  - globalrules/status
  - clusternodemonitorings/status
  - podmonitorings/status
  - podmonitorings  # ------------------------------------------------------------ ADD THIS
  - rules/status
  apiGroups: ["monitoring.googleapis.com"]
  verbs: ["get", "patch", "update"]
- resources:
  - customresourcedefinitions
  resourceNames: ["verticalpodautoscalers.autoscaling.k8s.io"]
  apiGroups: ["apiextensions.k8s.io"]
  verbs: ["get"]
---

https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.14.0/manifests/operator.yaml
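
If patching the shipped manifest is inconvenient, the same permission could probably be granted through a separate role and binding instead. This is only a sketch: the service account name and namespace are taken from the error message above, while the role and binding name `gmp-operator-podmonitorings-update` is made up for illustration and is not part of the upstream manifests.

```yaml
# Hypothetical supplementary role; the name is an assumption, not an upstream object.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gmp-operator-podmonitorings-update
rules:
# Same verbs as the rule that carries the "ADD THIS" line above.
- apiGroups: ["monitoring.googleapis.com"]
  resources: ["podmonitorings"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gmp-operator-podmonitorings-update
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gmp-operator-podmonitorings-update
subjects:
# Service account from the log: system:serviceaccount:gmp-system:operator
- kind: ServiceAccount
  name: operator
  namespace: gmp-system
```

Keeping the extra rule in its own objects means a later re-apply of operator.yaml would not silently drop it, at the cost of one more thing to clean up once the permission is no longer needed.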

zchenyu, Jan 24 '25 01:01

Hi @zchenyu, thanks for reporting. @bwplotka or @bernot-dev, could you PTAL?

pintohutch, Jan 28 '25 22:01

@zchenyu Can you clarify how you are creating your PodMonitorings? Usually these are not created with the operator service account.

Also, is this in your own cluster or on GKE?

bernot-dev, Jan 28 '25 22:01

We see the same error in the operator logs. After granting the permission to the operator, it seems the operator tries to set spec.targetLabels.metadata on PodMonitorings where that field is not set:

spec:
  targetLabels:
    metadata:
    - pod
    - container
    - top_level_controller_name
    - top_level_controller_type

In our case, it is a non-GKE cluster with setup.yaml and operator.yaml applied from GoogleCloudPlatform/prometheus-engine@v0.15.3/manifests.

PodMonitoring resources are applied alongside the manifests of the apps they monitor via our Argo CD instance.
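
One possible way to sidestep the update, untested and based only on the behavior described above, is to set the field explicitly in the PodMonitoring manifest so that there is nothing left for the operator to fill in. In the sketch below the resource name, namespace, selector labels, port name, and interval are placeholders; only the targetLabels block mirrors the values shown earlier.

```yaml
# Hypothetical PodMonitoring; name, namespace, labels, port, and interval are placeholders.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  endpoints:
  - port: metrics
    interval: 30s
  # Set explicitly, matching the values the operator appeared to write.
  targetLabels:
    metadata:
    - pod
    - container
    - top_level_controller_name
    - top_level_controller_type
```

With a GitOps tool such as Argo CD, declaring the field in the source manifest should also keep the live object from drifting from what is in Git, assuming the operator leaves an already populated field alone.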

maxbrunet, Oct 16 '25 19:10

Ah, I think that's a bug. Thanks for catching!

It looks like we used to try to update that field, if it was nil, outside of the operator's defaulting webhook.

However, it looks like @bernot-dev may have fixed that in preparation for our 0.17 release. We'll test it and confirm before closing this issue when we finalize the release.

pintohutch, Oct 17 '25 01:10