Controller has error `cannot update resource "podmonitorings" in API group "monitoring.googleapis.com"`
This is from a 0.14.0 install
{"level":"error","ts":"2025-01-24T01:28:15Z","msg":"Reconciler error","controller":"collector-config","controllerGroup":"monitoring.googleapis.com","controllerKind":"OperatorConfig","OperatorConfig":{"name":"config","namespace":"gmp-public"},"namespace":"gmp-public","name":"config","reconcileID":"e1f9ab14-ab0c-4e6c-ba3a-0de29c79145b","error":"ensure collector config: podmonitorings.monitoring.googleapis.com "direct-gateway" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "default"\npodmonitorings.monitoring.googleapis.com "gateway" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "default"\npodmonitorings.monitoring.googleapis.com "envoy-lb-eds" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "kube-system"\npodmonitorings.monitoring.googleapis.com "ratelimit-manager" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "kube-system"\npodmonitorings.monitoring.googleapis.com "xxx" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "xxx"\npodmonitorings.monitoring.googleapis.com "audio" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace "default"\npodmonitorings.monitoring.googleapis.com "privatelink-gateway" is forbidden: User "system:serviceaccount:gmp-system:operator" cannot update resource "podmonitorings" in API group "monitoring.googleapis.com" in the namespace 
"default"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
Adding `podmonitorings` to the update rule of the `gmp-system:operator` ClusterRole fixes it:
---
# Source: operator/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gmp-system:operator
rules:
# Resources controlled by the operator.
- resources:
  - clusterpodmonitorings
  - clusterrules
  - globalrules
  - clusternodemonitorings
  - podmonitorings
  - rules
  apiGroups: ["monitoring.googleapis.com"]
  verbs: ["get", "list", "watch"]
- resources:
  - clusterpodmonitorings/status
  - clusterrules/status
  - globalrules/status
  - clusternodemonitorings/status
  - podmonitorings/status
  - podmonitorings # ------------------------------------------------------------ ADD THIS
  - rules/status
  apiGroups: ["monitoring.googleapis.com"]
  verbs: ["get", "patch", "update"]
- resources:
  - customresourcedefinitions
  resourceNames: ["verticalpodautoscalers.autoscaling.k8s.io"]
  apiGroups: ["apiextensions.k8s.io"]
  verbs: ["get"]
---
https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.14.0/manifests/operator.yaml
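For anyone who would rather not edit the upstream manifest, the same permission could probably be granted additively. This is a minimal sketch, not something from the upstream manifests: the name `gmp-operator-podmonitoring-update` is made up for illustration, and the subject is taken from the `system:serviceaccount:gmp-system:operator` user in the error above.

```yaml
# Hypothetical add-on RBAC; the name "gmp-operator-podmonitoring-update" is
# illustrative and does not exist in the upstream manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gmp-operator-podmonitoring-update
rules:
# Grants only the verb the error message says is missing.
- resources:
  - podmonitorings
  apiGroups: ["monitoring.googleapis.com"]
  verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gmp-operator-podmonitoring-update
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gmp-operator-podmonitoring-update
subjects:
# Service account named in the "forbidden" errors.
- kind: ServiceAccount
  name: operator
  namespace: gmp-system
```

Because RBAC rules are additive, this should survive re-applying operator.yaml, and it can be deleted once a fixed release is installed.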
Hi @zchenyu thanks for reporting. @bwplotka or @bernot-dev could you PTAL?
@zchenyu Can you clarify how you are creating your PodMonitorings? Usually these are not created with the operator service account.
Also, is this in your own cluster or on GKE?
We see the same error in the operator logs. After granting the permission to the operator, it seems to update `spec.targetLabels.metadata` when that field is not set:
spec:
  targetLabels:
    metadata:
    - pod
    - container
    - top_level_controller_name
    - top_level_controller_type
In our case, it is a non-GKE cluster with setup.yaml and operator.yaml applied from GoogleCloudPlatform/prometheus-engine@v0.15.3/manifests
PodMonitoring resources are applied alongside the manifests of the apps they monitor via our Argo CD instance.
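If the update is only triggered by the field being unset, setting it explicitly in the applied manifest might avoid the operator's write, and any resulting drift in Argo CD. This is an untested sketch; the resource name, namespace, selector label, and port name are placeholders, and only the `targetLabels.metadata` values come from what the operator wrote above.

```yaml
# Placeholder PodMonitoring; "example-app" and the "metrics" port are
# illustrative and not taken from this thread.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
  # Set explicitly to the values the operator filled in, so the field is
  # never nil when the operator reconciles (assumption: that is what
  # triggers the update).
  targetLabels:
    metadata:
    - pod
    - container
    - top_level_controller_name
    - top_level_controller_type
```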
Ah - I think that's a bug. Thanks for catching!
It looks like we used to try to update that field, if it was nil, outside of the operator's defaulting webhook.
However, it looks like @bernot-dev may have fixed that in preparation for our 0.17 release. We'll test it and confirm, then close this when we finalize the release.