grafana-operator
[Bug] 'deployment not ready' is reported and the Grafana instance hangs in 'Phase: reconciling' status using the v4.4.1 operator
Describe the bug: 'deployment not ready' is reported for the Grafana instance and in its events with the v4.4.1 operator, after the operator is installed and a Grafana CR instance is created.
Version: v4.4.1
To Reproduce: Not a full reproduction, but this is how I found it:
- Installed operator v4.4.1 to the single namespace grafana from the OCP OperatorHub.
- Created a CR for an instance.
- Checked the instance and found:
  Warning ProcessingError 105m (x4188 over 14h) GrafanaDashboard deployment not ready
- In the OCP console web UI, went to Networking > Route and clicked grafana-route; the Grafana UI failed to load.
Expected behavior: No errors reported and all CRs up.
Runtime (please complete the following information):
- OS: Linux
- Grafana Operator Version: v4.4.1
- Environment: OCP 4.10.3
- Deployment type: Installed operator v4.4.1 to the single namespace grafana from the OCP OperatorHub.
Additional context
$ oc get pod
NAME                                                   READY   STATUS    RESTARTS   AGE
grafana-deployment-6c6d7cc447-dx6kh                    1/1     Running   0          104m
grafana-operator-controller-manager-6444c498b5-dgzrx   2/2     Running   0          14h
$ oc get grafanas.integreatly.org
NAME              AGE
example-grafana   14h
$ oc describe grafanas.integreatly.org example-grafana
...
Status:
  Message:                success
  Phase:                  reconciling
  Previous Service Name:  grafana-service
Events:
  Type     Reason           Age                    From              Message
  ----     ------           ----                   ----              -------
  Warning  ProcessingError  105m (x4188 over 14h)  GrafanaDashboard  deployment not ready
Same issue here with Grafana Operator versions v4.4.0 and v4.4.1.
Is there a way to work around this before we can get an official fix?
@lihongbj have you had a chance to look at the operator logs?
@pb82 It looks like there's really nothing special in the operator logs. Or is there any detailed logging that we can enable? Here are the details:
operator: kube-rbac-proxy
I0509 13:36:30.062223 1 main.go:190] Valid token audiences:
I0509 13:36:30.062389 1 main.go:262] Generating self signed cert as no cert is provided
I0509 13:36:30.828736 1 main.go:311] Starting TCP socket on 0.0.0.0:8443
I0509 13:36:30.829339 1 main.go:318] Listening securely on 0.0.0.0:8443
operator: manager
I0509 13:36:48.839704 1 request.go:655] Throttling request took 1.04093473s, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1?timeout=32s
I0509 13:36:58.882687 1 request.go:655] Throttling request took 1.046610156s, request: GET:https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1?timeout=32s
grafana-deployment:
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000309Z","queryType":"exponential_heatmap_bucket_data"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000372Z","queryType":"linear_heatmap_bucket_data"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000399Z","queryType":"random_walk"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000411Z","queryType":"predictable_pulse"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000421Z","queryType":"predictable_csv_wave"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000435Z","queryType":"random_walk_table"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000445Z","queryType":"slow_query"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000454Z","queryType":"no_data_points"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000470Z","queryType":"datapoints_outside_range"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000481Z","queryType":"manual_entry"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000495Z","queryType":"csv_metric_values"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000506Z","queryType":"streaming_client"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000515Z","queryType":"live"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000530Z","queryType":"grafana_api"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000561Z","queryType":"arrow"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000572Z","queryType":"annotations"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000581Z","queryType":"table_static"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000594Z","queryType":"random_walk_with_error"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000604Z","queryType":"server_error_500"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000621Z","queryType":"logs"}
{"@level":"debug","@message":"datasource: registering query type handler","@timestamp":"2022-05-10T02:47:29.000663Z","queryType":"node_graph"}
{"@level":"debug","@message":"datasource: registering query type fallback handler","@timestamp":"2022-05-10T02:47:29.000682Z"}
Is it possible to install an older version of the grafana operator?
Can we get some basic Kubernetes debugging? What is the issue in the deployment?
Please share the deployment yaml that gets generated, along with the output of oc describe pod <grafana-deployment>.
Also, please provide the Grafana yaml you are using where you see this issue.
Just a quick update: it looks like I can ultimately deploy Grafana and create dashboards successfully; the error is probably just reported in the middle of the process. After a while, I can see the Grafana CR:
Status:
  Message:                success
  Phase:                  reconciling
  Previous Service Name:  grafana-service
Events:  <none>
@lihongbj
This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label
Any news?
This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label
Any news?
@R-Studio we just released version 4.5.0, please give that a try. And if that doesn't help, please provide some debug information; my understanding from the others in this issue is that it's solved for them. As I wrote earlier, there is some basic k8s debugging that is needed. If your deployment is pending, why is it pending? Is the ingress successfully created (assuming that you use ingress)?
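(For context: with the v4 operator, an ingress or OpenShift route is only created when it is enabled in the Grafana CR. A minimal sketch, assuming the integreatly.org/v1alpha1 schema; the name and hostname are placeholders and the config is left empty here:)
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  config: {}
  ingress:
    enabled: true
    hostname: grafana.example.com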
@NissesSenap thanks for your reply. The deployment of Grafana, dashboards, and datasources (ingress, ...) is working, but the resource with kind "Grafana" is still hanging in the "reconciling" phase. Maybe it is correct that the "Grafana" remains in the "reconciling" phase?
kubectl describe pod grafana-deployment
Name:               grafana-deployment-8566cf9b9-86zkg
Namespace:          grafana
Priority:           0
PriorityClassName:  <none>
Node:               k8s-prod-w1/10.0.0.63
Start Time:         Sun, 17 Jul 2022 17:10:46 +0200
Labels:             app=grafana
                    app.kubernetes.io/instance=grafana-operator
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=grafana-operator
                    helm.sh/chart=grafana-operator-2.6.5
                    pod-template-hash=8566cf9b9
Annotations:        cni.projectcalico.org/containerID=2fbd91affd31cfa001c71d331bece1038e02c50f522be1eb60e8e2932237c44a
                    cni.projectcalico.org/podIP=198.18.1.243/32
                    cni.projectcalico.org/podIPs=198.18.1.243/32
                    kubernetes.io/psp=unrestricted-psp
                    prometheus.io/port=3000
                    prometheus.io/scrape=true
Status:             Running
IP:                 198.18.1.243
Controlled By:      ReplicaSet/grafana-deployment-8566cf9b9
Init Containers:
  grafana-plugins-init:
    Container ID:   docker://a142be62c9bc299d635433649c728b629e664eb5e4932743cfa4c99574091d3d
    Image:          docker.io/bitnami/grafana:8.5.6-debian-11-r8
    Image ID:       docker-pullable://bitnami/grafana@sha256:9d790f1b5586943f0a409b1630c1723035b07a76578898a9346ac579c5595ef4
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 17 Jul 2022 17:13:08 +0200
      Finished:     Sun, 17 Jul 2022 17:13:08 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  512Mi
    Requests:
      cpu:     250m
      memory:  128Mi
    Environment:
      GRAFANA_PLUGINS:
    Mounts:
      /opt/plugins from grafana-plugins (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9gn8b (ro)
Containers:
  grafana:
    Container ID:   docker://ef3fbc85aa5120c40701088e69dfcd1a2fa078d1b43c2d9f1d27e12e2f45e24c
    Image:          docker.io/bitnami/grafana:8.5.6-debian-11-r8
    Image ID:       docker-pullable://bitnami/grafana@sha256:9d790f1b5586943f0a409b1630c1723035b07a76578898a9346ac579c5595ef4
    Port:           3000/TCP
    Host Port:      0/TCP
    Args:
      -config=/etc/grafana/grafana.ini
    State:          Running
      Started:      Sun, 17 Jul 2022 17:13:09 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1536Mi
    Requests:
      cpu:     500m
      memory:  500Mi
    Liveness:   http-get http://:3000/api/health delay=120s timeout=5s period=1s #success=1 #failure=6
    Readiness:  http-get http://:3000/api/health delay=30s timeout=5s period=1s #success=1 #failure=6
    Environment Variables from:
      grafana-secrets  Secret  Optional: false
    Environment:
      LAST_CONFIG:       38088a668a113410dd9fa57621301724b292da062f5ea51358b9e9d35e825977
      LAST_DATASOURCES:  fba06f372ca63acfc64d883042ee60f6947b7142b9cfd88ea307d0dfe1d5dfdf
    Mounts:
      /etc/grafana-configmaps/ldap-config from configmap-ldap-config (rw)
      /etc/grafana/ from grafana-config (rw)
      /etc/grafana/provisioning/dashboards from grafana-provision-dashboards (rw)
      /etc/grafana/provisioning/datasources from grafana-datasources (rw)
      /etc/grafana/provisioning/notifiers from grafana-provision-notifiers (rw)
      /etc/grafana/provisioning/plugins from grafana-provision-plugins (rw)
      /var/lib/grafana from grafana-data (rw)
      /var/lib/grafana/plugins from grafana-plugins (rw)
      /var/log/grafana from grafana-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9gn8b (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  grafana-provision-plugins:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  grafana-provision-dashboards:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  grafana-provision-notifiers:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  grafana-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      grafana-config
    Optional:  false
  grafana-logs:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  grafana-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  grafana-pvc
    ReadOnly:   false
  grafana-plugins:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  grafana-datasources:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      grafana-datasources
    Optional:  false
  configmap-ldap-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ldap-config
    Optional:  false
  kube-api-access-9gn8b:
    <unknown>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
kubectl describe grafana -n grafana
Status:
  Message:                success
  Phase:                  reconciling
  Previous Service Name:  grafana-service
Events:  <none>
@R-Studio so everything is working for you except that it gives the wrong status? Since the deployment is ready, the health checks pass, so the Grafana instance should be okay.
Can you share your Grafana yaml? What happens if you make the config super basic (see the minimal sketch below)? Does it pass then?
Do you have the same log entries as shown earlier in this issue? If not, please share the logs.
Can you try to uninstall everything, especially removing the CRDs, and then install it all again?
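(A minimal sketch of what a "super basic" Grafana CR could look like, assuming the integreatly.org/v1alpha1 schema; the name and admin credentials are placeholders:)
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  config:
    security:
      admin_user: admin
      admin_password: secret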
I have a similar issue. Operator version: v4.5.1, deployed on OpenShift, Phase: reconciling. I was able to create the data source CR, but only that one. For dashboards, even a simple one from the operator examples returns "deployment not ready". Notifications were also tested and are not working. No event message. Logs are the same as for @morningspace. Unfortunately I can't redeploy the CRDs at this time. The Grafana CR config is basic:
config:
  auth:
    disable_login_form: false
    disable_signout_menu: true
  auth.anonymous:
    enabled: false
  security:
    admin_password: xxx
    admin_user: xxx
My bad, I did not notice that setting dashboardLabelSelector is mandatory. After that, dashboards and notifications are loaded successfully.
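(For anyone else hitting this: a minimal sketch of where dashboardLabelSelector sits in the Grafana CR, assuming the v1alpha1 schema. The app=grafana pair is just an example and must match the labels on your GrafanaDashboard resources; the config is left empty here:)
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  config: {}
  dashboardLabelSelector:
    - matchExpressions:
        - key: app
          operator: In
          values:
            - grafana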
This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label
@marcinsztalmi I already configured the dashboardLabelSelector, but it makes no difference (Grafana is still in the Reconciling state).
This is how my 'dashboardLabelSelector' configuration looks. (Because I use the Bitnami Helm chart, I have to use 'dashboardLabelSelectors' instead of 'dashboardLabelSelector'.)
grafana:
  dashboardLabelSelectors:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}
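(For that selector to match anything, each GrafanaDashboard has to carry a corresponding label. A minimal sketch, assuming the v1alpha1 schema; the name and dashboard JSON are placeholders:)
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: example-dashboard
  labels:
    app: grafana
spec:
  json: >
    {
      "title": "Example dashboard",
      "panels": []
    }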
@R-Studio are you still seeing this issue with the latest version?
@pb82 I have just updated it, but it is still in the Reconciling state. But I think everything else works.
We are seeing this on Grafana Operator v4.4.1 in OpenShift.
Seems like the API schema for GrafanaDashboard has the "labels" object, while the GrafanaDataSource and GrafanaNotificationChannel API schemas on integreatly.org | v1alpha1 do not.
A notification channel deployment with the label app=grafana results in instant reconciling of the object.
Redeploying the deployment without the label app=grafana is successful.
We need the app=grafana label, as that is the matchLabel our operator uses to find the monitoring objects across multiple namespaces.
I don't see why this can't be resolved by simply adding the labels object to all the Grafana object API schemas.
Is anyone from Grafana looking into this?
It seems to be a substantial issue, considering that any GrafanaDashboard using labels (which are required for the artifact to be picked up by the operator) ends up with the artifact in reconciling status.
@gr-ahh this project isn't supported by Grafana. It's a project maintained and developed mostly by Red Hat employees, but not exclusively; we work on this project in our spare time and it's not part of our paid jobs.
To answer your question: no, we are currently not working on it. We haven't gotten any instructions on how to reproduce this issue, and this specific issue contains problems from multiple users, most of which have been solved and were due to configuration errors.
If you have exact steps to reproduce your issue, please share them and, depending on whether it's related to this issue or not, create a new one. It would probably be better to create a new one, since this issue contains multiple problems, as noted earlier.
Since this issue contains multiple problems and I don't know which one is which anymore, I will close this issue.
Please feel free to create a new issue where you give reproducible steps. And please update to the latest version of the operator; otherwise it will be very hard for us to debug any issues.
I will lock the comments in this issue.