helm-charts
[kube-prometheus-stack] grafana: Readiness probe failed: connect: connection refused
Describe the bug
Hi!
I have deployed the kube-prometheus-stack using FluxCD with the latest 56.6.2 version.
Prometheus along with Loki works fine. However, Grafana has some problems after a while.
It took approximately 60 minutes to start up fully, until all migrations were done. Then, whenever I make changes in the Dashboard (e.g. adding a new data source), the pod fails. After inspecting the logs, I found these error messages:
{"time": "2024-02-14T15:50:37.062173+00:00", "taskName": null, "msg": "Writing /tmp/dashboards/apiserver.json (ascii)", "level": "INFO"}
{"time": "2024-02-14T15:50:37.065761+00:00", "taskName": null, "msg": "Retrying (Retry(total=4, connect=9, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f8f80>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/admin/provisioning/dashboards/reload", "level": "WARNING"}
{"time": "2024-02-14T15:50:39.266982+00:00", "taskName": null, "msg": "Retrying (Retry(total=3, connect=8, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f90a0>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/admin/provisioning/dashboards/reload", "level": "WARNING"}
{"time": "2024-02-14T15:50:43.669076+00:00", "taskName": null, "msg": "Retrying (Retry(total=2, connect=7, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f9340>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/admin/provisioning/dashboards/reload", "level": "WARNING"}
{"time": "2024-02-14T15:50:52.471752+00:00", "taskName": null, "msg": "Retrying (Retry(total=1, connect=6, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f96a0>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/admin/provisioning/dashboards/reload", "level": "WARNING"}
{"time": "2024-02-14T15:51:10.074029+00:00", "taskName": null, "msg": "Retrying (Retry(total=0, connect=5, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f9820>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/admin/provisioning/dashboards/reload", "level": "WARNING"}
{"time": "2024-02-14T15:51:10.076283+00:00", "taskName": null, "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/dashboards/reload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffaff8f9a90>: Failed to establish a new connection: [Errno 111] Connection refused'))\n", "level": "ERROR"}
Traceback (most recent call last):
File "/app/.venv/lib/python3.12/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/app/.venv/lib/python3.12/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
The pod tries to restart but fails with the aforementioned error. In Lens it always says: Readiness probe failed: Get "http://192.168.1.247:3000/api/health": dial tcp 192.168.1.247:3000: connect: connection refused
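One mitigation to try while this is being investigated is to give Grafana more time before the probes mark the pod as failed. A minimal sketch, assuming the bundled grafana sub-chart accepts readinessProbe/livenessProbe overrides in its values; the timings below are only illustrative:

grafana:
  readinessProbe:
    httpGet:
      path: /api/health
      port: 3000
    initialDelaySeconds: 60    # wait longer before the first readiness check
    periodSeconds: 10
    failureThreshold: 30       # tolerate a slow start instead of marking the pod unready
  livenessProbe:
    httpGet:
      path: /api/health
      port: 3000
    initialDelaySeconds: 120   # avoid restart loops while migrations are still running
    failureThreshold: 10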
What's your helm version?
3.14.0
What's your kubectl version?
1.29.1
Which chart?
kube-prometheus-stack
What's the chart version?
56.6.2
What happened?
Making changes in the Dashboard (e.g. adding new data sources such as Loki) fails with the Python error shown above.
What I have also encountered is that since the newest release, the Dashboard seems slower than with previous releases.
What you expected to happen?
Dashboard should correctly set the datasource
How to reproduce it?
- Enable Grafana and Loki in values.yaml
- Deploy using FluxCD or Helm
- Add a new Loki datasource (a declarative alternative is sketched after this list)
- Check whether the Dashboard / pod is still running
- Additionally, check the logs
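For the "add a new Loki datasource" step, the datasource can also be provisioned declaratively instead of through the UI, which avoids relying on the sidecar reload call. A rough sketch, assuming the chart's grafana.additionalDataSources list and a hypothetical in-cluster Loki URL (replace with your own service name and namespace):

grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      # hypothetical in-cluster address; adjust to your Loki gateway/service
      url: http://loki-gateway.monitoring.svc.cluster.local
      isDefault: false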
Enter the changed values of values.yaml?
prometheus:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-issuer"
      kubernetes.io/ingressClassName: nginx
      nginx.ingress.kubernetes.io/service-upstream: "true"
      # nginx-http-auth config:
      nginx.ingress.kubernetes.io/auth-type: basic
      # the name of the secret that contains the htpasswd hash (has to exist beforehand)
      nginx.ingress.kubernetes.io/auth-secret: prometheus-htpasswd
      # message to display on auth missing:
      nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required - Prometheus'
    hosts:
      - prometheus.xxx
    path: /
    service:
      name: prometheus-prometheus-kube-prometheus-prometheus
      port: 9090
    tls:
      - secretName: prometheus-prod-secret
        hosts:
          - prometheus.xxx
  prometheusSpec:
    replicas: 1
    retention: 168h
    walCompression: true
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "myBlock"
          resources:
            requests:
              storage: 50Gi
    # scrape all service monitors without correct labeling
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false

grafana:
  admin:
    existingSecret: grafana-admin-secret
    userKey: admin-user
    passwordKey: admin-password
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-issuer"
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/service-upstream: "true"
      # nginx-http-auth config:
      nginx.ingress.kubernetes.io/auth-type: basic
      # the name of the secret that contains the htpasswd hash (has to exist beforehand)
      nginx.ingress.kubernetes.io/auth-secret: prometheus-htpasswd
      # message to display on auth missing:
      nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required - Grafana'
    hosts:
      - grafana.xxx
    path: /
    service:
      name: prometheus-grafana
      port: 3000
    tls:
      - secretName: grafana-xxx
        hosts:
          - grafana.xxx
  persistence:
    enabled: true
    type: pvc
    size: 10Gi
    storageClassName: "myStorageClass"
Enter the command that you execute and failing/misfunctioning.
helm install prometheus prometheus-community/kube-prometheus-stack --values values.yaml
Anything else we need to know?
No response
I got this error because the pod couldn't write to the persistent storage location.
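If that is the cause, making sure the PVC is writable by the Grafana user usually helps. A sketch, assuming the grafana sub-chart's securityContext and initChownData options (472 is Grafana's default UID/GID):

grafana:
  securityContext:
    runAsUser: 472
    runAsGroup: 472
    fsGroup: 472          # mounted volumes become group-writable for the grafana user
  initChownData:
    enabled: true         # init container chowns the data directory on the PVC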
Same issue here:
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts

releases:
  - name: kube-prometheus-stack
    namespace: monitoring
    chart: prometheus-community/kube-prometheus-stack
    version: 56.20.0
    installed: true
    values:
      - values.yaml