[kube-prometheus-stack] prometheus pod goes terminated - completed after some time
### Describe the bug

I'm having issues using this chart. After installation everything works smoothly, but after a short time, roughly an hour, the Prometheus/Grafana pod goes to status Terminated - Completed and stops gathering metrics.
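The Completed pods can be confirmed with a field selector; a minimal check, assuming the stack runs in the `default` namespace (adjust `-n` as needed):

```sh
# Pods shown as "Completed" in kubectl output have phase Succeeded
kubectl get pods -n default --field-selector=status.phase=Succeeded
```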
This is how I installed the helm chart:
```sh
helm install -f /home/anthony/proyectos/multivende/main_to_publish/k8s/miscellaneous/production/values.yml prom-grafana prometheus-community/kube-prometheus-stack
```
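As an aside, "latest" resolves to a different chart over time; pinning the version with `--version` makes an install reproducible. A sketch, reusing the chart version reported later in this thread:

```sh
# Pin the chart version explicitly instead of installing whatever is latest
helm install prom-grafana prometheus-community/kube-prometheus-stack \
  --version 54.1.0 \
  -f values.yml   # path to your values file
```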
These are my values:
```yaml
prometheus:
  server:
    persistentVolume:
      enabled: true
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 600Gi
grafana:
  enabled: true
  persistence:
    enabled: true
    type: pvc
    storageClassName: gp2
    accessModes:
      - ReadWriteOnce
    size: 600Gi
    finalizers:
      - kubernetes.io/aws-ebs
```
### What's your helm version?

version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}

### What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:21:03Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.8-eks-8cb36c9", GitCommit:"fca3a8722c88c4dba573a903712a6feaf3c40a51", GitTreeState:"clean", BuildDate:"2023-11-22T21:52:13Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}

### Which chart?

prometheus-community/kube-prometheus-stack

### What's the chart version?

latest
### What happened?

_No response_

### What you expected to happen?

I expected the pod to be running all the time.

### How to reproduce it?

_No response_

### Enter the changed values of values.yaml?

_No response_

### Enter the command that you execute and failing/misfunctioning.
```yaml
prometheus:
  server:
    persistentVolume:
      enabled: true
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 600Gi
grafana:
  enabled: true
  persistence:
    enabled: true
    type: pvc
    storageClassName: gp2
    accessModes:
      - ReadWriteOnce
    size: 600Gi
    finalizers:
      - kubernetes.io/aws-ebs
```
### Anything else we need to know?
_No response_
Hey, I just want to let you know that I am fighting with nearly the same issue. In my case the operator throws errors that a ServiceAccount is missing. After that the operator gets killed and the deployments of Prometheus and Grafana disappear.
This seems to be different from my problem. I kind of "fixed" it by creating a CronJob that removes completed pods; they are then recreated and continue gathering metrics (a sketch of such a CronJob is below). Your case looks like a permissions issue: check that your service accounts exist and have the right permissions.
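For reference, a minimal sketch of such a cleanup CronJob. The name, namespace, and schedule are assumptions, and the `pod-cleaner` ServiceAccount is hypothetical; it needs RBAC permission to list and delete pods:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-completed-pods        # hypothetical name
  namespace: monitoring               # assumed namespace
spec:
  schedule: "*/30 * * * *"            # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # hypothetical SA with list/delete on pods
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                # Pods shown as "Completed" have phase Succeeded
                - kubectl delete pods -n monitoring --field-selector=status.phase=Succeeded
```

For the ServiceAccount check, `kubectl get serviceaccounts -n <namespace>` shows whether the account exists, and `kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<name>` shows what it is allowed to do.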
What is the exact Helm chart version?
It's kube-prometheus-stack-54.1.0
I'm seeing the same with 56.21.3, deployed with:

```yaml
grafana:
  additionalDataSources:
    - access: proxy
      jsonData:
        maxLines: 1000
        tlsSkipVerify: true
      name: Loki
      type: loki
      url: http://loki.loki.svc.cluster.local:3100
  defaultDashboardsEnabled: true
  persistence:
    enabled: true
    size: 500Mi
  sidecar:
    datasources:
      enabled: true
      label: grafana_datasource
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: gpu-metrics
        kubernetes_sd_configs:
          - namespaces:
              names:
                - gpu-operator
            role: endpoints
        metrics_path: /metrics
        relabel_configs:
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
        scheme: http
        scrape_interval: 1s
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
```
I've upgraded the chart to v57.0.2 (latest) and it's been working fine for some time now.
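For anyone following along, the upgrade itself is a short command; the release name is taken from the install command earlier in this thread, and the values file path should be whatever was used originally:

```sh
# Refresh the repo index, then move the existing release to chart 57.0.2
helm repo update
helm upgrade prom-grafana prometheus-community/kube-prometheus-stack \
  --version 57.0.2 \
  -f values.yml
```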