Duplicate data in Grafana dashboards
We are using the victoria-metrics-k8s-stack Helm chart with the following settings in place:
#################################################
### Service Monitors #####
#################################################
## Component scraping the kubelets
kubelet:
  enabled: true
  # -- Enable scraping /metrics/cadvisor from kubelet's service
  cadvisor: true
  # -- Enable scraping /metrics/probes from kubelet's service
  probes: true
  # spec for VMNodeScrape crd
  # https://docs.victoriametrics.com/operator/api.html#vmnodescrapespec
  spec:
    scheme: "https"
    honorLabels: true
    interval: "30s"
    scrapeTimeout: "5s"
    tlsConfig:
      insecureSkipVerify: true
      caFile: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
    bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token"
    # drop high cardinality label and useless metrics for cadvisor and kubelet
    metricRelabelConfigs:
      - action: labeldrop
        regex: (uid)
      - action: labeldrop
        regex: (id|name)
      - action: drop
        source_labels: [__name__]
        regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
    relabelConfigs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
      - targetLabel: "job"
        replacement: "kubelet"
    # ignore timestamps of cadvisor's metrics by default
    # more info here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535
    honorTimestamps: false
Since we enabled cadvisor, we have duplicated metrics:
- machine_cpu_cores
- machine_memory_bytes
- Potentially more
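One way to confirm where the duplicates come from is to count the series per source. The query below is only a sketch: it assumes the duplicate series differ in their job or metrics_path label (the latter is set by the relabelConfigs above), which may not match every setup.

```promql
# Count machine_cpu_cores series per scrape job and metrics path.
# More than one row in the result means the metric is collected
# from several sources, so an unfiltered sum() counts it repeatedly.
count by (job, metrics_path) (machine_cpu_cores)
```

The same query can be repeated for machine_memory_bytes or any other metric suspected of being duplicated.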
This causes wrong graphs in dashboards that do calculations with those metrics, because an unfiltered sum() adds up every duplicate series.
For example, the dashboard charts/victoria-metrics-k8s-stack/templates/grafana/dashboards/k8s-views-global.yaml contains this expression:
"expr": "sum(kube_pod_container_resource_requests{resource=\"cpu\"}) / sum(machine_cpu_cores)",
To fix this, I suggest adding a job label filter so that it looks like this:
"expr": "sum(kube_pod_container_resource_requests{resource=\"cpu\"}) / sum(machine_cpu_cores{job=\"kubelet\"})",
Also, I noticed that you removed the job label node-exporter,
which will also cause issues in case a different job collects metrics with the same metric name from outside Kubernetes.