Grafana Default Dashboards - No Data
I've deployed a VictoriaMetrics Cluster using the operator and have this working fine.
I've just deployed the Grafana dashboards with the victoria-metrics-k8s-stack helm chart, using the values file below and rendering the manifests to a file with a dry run:
helm install victoria-metrics-k8s-stack vm/victoria-metrics-k8s-stack -f values.yaml -n cluster-monitoring --dry-run > /tmp/vm-k8s.yaml
values.yaml
grafana:
  enabled: true
  sidecar:
    datasources:
      enabled: true
      createVMReplicasDatasources: false
    dashboards:
      enabled: true
      multicluster: true
  additionalDataSources: []
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          orgId: 1
          folder: ''
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      victoriametrics:
        url: https://raw.githubusercontent.com/VictoriaMetrics/VictoriaMetrics/master/dashboards/victoriametrics.json
      vmagent:
        url: https://raw.githubusercontent.com/VictoriaMetrics/VictoriaMetrics/master/dashboards/vmagent.json
      nodeexporter:
        gnetId: 1860
        revision: 22
        datasource: VictoriaMetrics
defaultDashboardsEnabled: true
This generates the dashboards fine and allows them to be loaded into Grafana. The problem is that some of the panels don't show any data unless the metric query is modified.
An example is Cluster Memory Utilisation on the Kubernetes / Compute Resources / Cluster dashboard:
With the metrics query as generated:
1 - sum(:node_memory_MemAvailable_bytes:sum{cluster="$cluster"}) / sum(node_memory_MemTotal_bytes{cluster="$cluster"})
The result is an empty panel with no data.
When the metrics query is changed to:
1 - sum(node_memory_MemAvailable_bytes{cluster="$cluster"}) / sum(node_memory_MemTotal_bytes{cluster="$cluster"})
Then the panel is populated, and the values seem accurate for the current usage on our monitoring cluster.
Many dashboards have the same problem where the query syntax needs changing. Is this something specific to the VictoriaMetrics chart and datasource, or is it an upstream issue with the kube-prometheus dashboards and how they are synced using sync_grafana_dashboards.py?
Hello, many default Kubernetes dashboards depend on recording rules, which are evaluated by VMAlert and ingested into the storage.
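For example, the Cluster Memory Utilisation panel queries the :node_memory_MemAvailable_bytes:sum series. Upstream kubernetes-mixin produces it with a recording rule roughly like the sketch below (exact selectors may differ in your generated config); until VMAlert evaluates such a rule and writes the result back, the series simply does not exist and the panel stays empty.
- record: ':node_memory_MemAvailable_bytes:sum'
  expr: |-
    sum(
      node_memory_MemAvailable_bytes{job="node-exporter"} or
      (
        node_memory_Buffers_bytes{job="node-exporter"} +
        node_memory_Cached_bytes{job="node-exporter"} +
        node_memory_MemFree_bytes{job="node-exporter"} +
        node_memory_Slab_bytes{job="node-exporter"}
      )
    ) by (cluster)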
So, for the cluster case, you have to edit the vmalert configuration. For the generated config it is located in the output of victoria-metrics-k8s-stack/templates/victoria-metrics-operator/vmalert.yaml; change vmsingle to vmcluster, i.e. point it at the vmselect and vminsert nodes.
That should fix this issue.
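For reference, a rough sketch of a VMAlert resource pointed at a cluster installation is below. The service names, namespace and tenant path are assumptions for a VMCluster named main in the cluster-monitoring namespace; adjust them to match your setup.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: cluster-monitoring
spec:
  evaluationInterval: 30s
  # read rule queries from vmselect instead of vmsingle
  datasource:
    url: http://vmselect-main.cluster-monitoring.svc:8481/select/0/prometheus
  # write recording rule results back through vminsert
  remoteWrite:
    url: http://vminsert-main.cluster-monitoring.svc:8480/insert/0/prometheus
  # restore rule state from vmselect after restarts
  remoteRead:
    url: http://vmselect-main.cluster-monitoring.svc:8481/select/0/prometheus
  notifier:
    url: http://vmalertmanager-main.cluster-monitoring.svc:9093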
Btw, as far as I know, k8s-stack will support the cluster version soon.
Thanks @f41gh7, I'd overlooked the VMRule and VMAlert configs when copying them to our kustomize base, assuming they were linked to Alertmanager, which I wasn't ready to set up. Now that I've added VMAlert, VMAlertmanager and all the VMRule resources it's all working a lot better.
Just one more VMRule to track down to fix the CPU/Memory Requests panels on the cluster dashboard and it should be good to go.
Looks like I have the correct rule loaded, just nothing recorded against it yet.
Rule CRD
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  namespace: victoria-metrics
  name: library-systems-monitoring-k8s
spec:
  groups:
    - name: k8s.rules
      rules:
        - expr: |-
            sum by (cluster, namespace, pod, container) (
              rate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
            ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
              1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
            )
          record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
        - expr: |-
            container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
            * on (namespace, pod) group_left(node) topk by(namespace, pod) (1,
              max by(namespace, pod, node) (kube_pod_info{node!=""})
            )
          record: node_namespace_pod_container:container_memory_working_set_bytes
        - expr: |-
            container_memory_rss{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
            * on (namespace, pod) group_left(node) topk by(namespace, pod) (1,
              max by(namespace, pod, node) (kube_pod_info{node!=""})
            )
          record: node_namespace_pod_container:container_memory_rss
        - expr: |-
            container_memory_cache{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
            * on (namespace, pod) group_left(node) topk by(namespace, pod) (1,
              max by(namespace, pod, node) (kube_pod_info{node!=""})
            )
          record: node_namespace_pod_container:container_memory_cache
        - expr: |-
            container_memory_swap{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
            * on (namespace, pod) group_left(node) topk by(namespace, pod) (1,
              max by(namespace, pod, node) (kube_pod_info{node!=""})
            )
          record: node_namespace_pod_container:container_memory_swap
        - expr: |-
            sum by (namespace, cluster) (
              sum by (namespace, pod, cluster) (
                max by (namespace, pod, container, cluster) (
                  kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
                ) * on(namespace, pod, cluster) group_left() max by (namespace, pod) (
                  kube_pod_status_phase{phase=~"Pending|Running"} == 1
                )
              )
            )
          record: namespace_memory:kube_pod_container_resource_requests:sum
        - expr: |-
            sum by (namespace, cluster) (
              sum by (namespace, pod, cluster) (
                max by (namespace, pod, container, cluster) (
                  kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"}
                ) * on(namespace, pod, cluster) group_left() max by (namespace, pod) (
                  kube_pod_status_phase{phase=~"Pending|Running"} == 1
                )
              )
            )
          record: namespace_cpu:kube_pod_container_resource_requests:sum
        - expr: |-
            max by (cluster, namespace, workload, pod) (
              label_replace(
                label_replace(
                  kube_pod_owner{job="kube-state-metrics", owner_kind="ReplicaSet"},
                  "replicaset", "$1", "owner_name", "(.*)"
                ) * on(replicaset, namespace) group_left(owner_name) topk by(replicaset, namespace) (
                  1, max by (replicaset, namespace, owner_name) (
                    kube_replicaset_owner{job="kube-state-metrics"}
                  )
                ),
                "workload", "$1", "owner_name", "(.*)"
              )
            )
          labels:
            workload_type: deployment
          record: namespace_workload_pod:kube_pod_owner:relabel
        - expr: |-
            max by (cluster, namespace, workload, pod) (
              label_replace(
                kube_pod_owner{job="kube-state-metrics", owner_kind="DaemonSet"},
                "workload", "$1", "owner_name", "(.*)"
              )
            )
          labels:
            workload_type: daemonset
          record: namespace_workload_pod:kube_pod_owner:relabel
        - expr: |-
            max by (cluster, namespace, workload, pod) (
              label_replace(
                kube_pod_owner{job="kube-state-metrics", owner_kind="StatefulSet"},
                "workload", "$1", "owner_name", "(.*)"
              )
            )
          labels:
            workload_type: statefulset
          record: namespace_workload_pod:kube_pod_owner:relabel
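To confirm whether the rules above are actually producing data, the recorded series can be queried directly through the vmselect Prometheus-compatible API. The service name and tenant path below are assumptions for a default cluster install:
# query the recording rule output via vmselect
curl -s 'http://vmselect-main.cluster-monitoring.svc:8481/select/0/prometheus/api/v1/query' \
  --data-urlencode 'query=namespace_memory:kube_pod_container_resource_requests:sum'
An empty result means VMAlert has not evaluated and written the series yet.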
This is an issue with the upstream recording rules. I've submitted a PR with a patch, which should resolve the issue: https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/641