Mimir mixin: Disk space utilization panels broken for the mimir-distributed Helm chart
Describe the bug
The disk space utilization panels in the Mimir mixin don't work when Mimir is deployed using the mimir-distributed Helm chart.
Example query:
max by(persistentvolumeclaim) (
kubelet_volume_stats_used_bytes{cluster=~"$cluster", namespace=~"$namespace"} /
kubelet_volume_stats_capacity_bytes{cluster=~"$cluster", namespace=~"$namespace"}
)
and
count by(persistentvolumeclaim) (
kube_persistentvolumeclaim_labels{
cluster=~"$cluster", namespace=~"$namespace",
label_name=~"(ingester).*"
}
)
The problematic part is the label_name=~"(ingester).*" selector on the kube_persistentvolumeclaim_labels metric.
There are two issues with it:
- If you use the kube-state-metrics service monitor from metamonitoring, it will drop the kube_persistentvolumeclaim_labels metric due to this metric relabeling rule.
- The PVCs produced by the Helm chart don't have a name label; these are the labels they do have:
labels:
app.kubernetes.io/component: ingester
app.kubernetes.io/instance: metrics
app.kubernetes.io/name: mimir
rollout-group: ingester
zone: zone-a
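For illustration, assuming kube-state-metrics is configured to expose these PVC labels (the PVC name below is hypothetical), the resulting series would look like the following. Note there is no label_name for the mixin's label_name=~"(ingester).*" matcher to hit:

```promql
kube_persistentvolumeclaim_labels{
  namespace="mimir",
  persistentvolumeclaim="storage-mimir-ingester-zone-a-0",
  label_app_kubernetes_io_component="ingester",
  label_app_kubernetes_io_name="mimir",
  label_rollout_group="ingester",
  label_zone="zone-a"
}
```

kube-state-metrics sanitizes the Kubernetes label keys, so app.kubernetes.io/component becomes label_app_kubernetes_io_component, and so on.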
To Reproduce
Steps to reproduce the behavior:
- Deploy Mimir using the mimir-distributed Helm chart
- Deploy the mimir mixin
Expected behavior
For the disk space utilization panels to work.
Environment
- Infrastructure: Kubernetes, AKS
- Deployment tool: Helm
Additional Context
Not sure what the best way to fix this would be. Things that come to mind:
- Update the service monitor so it doesn't drop metrics that the mixin uses
- Allow configuring which label is used instead of label_name in those panels
Willing to submit a PR to address this after feedback.
Keeping the kube_persistentvolumeclaim_labels metric just for the pods from Mimir looks tricky. Can we use the app.kubernetes.io/name: mimir label inside the relabelling config to pick this up? To avoid scraping other Mimir clusters in the same k8s cluster we can add another 'keep' relabelling which checks the namespace label (and we need to make sure it's present on all currently collected metrics).
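A minimal sketch of what that could look like as metricRelabelings on the kube-state-metrics ServiceMonitor, assuming Prometheus Operator conventions; the temporary label name is illustrative. A plain 'keep' rule would drop every other metric, so a two-step mark-then-drop keeps all other metrics intact and removes only the non-Mimir kube_persistentvolumeclaim_labels series:

```yaml
metricRelabelings:
  # Mark kube_persistentvolumeclaim_labels series whose PVC carries
  # app.kubernetes.io/name=mimir (exposed as label_app_kubernetes_io_name).
  - sourceLabels: [__name__, label_app_kubernetes_io_name]
    regex: kube_persistentvolumeclaim_labels;mimir
    targetLabel: __tmp_keep_pvc
    replacement: "true"
  # Drop the remaining kube_persistentvolumeclaim_labels series
  # (those where the marker label stayed empty); all other metrics pass.
  - sourceLabels: [__name__, __tmp_keep_pvc]
    regex: kube_persistentvolumeclaim_labels;
    action: drop
```

Labels prefixed with __ (including __tmp_keep_pvc) are stripped after relabeling, so the marker never reaches storage.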
Regarding the second problem of using different labels: is it possible to use label_replace in the panel query so we take either label_rollout_group or label_name, whichever is present? I.e. replace label_name with the other only if the other is non-empty; maybe there's something we can do with a regex?
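The coalescing idea could be sketched in PromQL like this (the component label name is made up for illustration): two nested label_replace calls each copy their source label into a common label only when the source is non-empty, so whichever of label_name or label_rollout_group exists ends up in component:

```promql
count by (persistentvolumeclaim) (
  label_replace(
    label_replace(
      kube_persistentvolumeclaim_labels{cluster=~"$cluster", namespace=~"$namespace"},
      "component", "$1", "label_name", "(.+)"
    ),
    "component", "$1", "label_rollout_group", "(.+)"
  )
)
```

If both source labels are present, the outer call wins, since label_replace overwrites the destination label when its regex matches.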
@dimitarvdimitrov I'm just facing this issue. Wouldn't using the app.kubernetes.io/component: ingester label be a better choice than rollout-group for the query? I can work on that, as I really need this to be fixed.
Using app.kubernetes.io/name: mimir makes this slightly more reusable: we don't have to update the scraping rules every time a new component gets a disk (or we rename a component, or add a disk to an existing component, etc.). If that's not possible, then app.kubernetes.io/component should also suffice.
I think the issue with using app.kubernetes.io/name is that it would not work on the component-specific write dashboard, because it would show the data of all Mimir components instead of just the ingester.
See PR here https://github.com/grafana/mimir/pull/7968
Let me know how I can speed things up :)
Using a common label on the kube_persistentvolumeclaim_labels selector
My suggestion in the issue was to add support for both labels in the panel via the label_replace or label_join PromQL functions. But I realized this won't work, because we cannot filter on series labels outside of the vector selector (kube_persistentvolumeclaim_labels{...}).
Another option for solving the selector problem is to add the standard kubernetes labels (app.kubernetes.io/*) to the jsonnet mixin and use that in the dashboards like the ones here
https://github.com/grafana/mimir/blob/f573a03208fb076f8d65375c72572528e343912f/operations/helm/charts/mimir-distributed/templates/_helpers.tpl#L226-L237
@dimitarvdimitrov I'm not sure what you mean by adding the label to the jsonnet mixin, but I'm happy to implement a working solution :D
The main issue with using kube_persistentvolumeclaim_labels is that newer versions of kube-state-metrics do not expose any labels by default (since 2.11, I think) and they need to be explicitly asked for.
adding the label to the jsonnet mixin
I meant adding the label to the resources created by the jsonnet library used to deploy Mimir. This will help with having a single label selector in the PromQL query: it will match both jsonnet and Helm deployments, because they will have a label in common.
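As a sketch, assuming both deployment modes end up setting app.kubernetes.io/name=mimir on their PVCs, the dashboard panel could then use a single selector instead of the label_name matcher:

```promql
count by (persistentvolumeclaim) (
  kube_persistentvolumeclaim_labels{
    cluster=~"$cluster", namespace=~"$namespace",
    label_app_kubernetes_io_name="mimir"
  }
)
```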
The main issue with using the kube_persistentvolumeclaim_labels is that the newer versions of kube-state-metrics do not expose any labels by default (since 2.11 I think) and they need to be explicitly asked for
That doesn't sound great. So the metric is just empty by default? I couldn't find this change in the changelog. Do you have a link to the PR or changelog entry?
I was wrong, this happened in kube-state-metrics 2.10 https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.10.0 (cf. the top message)
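Since that release, PVC labels have to be allowlisted explicitly via the --metric-labels-allowlist flag. A hedged sketch of the container argument (the flag is real; the label list is illustrative, and the exact wiring depends on how kube-state-metrics is installed):

```yaml
# kube-state-metrics container args (sketch)
args:
  - --metric-labels-allowlist=persistentvolumeclaims=[app.kubernetes.io/name,app.kubernetes.io/component,rollout-group]
```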
Regarding the fix, I would assume this would be needed under operations/mimir? In that case I'm not sure I'd be the best person to fix it, because I'm definitely lost in that folder.
@dimitarvdimitrov what do you think about using something like
kubelet_volume_stats_used_bytes{cluster_id=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester)-.*"} /
kubelet_volume_stats_capacity_bytes{cluster_id=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester)-.*"}
instead of relying on the kube-state-metrics labels metric?