lvm-localpv
expose pvc/pod level metrics (if not already exposed by kubelet and cadvisor)
We need Kubernetes persistent volume and filesystem level metrics, along with their Grafana dashboards and a few sane alerts, preferably in kube-prometheus mixin format. I believe many of them are already exposed by kubelet (or the embedded cAdvisor), but we need to check and expose them for the cgroup v2 hierarchy as well.
- Utilisation metrics (both inodes and bytes used) along with total space available. These are already exposed by kubelet, and the kube-prometheus stack already provides Grafana dashboards and Prometheus alerts for them.
- Volume read and write throughput metrics, in terms of both IOPS and bytes per second. These seem to be exposed by cAdvisor, but are somehow not visible for the cgroup v2 hierarchy.
- Disk read and write IO latency. Need to check whether cAdvisor already exposes these for cgroup v2.
- Number of outstanding IO operations (preferably both queued and waiting for the block device).
- PV abnormality metrics caused by degradation of the underlying disk attached to the node, filesystem corruption, accidental volume deletion on the node, etc. See if we can leverage volume health monitoring for this.
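As a rough illustration of the utilisation item above, the sketch below derives a per-PVC usage ratio from kubelet's `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` series. It assumes the metrics have already been scraped as Prometheus text exposition lines without timestamps; the helper names `parse_samples` and `pvc_utilisation` are hypothetical and exist only for this example.

```python
import re

# Matches one labelled sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # unlabelled, timestamped or malformed lines are ignored here
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def pvc_utilisation(text):
    """Return {(namespace, pvc): used_bytes / capacity_bytes}."""
    used, capacity = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace"), labels.get("persistentvolumeclaim"))
        if name == "kubelet_volume_stats_used_bytes":
            used[key] = value
        elif name == "kubelet_volume_stats_capacity_bytes":
            capacity[key] = value
    # Only report PVCs that have both series and a non-zero capacity.
    return {k: used[k] / capacity[k] for k in used if capacity.get(k)}
```

The same ratio could be computed for inodes from `kubelet_volume_stats_inodes_used` and `kubelet_volume_stats_inodes`; in a real cluster this would normally be a PromQL expression in an alert rule rather than client-side code.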
Additionally, we require the following metrics related to PVC failure and provisioning to generate appropriate alerts.
- PVC pending for a long time. Explore whether we can leverage kube-state-metrics to expose this, or check whether the external provisioner already provides these metrics.
- Other plugin level metrics (for both the controller and the node driver), such as client-go metrics and creation/expansion/deletion RPC rates, latency and failures.
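To make the "pending for a long time" condition concrete, here is a minimal sketch of the check an alert would perform. It assumes the phase of each PVC has been sampled over time (for example from the kube-state-metrics series `kube_persistentvolumeclaim_status_phase`); the function names, the `history` structure and the 15 minute default threshold are assumptions made for this example, not part of any existing exporter.

```python
def pending_duration(observations, now):
    """Seconds a PVC has been continuously Pending as of `now`.

    observations: list of (timestamp_seconds, phase), sorted by timestamp.
    """
    pending_since = None
    for timestamp, phase in observations:
        if phase == "Pending":
            if pending_since is None:
                pending_since = timestamp  # start of the current Pending streak
        else:
            pending_since = None  # any other phase resets the streak
    return 0.0 if pending_since is None else float(now - pending_since)


def long_pending_pvcs(history, now, threshold_seconds=900):
    """Return the sorted names of PVCs that have been Pending for at least the threshold."""
    return sorted(
        pvc
        for pvc, observations in history.items()
        if pending_duration(observations, now) >= threshold_seconds
    )
```

This mirrors what a Prometheus alert with a `for:` clause would do: the condition has to hold continuously for the whole window before the alert fires.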
Environment:
- Kubernetes version (use `kubectl version`): >= 1.19
- OS (e.g. from `/etc/os-release`): Debian 10
Most of the metrics are available via:
- kube-state-metrics
- cAdvisor
- Node exporter (standard metrics, including kubelet mount point metrics)
In addition to the above, the LVM node plugin will expose metrics (beyond what the sample LVM textfile exporter exposes), with the required labels attached so that they can be correlated with metrics exposed by the standard exporters enabled in the cluster.
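The label correlation described above could look roughly like the following sketch. It joins LVM-level samples to PVC identity using the `volumename`, `namespace` and `persistentvolumeclaim` labels of the kube-state-metrics series `kube_persistentvolumeclaim_info`. The `lv_name` label and the assumption that the logical volume name equals the PV name are hypothetical here and would need to be checked against what the node plugin actually emits.

```python
def attach_pvc_labels(lvm_samples, pvc_info):
    """Enrich LVM samples with the namespace and PVC name of the owning claim.

    lvm_samples: list of (labels, value); labels must carry "lv_name" (assumed label).
    pvc_info: list of label dicts taken from kube_persistentvolumeclaim_info.
    """
    # Index claims by the PV they are bound to.
    by_volume = {info["volumename"]: info for info in pvc_info}

    enriched = []
    for labels, value in lvm_samples:
        info = by_volume.get(labels.get("lv_name"))
        if info is None:
            continue  # logical volume not backing any known PVC
        merged = dict(
            labels,
            namespace=info["namespace"],
            persistentvolumeclaim=info["persistentvolumeclaim"],
        )
        enriched.append((merged, value))
    return enriched
```

In Prometheus itself the equivalent would be a join on the shared label at query time; the sketch only shows which labels need to line up for that join to work.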
Sample dashboard with workload using LVM Local PV showing the PV utilization and performance metrics
Thanks @kmova. Let me know once the dashboard is ready and pushed somewhere. I would like to try it out in our playground clusters.
Need to verify the metrics. Previous comments mention that the metrics are available. @abhilashshetty04, could you please check this?