prometheus-operator
Monitoring Kubernetes PersistentVolumes
Since 1.12 the Kubernetes team has removed many core metrics from the kubelet, i.e. the PV metrics were dropped. Info. Does someone have an idea what would be the best choice for monitoring PV usage? Below I describe my workaround, but unfortunately this workaround gives quite high permissions to the node_exporter container :D
What did you do?
Switch node_exporter rootfs to:
"--path.rootfs=/rootfs"
Mount kubelet disk plugin in my case ceph.rook.io:
"volumeMounts": [
{
"name": "proc",
"readOnly": true,
"mountPath": "/host/proc"
},
{
"name": "sys",
"readOnly": true,
"mountPath": "/host/sys"
},
{
"name": "rootfs",
"readOnly": true,
"mountPath": "/rootfs/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/"
}
],
And add volumes:
"volumes": [
{
"name": "proc",
"hostPath": {
"path": "/proc",
"type": ""
}
},
{
"name": "sys",
"hostPath": {
"path": "/sys",
"type": ""
}
},
{
"name": "rootfs",
"hostPath": {
"path": "/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/",
"type": ""
}
}
],
These options reduce the number of paths mounted by node_exporter.
Next I need to run the pod as root, as below, or set privileged: true on the node_exporter container.
"securityContext": {
"runAsUser": 0,
"runAsNonRoot": false
},
"securityContext": {
"privileged": true
},
Unfortunately, both solutions (privileged or root) are not nice, and I use them only as a temporary workaround to get PV usage stats from the node. Mounting only the kubelet storage plugin folder improves security a little. From my perspective this is still not enough, because node_exporter still has full access to all persistent data...
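As a side note on this workaround: with the plugin directory bind-mounted, the PV mounts show up in node_exporter's filesystem collector, so usage can also be read from the node_filesystem_* series. A sketch only (metric names assume node_exporter >= 0.16, and the mountpoint regex is an assumption based on the rook path above; adjust to your layout):

```promql
100 *
  ( node_filesystem_size_bytes{mountpoint=~".*/ceph.rook.io/.*"}
  - node_filesystem_avail_bytes{mountpoint=~".*/ceph.rook.io/.*"} )
  / node_filesystem_size_bytes{mountpoint=~".*/ceph.rook.io/.*"}
```

This yields a used-percentage per mounted PV, without relying on the dropped kubelet_volume_stats_* series.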
What did you expect to see?
The possibility to get persistent volume usage without giving full rights to node_exporter.
Environment
- Prometheus Operator version:
  0.17
- Kubernetes version information:
  Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind:
  vanilla Kubernetes via kubespray (kubeadm)
Yes this is very unfortunate. Unfortunately I don't have a good answer. I've heard of people putting node-exporter into their Pods as sidecars to monitor the mounted volumes. This is something that you should take to sig-storage in Kubernetes as they should make sure these metrics are available.
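For reference, the sidecar approach mentioned above can look roughly like this. A minimal sketch only (image tag, container names, and the collector flags are my assumptions against recent node_exporter versions, not a tested manifest); because the sidecar sees the volume through the pod's own mount, it needs no host paths and no privileges:

```yaml
# Pod spec fragment: node_exporter runs beside the app and only
# sees the volume the pod already mounts.
containers:
- name: app
  image: my-app:latest            # assumption: your application container
  volumeMounts:
  - name: data
    mountPath: /data
- name: volume-exporter
  image: quay.io/prometheus/node-exporter:v1.3.1
  args:
  - --collector.disable-defaults  # keep only the filesystem collector
  - --collector.filesystem
  volumeMounts:
  - name: data
    mountPath: /data
    readOnly: true
  ports:
  - name: metrics
    containerPort: 9100
```

The obvious downside, as noted below, is that this has to be repeated in every pod whose volumes you want to monitor.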
@kaarolch can you clarify where you read that a bunch of metrics were removed in 1.12? The issue you linked states that "in 1.12, the kubelet exposes a number of sources for metrics" and outlines a plan to remove some of them in future versions. I was not able to find any evidence that anything was removed in 1.12.
Persistent volume monitoring is indeed broken, since kubelet_volume_stats_* are no longer in Prometheus; I am however not convinced that it's Kubernetes' fault as such.
@andrewsav-datacom hmm, but the link from my post has this summary:
Current kubelet metrics that are not included in core metrics
- Pod and Node-level Network Metrics
- Persistent Volume Metrics
So if I understand correctly, the PV metrics are no longer included in the core metrics and will probably be moved to csi-storage.
I read this as "desired future state".
I think it's not so much a removal as these metrics simply not being present/possible with CSI. Previously (as in, before CSI) the kubelet managed mounting/preparing/managing volumes, which allowed it to consistently expose metrics about any volume it mounts. Now that the kubelet doesn't do this, it simply can't expose the metrics either.
Hi, is there an update on this issue? Is there any workaround someone can suggest? Currently the only metrics about persistent storage available in Prometheus for me are kube_persistentvolume*.
It seems there's some progress in this area, but we're not involved directly: https://github.com/kubernetes/kubernetes/pull/76188
cc @gnufied @msau42
Are you seeing issues with non-CSI volumes or CSI volumes? Capacity usage for non-CSI volumes should work, and CSI volumes is being fixed in kubernetes/kubernetes#76188
This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.
Here's a doc on using the side-car container method but with no need for special privileges. Tested in OpenShift 3.11 (Kubernetes 1.11.0):
https://access.redhat.com/solutions/4406661
Anyone have a status of this issue in 2020? It would be nice to have these metrics reported to us and without having to deploy a sidecar to every pod. I am using CSI volumes with rook-ceph. Thanks!
I think that would be best answered by sig storage people on Kubernetes. I don’t know off the top of my head.
Feels like the Spiderman meme with Prometheus operator, rook-ceph, and sig storage team pointing at each other. 😄 I'll still continue to dig into this issue.
This query worked for me and yields a percentage:
(kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} - kubelet_volume_stats_available_bytes{persistentvolumeclaim="$volume"}) / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} * 100
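If those series are present in your cluster, the same expression can drive an alert through the operator. A sketch of a PrometheusRule (the name, threshold, and labels are arbitrary choices, not from this thread):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage
spec:
  groups:
  - name: pvc.rules
    rules:
    - alert: PersistentVolumeFillingUp
      # Fire when a PVC is more than 90% full for 5 minutes.
      expr: |
        (kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes)
          / kubelet_volume_stats_capacity_bytes * 100 > 90
      for: 5m
      labels:
        severity: warning
```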
@mdgreenwald That doesn't help if you're unable to get any kubelet_volume_* metrics gathered.
Came across this problem and am facing it as well...
I also have this problem
Are there any workarounds for now, instead of having to deploy this sidecar to every pod?
You could write a detection script @0xMH
Sorry, re-reading this thread. From the sig-storage perspective, and as far as I know, persistent volume metrics are still being reported by the kubelet. For in-tree volume types this should already work. For any CSI volume type, if the driver implements the NodeGetVolumeStats RPC call, then PV metrics should be available from the kubelet.
You may notice that these metrics are tied to the lifecycle of a pod on a node; that is, they are only reported while a volume is mounted/in use on the node, and that is expected behaviour.
If you notice that these metrics are missing for a particular driver/volume type, it is most likely a driver bug. If you think the driver is alright, please open a bug against kubernetes/kubernetes and we will do our best to address it.
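For reference, a NodeGetVolumeStats implementation typically just stats the filesystem at the volume's mount path. A minimal Python sketch of the same computation (real drivers do this in Go against the volume's target path; `"/"` here merely stands in for a PV mount point):

```python
import os

def volume_stats(mount_path):
    """Compute total/available/used bytes for a mounted filesystem,
    the numbers a CSI driver's NodeGetVolumeStats usually reports."""
    st = os.statvfs(mount_path)
    total = st.f_blocks * st.f_frsize
    available = st.f_bavail * st.f_frsize  # bytes available to unprivileged users
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return total, available, used

if __name__ == "__main__":
    total, available, used = volume_stats("/")  # "/" stands in for a PV mount path
    print(f"total={total} available={available} used={used}")
```

If a driver skips this RPC (or advertises it without implementing it correctly), the kubelet has nothing to expose, which matches the missing-series reports below.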
Any good news about this issue? Because I am having the same problem.
We had the same problem on k8s 1.17.4 and vmware csi driver v1.1.0. After updating the vmware csi driver to v2.0.1, the metrics are present on the kubelet. I think you should check for support of this metric (i.e. NodeGetVolumeStats) in your PV driver. https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/108/files
Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud. https://github.com/kubernetes/cloud-provider-openstack/issues/1064
I used curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics, and https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/nodeserver.go#L466 shows support is already implemented, but:
# curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics | grep kubelet_vo
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 107k 0 107k 0 0 2917k 0 --:--:-- --:--:-- --:--:-- 2917k
I see nothing... any comments on additional stuff that needs to be added?
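When debugging missing kubelet_volume_stats_* series, it helps to silence curl's progress meter and to query the kubelet directly as a cross-check. A sketch only (the node name and the in-pod token path are assumptions; both commands need a live cluster and suitable RBAC):

```shell
# Via the API server proxy; -s suppresses the progress meter so an
# empty grep result is unambiguous.
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics \
  | grep kubelet_volume_stats

# Or hit the kubelet directly on the node (port 10250, authenticated):
curl -sk -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  https://localhost:10250/metrics | grep kubelet_volume_stats
```

If nothing matches, first check that a pod actually using the PV is scheduled on that node, since (as noted above) these stats are only reported where the volume is mounted.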
Similar case with the AWS EFS CSI driver, which has implemented the NodeGetVolumeStats RPC call, but no metrics exposing EFS PV usage appear in Prometheus; I cannot even find the EFS PV.
Yes, having the same issue for AWS EFS CSI; it's working fine for EBS.