
Monitoring Kubernetes PersistentVolumes

Open kaarolch opened this issue 6 years ago • 33 comments

Since 1.12 the Kubernetes team has removed many core metrics from the kubelet; i.e. the PV metrics were dropped. Info. Does someone have an idea what would be the best choice for monitoring PV usage? Below I describe my workaround, but unfortunately this workaround gives quite high permissions to the node_exporter container :D
What did you do? Switch node_exporter rootfs to:

"--path.rootfs=/rootfs"

Mount the kubelet disk plugin directory, in my case ceph.rook.io:

"volumeMounts": [
              {
                "name": "proc",
                "readOnly": true,
                "mountPath": "/host/proc"
              },
              {
                "name": "sys",
                "readOnly": true,
                "mountPath": "/host/sys"
              },
              {
                "name": "rootfs",
                "readOnly": true,
                "mountPath": "/rootfs/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/"
              }

And add volumes:

"volumes": [
          {
            "name": "proc",
            "hostPath": {
              "path": "/proc",
              "type": ""
            }
          },
          {
            "name": "sys",
            "hostPath": {
              "path": "/sys",
              "type": ""
            }
          },
          {
            "name": "rootfs",
            "hostPath": {
              "path": "/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/",
              "type": ""
            }
          }
        ],

These options reduce the number of paths mounted by node_exporter. Next I need to run the pod as root, as below, or set privileged: true on the node_exporter container.

"securityContext": {
          "runAsUser": 0,
          "runAsNonRoot": false
        },
"securityContext": {
          "privileged": true
        },

Unfortunately both solutions (privileged or root) are not nice, and I put them in place only as a temporary workaround to get PV usage stats from the node. Mounting only the kubelet storage plugin folder increases security a little. From my perspective this is still not enough, because node_exporter still has full rights to all persistent data... What did you expect to see? The possibility to get persistent volume usage without giving full rights to node_exporter.

Environment

  • Prometheus Operator version:

    0.17

  • Kubernetes version information:

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

vanilla Kubernetes via kubespray (kubeadm)

kaarolch avatar Jan 30 '19 07:01 kaarolch

Yes, this is very unfortunate, and I don't have a good answer. I've heard of people putting node-exporter into their Pods as a sidecar to monitor the mounted volumes. This is something you should take to sig-storage in Kubernetes, as they should make sure these metrics are available.
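A rough sketch of that sidecar approach, for reference (the container name, image tag, volume name, and mount path are assumptions and must match your pod spec):

```yaml
# Hypothetical sidecar added alongside the main container of a pod.
# It shares the already-defined PVC volume read-only and enables only
# the filesystem collector, so it needs no host mounts or privileges.
- name: volume-exporter
  image: quay.io/prometheus/node-exporter:v1.0.1   # example tag
  args:
    - "--collector.disable-defaults"
    - "--collector.filesystem"
  ports:
    - name: metrics
      containerPort: 9100
  volumeMounts:
    - name: data            # the PVC volume already defined in the pod
      mountPath: /data
      readOnly: true
```

The trade-off, as noted later in this thread, is that every pod whose volumes you care about needs this sidecar.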

brancz avatar Feb 04 '19 19:02 brancz

@kaarolch can you clarify where you read that a bunch of metrics were removed in 1.12? The issue you linked states that "1.12, the kubelet exposes a number of sources for metrics" and outlines a plan for removing some of them in future versions. I was not able to find any evidence that anything was removed in 1.12.

Persistent volume monitoring is indeed broken, since the kubelet_volume_stats_* metrics are no longer in Prometheus; I am, however, not convinced that it's Kubernetes' fault as such.

andrewsav-bt avatar Feb 13 '19 05:02 andrewsav-bt

@andrewsav-datacom hmm, but the link from my post contains this summary:

Current kubelet metrics that are not included in core metrics

  • Pod and Node-level Network Metrics
  • Persistent Volume Metrics

So if I understand correctly, the PV metrics are no longer included in the core metrics and would probably be moved to csi-storage.

kaarolch avatar Feb 13 '19 06:02 kaarolch

I read this as "desired future state".

AndrewSav avatar Feb 13 '19 06:02 AndrewSav

I think it's not so much a removal as these metrics simply not being present/possible with CSI. Previously (as in, before CSI) the kubelet managed mounting/preparing/managing volumes, which allowed it to consistently expose metrics about any volume it mounts. Now that the kubelet doesn't do this, it simply can't expose the metrics either.

brancz avatar Feb 18 '19 12:02 brancz

Hi, is there an update on this issue? Is there any workaround someone can suggest? Currently the only metrics about persistent storage available in Prometheus for me are kube_persistentvolume*.
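For reference, those kube-state-metrics series only describe the PV/PVC objects (provisioned/requested capacity), not actual usage, so they can't replace kubelet_volume_stats_*. For example:

```
# Provisioned capacity per PV (object metadata, not usage)
kube_persistentvolume_capacity_bytes

# Storage requested by each PVC
kube_persistentvolumeclaim_resource_requests_storage_bytes
```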

dcardozoo avatar May 02 '19 12:05 dcardozoo

It seems there's some progress in this area, but we're not involved directly: https://github.com/kubernetes/kubernetes/pull/76188

metalmatze avatar May 02 '19 12:05 metalmatze

cc @gnufied @msau42

jingxu97 avatar May 15 '19 22:05 jingxu97

Are you seeing issues with non-CSI volumes or CSI volumes? Capacity usage for non-CSI volumes should work, and CSI volumes are being fixed in kubernetes/kubernetes#76188

msau42 avatar May 15 '19 22:05 msau42

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

stale[bot] avatar Aug 14 '19 00:08 stale[bot]

Here's a doc on using the side-car container method but with no need for special privileges. Tested in OpenShift 3.11 (Kubernetes 1.11.0):

https://access.redhat.com/solutions/4406661

bostrt avatar Sep 10 '19 19:09 bostrt

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

stale[bot] avatar Nov 09 '19 19:11 stale[bot]

Anyone have a status on this issue in 2020? It would be nice to have these metrics reported without having to deploy a sidecar to every pod. I am using CSI volumes with rook-ceph. Thanks!

onedr0p avatar Feb 07 '20 16:02 onedr0p

I think that would be best answered by sig storage people on Kubernetes. I don’t know off the top of my head.

brancz avatar Feb 10 '20 18:02 brancz

Feels like the Spiderman meme with Prometheus operator, rook-ceph, and sig storage team pointing at each other. 😄 I'll still continue to dig into this issue.

onedr0p avatar Feb 10 '20 20:02 onedr0p

This query worked for me and yields a percentage:

(kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} - kubelet_volume_stats_available_bytes{persistentvolumeclaim="$volume"}) / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} * 100
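If you want to alert on that instead of graphing it, here is a sketch of a PrometheusRule using the same series (the rule name, labels, and 90% threshold are arbitrary examples; the `prometheus`/`role` labels must match your Prometheus ruleSelector):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage          # example name
  labels:
    prometheus: k8s        # match your Prometheus ruleSelector
    role: alert-rules
spec:
  groups:
    - name: pvc.rules
      rules:
        - alert: PersistentVolumeFillingUp
          expr: |
            (kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes)
              / kubelet_volume_stats_capacity_bytes * 100 > 90
          for: 5m
          labels:
            severity: warning
          annotations:
            message: PVC {{ $labels.persistentvolumeclaim }} is over 90% full.
```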

mdgreenwald avatar Mar 02 '20 10:03 mdgreenwald

@mdgreenwald That doesn't help if you're unable to get any kubelet_volume_* metrics gathered.

onedr0p avatar Mar 02 '20 13:03 onedr0p

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

stale[bot] avatar May 01 '20 13:05 stale[bot]

I came across this problem and am facing it as well.

kfirfer avatar Jul 20 '20 17:07 kfirfer

I also have this problem

lianfulei avatar Jul 24 '20 02:07 lianfulei

Are there any workarounds for now, instead of having to deploy this sidecar to every pod?

0xMH avatar Jul 27 '20 14:07 0xMH

You could write a detection script @0xMH

lianfulei avatar Jul 28 '20 02:07 lianfulei

Sorry, re-reading this thread: from the sig-storage perspective, and as far as I know, persistent volume metrics are still being reported by the kubelet. For in-tree volume types this should already work. For any CSI volume type, if the driver implements the NodeGetVolumeStats RPC call, then PV metrics should be available from the kubelet.

You may notice that these metrics are tied to the lifecycle of a pod on a node; that is, they are only reported while a volume is mounted/in-use on the node, and that is expected behaviour.
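For driver authors: the core of a NodeGetVolumeStats implementation is typically just a statfs on the volume's mount path. A minimal sketch of that stat collection (plain structs instead of the CSI protobuf types, Linux only; the names here are illustrative, not from any real driver):

```go
package main

import (
	"fmt"
	"syscall"
)

// volumeStats mirrors the usage fields a CSI driver returns from
// NodeGetVolumeStats, which the kubelet surfaces as kubelet_volume_stats_*.
type volumeStats struct {
	availableBytes, totalBytes, usedBytes    int64
	availableInodes, totalInodes, usedInodes int64
}

// statsForPath gathers filesystem usage for a mounted volume path.
func statsForPath(path string) (volumeStats, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return volumeStats{}, err
	}
	return volumeStats{
		availableBytes:  int64(st.Bavail) * st.Bsize,
		totalBytes:      int64(st.Blocks) * st.Bsize,
		usedBytes:       int64(st.Blocks-st.Bfree) * st.Bsize,
		availableInodes: int64(st.Ffree),
		totalInodes:     int64(st.Files),
		usedInodes:      int64(st.Files - st.Ffree),
	}, nil
}

func main() {
	// Stat the root filesystem as a stand-in for a volume mount path.
	s, err := statsForPath("/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%d used=%d available=%d\n", s.totalBytes, s.usedBytes, s.availableBytes)
}
```

In a real driver this data is wrapped into `csi.NodeGetVolumeStatsResponse` with `BYTES` and `INODES` usage entries; see the vsphere-csi-driver PR linked below for a complete example.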

gnufied avatar Jul 28 '20 02:07 gnufied

If you notice that these metrics are missing for a particular driver/volume type, it is most likely a driver bug. If you think the driver is alright, please open a bug against kubernetes/kubernetes and we will do our best to address it.

gnufied avatar Jul 28 '20 02:07 gnufied

Any good news about this issue? Because I am having the same problem.

Saadwalami avatar Aug 09 '20 06:08 Saadwalami

We had the same problem on k8s 1.17.4 with the vmware csi driver v1.1.0. After updating the vmware csi driver to v2.0.1 the metrics are present on the kubelet. I think you should check whether your PV driver supports these metrics (i.e. implements NodeGetVolumeStats): https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/108/files

xander-sh avatar Aug 26 '20 14:08 xander-sh

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud. https://github.com/kubernetes/cloud-provider-openstack/issues/1064

mickours avatar Sep 14 '20 08:09 mickours

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud. kubernetes/cloud-provider-openstack#1064

I used curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics, and https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/nodeserver.go#L466 shows support is already implemented, but:

# curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics | grep kubelet_vo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  107k    0  107k    0     0  2917k      0 --:--:-- --:--:-- --:--:-- 2917k

I see nothing... any comments on what additional stuff needs to be added?

jichenjc avatar Sep 25 '20 02:09 jichenjc

Similar case with the aws efs csi driver, which has implemented the NodeGetVolumeStats RPC call, but there are no metrics exposing EFS PV usage in Prometheus; I cannot even find the EFS PV.

Davidrjx avatar May 31 '21 13:05 Davidrjx

Yes, having the same issue for the AWS EFS CSI driver; it's working fine for EBS.

lkravi avatar Jun 09 '21 05:06 lkravi