Export pod ephemeral PVCs metrics

Open TPXP opened this issue 1 year ago • 5 comments

What would you like to be added: kube-state-metrics exposes metrics about PVC usage by pods through metrics like kube_pod_spec_volumes_persistentvolumeclaims_info and kube_pod_spec_volumes_persistentvolumeclaims_readonly. I'd like similar metrics to be available for Ephemeral Volume mounts, since those are also backed by PVCs.

Why is this needed: We use Prometheus metrics to detect PVCs that are not mounted by any pod, which reminds us to drop a PVC if it was left behind for some reason. Our alerting rule lists the PVCs in a namespace with kube_persistentvolumeclaim_info and excludes the mounted ones with kube_pod_spec_volumes_persistentvolumeclaims_info. An ephemeral volume generates a PVC which appears in kube_persistentvolumeclaim_info but not in kube_pod_spec_volumes_persistentvolumeclaims_info, since the volume does not have PersistentVolumeClaim.ClaimName defined. Adding a metric that exposes ephemeral PVCs would give us a way to avoid false alarms when a pod is using an ephemeral PVC.
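To make the false alarm concrete, here is a minimal Go sketch of the same set difference the alerting rule performs. It is not kube-state-metrics or Prometheus code: the function name unmountedClaims and the claim name db-data are hypothetical, and the slices merely stand in for the claim names reported by the two metrics.

```go
package main

import (
	"fmt"
	"sort"
)

// unmountedClaims returns the claims in allClaims that are absent from
// mountedClaims, mirroring the exclusion step of the alerting rule.
func unmountedClaims(allClaims, mountedClaims []string) []string {
	mounted := make(map[string]struct{}, len(mountedClaims))
	for _, c := range mountedClaims {
		mounted[c] = struct{}{}
	}
	var out []string
	for _, c := range allClaims {
		if _, ok := mounted[c]; !ok {
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Stand-in for kube_persistentvolumeclaim_info: it includes the PVC
	// generated for the ephemeral volume of the tmp-workload pod.
	all := []string{"db-data", "tmp-workload-workdir"}
	// Stand-in for kube_pod_spec_volumes_persistentvolumeclaims_info:
	// only volumes with an explicit ClaimName are reported.
	mounted := []string{"db-data"}

	// The ephemeral PVC is in use, yet it is flagged as unmounted.
	fmt.Println(unmountedClaims(all, mounted)) // prints [tmp-workload-workdir]
}
```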

Describe the solution you'd like: Exposing another metric, kube_pod_spec_volumes_ephemeral_persistentvolumeclaims_info, seems acceptable; updating kube_pod_spec_volumes_persistentvolumeclaims_info to add an ephemeral label would work as well.
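As a rough illustration of the second option, the sketch below renders one sample line in the Prometheus text format. Everything about the label set is an assumption: it reuses the labels I believe the existing metric carries (namespace, pod, uid, volume, persistentvolumeclaim) and adds a hypothetical ephemeral label. The pvcSample type, the default namespace and the example-uid value are placeholders, and Go's %q quoting is only a stand-in for proper label escaping.

```go
package main

import "fmt"

// pvcSample holds the label values for one series. The ephemeral field
// maps to the hypothetical label proposed in this issue.
type pvcSample struct {
	Namespace, Pod, UID, Volume, Claim string
	Ephemeral                          bool
}

// line renders the sample in the Prometheus text exposition format.
func (s pvcSample) line() string {
	return fmt.Sprintf(
		"kube_pod_spec_volumes_persistentvolumeclaims_info{namespace=%q,pod=%q,uid=%q,volume=%q,persistentvolumeclaim=%q,ephemeral=%q} 1",
		s.Namespace, s.Pod, s.UID, s.Volume, s.Claim, fmt.Sprint(s.Ephemeral),
	)
}

func main() {
	s := pvcSample{
		Namespace: "default",     // placeholder namespace
		Pod:       "tmp-workload",
		UID:       "example-uid", // placeholder, not a real pod UID
		Volume:    "workdir",
		Claim:     "tmp-workload-workdir",
		Ephemeral: true,
	}
	fmt.Println(s.line())
}
```

With a label like this, the existing alerting rule would keep working unchanged, because the ephemeral PVC would show up in the metric it already excludes on.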

Implementation note: while the PodSpec does not have a field explicitly giving the PVC name, the docs clarify how it's derived from the pod and volume name:

Naming of the automatically created PVCs is deterministic: the name is a combination of the Pod name and volume name, with a hyphen (-) in the middle.
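The naming rule quoted above could be applied along these lines. This is only a sketch under stated assumptions: the volume struct is a simplified, hypothetical stand-in for the real Kubernetes Volume type, and ephemeralClaimName and claimsForPod are illustrative names, not existing kube-state-metrics functions.

```go
package main

import "fmt"

// volume is a simplified, hypothetical stand-in for a pod volume.
type volume struct {
	Name      string
	ClaimName string // set for persistentVolumeClaim volumes
	Ephemeral bool   // true for generic ephemeral volumes
}

// ephemeralClaimName applies the documented rule: pod name, a hyphen,
// then the volume name.
func ephemeralClaimName(podName, volumeName string) string {
	return podName + "-" + volumeName
}

// claimsForPod maps each PVC-backed volume name to its PVC name,
// covering both explicit claims and ephemeral volumes.
func claimsForPod(podName string, volumes []volume) map[string]string {
	claims := make(map[string]string)
	for _, v := range volumes {
		switch {
		case v.Ephemeral:
			claims[v.Name] = ephemeralClaimName(podName, v.Name)
		case v.ClaimName != "":
			claims[v.Name] = v.ClaimName
		}
	}
	return claims
}

func main() {
	vols := []volume{{Name: "workdir", Ephemeral: true}}
	fmt.Println(claimsForPod("tmp-workload", vols)) // prints map[workdir:tmp-workload-workdir]
}
```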

Alternatively, exposing PVC ownership data (ownerReferences metadata) would also address my use case, although I think it would be hard to integrate into my alerting rule.

Additional context: We sometimes run temporary workloads that need to store large amounts of data. Since we don't need the data to persist across pod executions, we use Ephemeral Volumes to ensure the PVC is removed when we drop the pod.

Here's a pod manifest example (we use these pods to perform operations on our databases by exec-ing into them; this avoids tunneling and guards against connection drops):

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: tmp-workload
  name: tmp-workload
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - args:
    - bash
    - -c
    - sleep infinity
    image: postgres
    name: tmp-workload
    volumeMounts:
    - name: workdir
      mountPath: /workdir
    resources:
      limits:
        memory: 1Gi
        cpu: "1"
  volumes:
  - name: workdir
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Ti

TPXP avatar Aug 30 '24 14:08 TPXP

/assign @dgrisonnet

/triage accepted

dashpole avatar Sep 05 '24 16:09 dashpole

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Sep 05 '25 17:09 k8s-triage-robot

I don't think this has been addressed

/triage accepted

TPXP avatar Sep 09 '25 09:09 TPXP

@TPXP: The label triage/accepted cannot be applied. Only GitHub organization members can add the label.

In response to this:

I don't think this has been addressed

/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 09 '25 09:09 k8s-ci-robot