kube-state-metrics
Export pod ephemeral PVCs metrics
What would you like to be added: kube-state-metrics exposes metrics about PVC usage by pods through metrics like kube_pod_spec_volumes_persistentvolumeclaims_info and kube_pod_spec_volumes_persistentvolumeclaims_readonly. I'd like similar metrics to be available for Ephemeral Volume mounts, since those are also backed by PVCs.
Why is this needed: We use Prometheus metrics to detect PVCs that are not mounted, which reminds us to drop any that were left behind for some reason. Our alerting rule lists the PVCs in a namespace with kube_persistentvolumeclaim_info and excludes mounted ones with kube_pod_spec_volumes_persistentvolumeclaims_info. Ephemeral volumes generate a PVC which appears in kube_persistentvolumeclaim_info but not in kube_pod_spec_volumes_persistentvolumeclaims_info, since the volume does not have PersistentVolumeClaim.ClaimName defined. Adding a metric exposing ephemeral PVCs would give us a way to avoid false alarms when a pod is using an ephemeral PVC.
Describe the solution you'd like: Exposing another metric, kube_pod_spec_volumes_ephemeral_persistentvolumeclaims_info, seems acceptable. Updating kube_pod_spec_volumes_persistentvolumeclaims_info to add an ephemeral label would work as well.
Implementation note: while the PodSpec does not have a field explicitly giving the PVC name, the docs clarify how it's derived from the pod and volume name:
Naming of the automatically created PVCs is deterministic: the name is a combination of the Pod name and volume name, with a hyphen (-) in the middle.
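To illustrate the naming rule quoted above, here is a minimal Go sketch of how a per-pod list of claims could include ephemeral volumes. It is not kube-state-metrics code: the Volume and ClaimRef types and the claimRefs and ephemeralClaimName functions are hypothetical, pared-down stand-ins for the real Kubernetes types, and the ephemeral flag mirrors the label suggested above.

```go
package main

import "fmt"

// Volume is a hypothetical, pared-down stand-in for a pod volume.
type Volume struct {
	Name      string
	ClaimName string // set when the volume references an existing PVC
	Ephemeral bool   // true when the volume uses ephemeral.volumeClaimTemplate
}

// ClaimRef is one row of the would-be metric (hypothetical type).
type ClaimRef struct {
	Volume    string
	ClaimName string
	Ephemeral bool
}

// ephemeralClaimName applies the documented rule:
// pod name, a hyphen, then the volume name.
func ephemeralClaimName(podName, volumeName string) string {
	return podName + "-" + volumeName
}

// claimRefs returns the PVCs used by a pod, deriving the claim name
// for ephemeral volumes and skipping volumes not backed by a PVC.
func claimRefs(podName string, volumes []Volume) []ClaimRef {
	var refs []ClaimRef
	for _, v := range volumes {
		switch {
		case v.Ephemeral:
			refs = append(refs, ClaimRef{
				Volume:    v.Name,
				ClaimName: ephemeralClaimName(podName, v.Name),
				Ephemeral: true,
			})
		case v.ClaimName != "":
			refs = append(refs, ClaimRef{Volume: v.Name, ClaimName: v.ClaimName})
		}
	}
	return refs
}

func main() {
	refs := claimRefs("tmp-workload", []Volume{{Name: "workdir", Ephemeral: true}})
	for _, r := range refs {
		fmt.Printf("volume=%s persistentvolumeclaim=%s ephemeral=%t\n",
			r.Volume, r.ClaimName, r.Ephemeral)
	}
}
```

For a pod named tmp-workload with an ephemeral volume named workdir, the derived claim name is tmp-workload-workdir, which is the value an alerting rule would need to match against kube_persistentvolumeclaim_info.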
Alternatively, exposing PVC ownership data (ownerReferences metadata) would also address my use case, although I think it would be hard to integrate into my alerting rule.
Additional context: We sometimes run temporary workloads that need to store large amounts of data. Since we don't need the data to persist across pod executions, we use Ephemeral Volumes to ensure the PVC is removed when we drop the pod.
Here's a pod manifest example (we use these pods to perform operations on our databases by exec-ing into them; this avoids tunneling and guards against connection drops):
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: tmp-workload
  name: tmp-workload
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - args:
    - bash
    - -c
    - sleep infinity
    image: postgres
    name: tmp-workload
    volumeMounts:
    - name: workdir
      mountPath: /workdir
    resources:
      limits:
        memory: 1Gi
        cpu: "1"
  volumes:
  - name: workdir
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Ti
/assign @dgrisonnet
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
I don't think this has been addressed
/triage accepted
@TPXP: The label triage/accepted cannot be applied. Only GitHub organization members can add the label.
In response to this:
I don't think this has been addressed
/triage accepted
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.