troubleshoot
Collector for PVC disk usage
Describe the rationale for the suggested feature.
Troubleshoot collects PVC specs but not disk usage.
Describe the feature
K8s users can use a script like kubedf, available here, which calls the /api/v1/nodes API and collects capacity bytes, available bytes, and percent used. This algorithm would port cleanly to Go for implementation as a collector, maybe called "pvcDiskUsage"?
I imagine it would take an optional namespace and an optional PVC name (default=all). Note that not everyone knows all their PVC names ahead of time; sometimes they're dynamically created.
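To make the idea concrete, here is a rough Go sketch of the core step, not a proposed final implementation. It assumes the collector reads each node's kubelet stats summary (served under /api/v1/nodes/<node>/proxy/stats/summary) and that percent used is derived from capacity and available bytes. The names `pvcDiskUsage` and `PVCUsage` are hypothetical, and fetching the JSON from the API server is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// summary mirrors only the fields of the kubelet stats summary that this
// sketch reads; the real response carries many more.
type summary struct {
	Pods []struct {
		Volumes []struct {
			PVCRef *struct {
				Name      string `json:"name"`
				Namespace string `json:"namespace"`
			} `json:"pvcRef"`
			CapacityBytes  uint64 `json:"capacityBytes"`
			AvailableBytes uint64 `json:"availableBytes"`
		} `json:"volume"`
	} `json:"pods"`
}

// PVCUsage is a hypothetical output record for the proposed collector.
type PVCUsage struct {
	Namespace      string  `json:"namespace"`
	Name           string  `json:"name"`
	CapacityBytes  uint64  `json:"capacityBytes"`
	AvailableBytes uint64  `json:"availableBytes"`
	PercentUsed    float64 `json:"percentUsed"`
}

// pvcDiskUsage extracts usage for PVC-backed volumes from one node's summary.
// An empty namespace or pvcName matches everything (the "default=all" case).
func pvcDiskUsage(raw []byte, namespace, pvcName string) ([]PVCUsage, error) {
	var s summary
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	seen := map[string]bool{} // a PVC mounted by several pods is reported once
	var out []PVCUsage
	for _, pod := range s.Pods {
		for _, v := range pod.Volumes {
			// Skip volumes that are not PVC-backed or report no capacity.
			if v.PVCRef == nil || v.CapacityBytes == 0 {
				continue
			}
			if namespace != "" && v.PVCRef.Namespace != namespace {
				continue
			}
			if pvcName != "" && v.PVCRef.Name != pvcName {
				continue
			}
			key := v.PVCRef.Namespace + "/" + v.PVCRef.Name
			if seen[key] {
				continue
			}
			seen[key] = true
			// Assumption: percent used = (capacity - available) / capacity.
			pct := 100 * (float64(v.CapacityBytes) - float64(v.AvailableBytes)) / float64(v.CapacityBytes)
			out = append(out, PVCUsage{
				Namespace:      v.PVCRef.Namespace,
				Name:           v.PVCRef.Name,
				CapacityBytes:  v.CapacityBytes,
				AvailableBytes: v.AvailableBytes,
				PercentUsed:    pct,
			})
		}
	}
	return out, nil
}

func main() {
	// Made-up sample data shaped like a stats summary response.
	sample := []byte(`{"pods":[{"volume":[
		{"pvcRef":{"name":"data-0","namespace":"default"},
		 "capacityBytes":1000,"availableBytes":250}]}]}`)
	usages, err := pvcDiskUsage(sample, "", "")
	if err != nil {
		panic(err)
	}
	for _, u := range usages {
		fmt.Printf("%s/%s: %.1f%% used\n", u.Namespace, u.Name, u.PercentUsed)
	}
}
```

A real collector would call this once per node and write the merged result into the support bundle as JSON. Because PVCs are discovered from the summary itself, dynamically created PVCs are picked up without knowing their names in advance.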
Describe alternatives you've considered
- This can sometimes be done using the exec collector to shell into a pod which mounts the volume and run df in that pod. However, pods which are "from scratch", et al., do not contain df, so that's not always an option.
- We could assemble a custom image containing kubedf, jq, and kubectl and run that with runPod. It would be better if it were built into troubleshoot.
- I looked for a metrics API that would let the http collector pull it. That would also be ideal, but I didn't see one.
- We could scrape this ourselves at the app level and log it. Again, this is probably something many people want, and it would be better if it were not handled at the app level.
Additional context
Our users create PVCs dynamically, and when those PVCs fill up, it's a source of errors. A support bundle containing utilization metrics would be ideal.