troubleshoot

Collector for PVC disk usage

Open mnp opened this issue 1 year ago • 4 comments

Describe the rationale for the suggested feature.

Troubleshoot collects PVC specs but not disk usage.

Describe the feature

K8s users can use a script like kubedf, which calls the /api/v1/nodes API and collects capacity bytes, available bytes, and percent used. This algorithm would port cleanly to Go for implementation as a collector, maybe called "pvcDiskUsage"?
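The core of such a script can be sketched in Go. This is a minimal sketch, assuming the input is the kubelet stats summary that kubedf-style scripts typically read through the node proxy (/api/v1/nodes/<node>/proxy/stats/summary), where each pod lists its volumes and PVC-backed volumes carry a `pvcRef`. The `PVCUsage` type and `pvcUsageFromSummary` function are hypothetical names, and percent used is computed here as (capacity - available) / capacity, which is one plausible definition rather than necessarily what kubedf reports.

```go
package main

import (
	"encoding/json"
	"sort"
)

// summary decodes only the subset of the kubelet stats summary needed here.
// Byte counts are pointers because the kubelet may omit them.
type summary struct {
	Pods []struct {
		Volume []struct {
			Name           string  `json:"name"`
			CapacityBytes  *uint64 `json:"capacityBytes"`
			AvailableBytes *uint64 `json:"availableBytes"`
			PVCRef         *struct {
				Name      string `json:"name"`
				Namespace string `json:"namespace"`
			} `json:"pvcRef"`
		} `json:"volume"`
	} `json:"pods"`
}

// PVCUsage is a hypothetical result row for the proposed "pvcDiskUsage" collector.
type PVCUsage struct {
	Namespace      string  `json:"namespace"`
	Name           string  `json:"name"`
	CapacityBytes  uint64  `json:"capacityBytes"`
	AvailableBytes uint64  `json:"availableBytes"`
	PercentUsed    float64 `json:"percentUsed"`
}

// pvcUsageFromSummary returns one row per PVC found in a single node's summary.
// Volumes without a pvcRef (emptyDir, configMap, ...) are skipped.
func pvcUsageFromSummary(data []byte) ([]PVCUsage, error) {
	var s summary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var out []PVCUsage
	for _, pod := range s.Pods {
		for _, v := range pod.Volume {
			if v.PVCRef == nil || v.CapacityBytes == nil || v.AvailableBytes == nil {
				continue
			}
			// A PVC mounted by several pods on the same node is reported once per pod.
			key := v.PVCRef.Namespace + "/" + v.PVCRef.Name
			if seen[key] {
				continue
			}
			seen[key] = true
			capacity, avail := *v.CapacityBytes, *v.AvailableBytes
			pct := 0.0
			// Guard against division by zero and unsigned underflow.
			if capacity > 0 && avail <= capacity {
				pct = float64(capacity-avail) / float64(capacity) * 100
			}
			out = append(out, PVCUsage{
				Namespace:      v.PVCRef.Namespace,
				Name:           v.PVCRef.Name,
				CapacityBytes:  capacity,
				AvailableBytes: avail,
				PercentUsed:    pct,
			})
		}
	}
	// Stable, readable ordering: by namespace, then PVC name.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Name < out[j].Name
	})
	return out, nil
}
```

A real collector would fetch this JSON once per node (for example via client-go against the node proxy path), merge the results, and deduplicate again across nodes, since a ReadWriteMany PVC can be mounted on more than one node.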

I imagine it would take an optional namespace and an optional PVC name (default: all). Note that not everyone knows all their PVC names ahead of time; sometimes they're dynamically created.
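Those optional parameters could be modeled so that an empty value means "match all". Because matching happens against whatever PVCs are discovered at collection time, dynamically created PVCs are covered without listing their names in advance. `PVCSelector`, `PVCRef`, and their methods are hypothetical names for illustration, not part of troubleshoot's API.

```go
package main

// PVCSelector holds the hypothetical optional collector parameters.
// An empty field means "no restriction", so the zero value selects every PVC.
type PVCSelector struct {
	Namespace string
	Name      string
}

// PVCRef identifies a PVC discovered at collection time.
type PVCRef struct {
	Namespace string
	Name      string
}

// Matches reports whether ref satisfies every non-empty field of the selector.
func (s PVCSelector) Matches(ref PVCRef) bool {
	if s.Namespace != "" && s.Namespace != ref.Namespace {
		return false
	}
	if s.Name != "" && s.Name != ref.Name {
		return false
	}
	return true
}

// Filter keeps only the discovered PVCs that match the selector.
func (s PVCSelector) Filter(refs []PVCRef) []PVCRef {
	var out []PVCRef
	for _, r := range refs {
		if s.Matches(r) {
			out = append(out, r)
		}
	}
	return out
}
```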

Describe alternatives you've considered

  • This can sometimes be done by using the exec collector to shell into a pod that mounts the volume and running df in that pod. However, images built "from scratch" (and other minimal images) do not contain df, so that's not always an option.
  • We could assemble a custom image containing kubedf, jq, and kubectl and run it with runPod. It would be better if this were built into troubleshoot.
  • I looked for a metrics API that the http collector could pull this from. That would also be ideal, but I didn't find one.
  • We could scrape this ourselves at the app level and log it. Again, this is probably something many people want, so it would be better handled outside the app.

Additional context

Our users create PVCs dynamically, and when those PVCs fill up, they become a source of errors. A support bundle containing utilization metrics would be ideal.

mnp, Mar 01 '24 23:03