aws-ebs-csi-driver
[Feature Request] Volume Autoscaling (Resize EBS Disk Size)
**Is your feature request related to a problem? Please describe.**
AWS EBS volumes support increasing disk size, but today this must be done manually; the process is not automated. When the final storage requirements of an application are uncertain, it's preferable to allocate less storage initially and scale up as needed.
**Describe the solution you'd like in detail**
There are existing Kubelet metrics that can be used to calculate disk usage:
(kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes) / kubelet_volume_stats_capacity_bytes
These metrics can be retrieved from the kube-prometheus-stack by providing a Prometheus URL, or directly from the Kubelet via the ebs-csi-node DaemonSet. Based on these metrics, storage autoscaling can be implemented: for example, when storage usage exceeds 75% (a configurable threshold), the CSI driver should automatically increase the disk size by 20%.
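As a rough illustration of the proposed behaviour (not an existing driver feature), the decision could look something like the sketch below; the 75% threshold and 20% growth step are the example values above, and all names are illustrative:

```go
package main

import "fmt"

// desiredSizeGiB implements the proposed rule: if the usage ratio
// (kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes)
//   / kubelet_volume_stats_capacity_bytes
// exceeds the threshold, grow the volume by a fixed percentage.
func desiredSizeGiB(currentGiB int64, usageRatio float64) (int64, bool) {
	const threshold = 0.75    // configurable usage threshold from the example above
	const growthFactor = 1.20 // grow by 20% once the threshold is crossed
	if usageRatio < threshold {
		return currentGiB, false
	}
	next := int64(float64(currentGiB) * growthFactor)
	if next <= currentGiB {
		next = currentGiB + 1 // always request a strictly larger size
	}
	return next, true
}

func main() {
	if next, grow := desiredSizeGiB(100, 0.82); grow {
		fmt.Printf("would resize the PVC from 100Gi to %dGi\n", next)
	}
}
```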
**Describe alternatives you've considered**
Create an alert for PV disk usage and manually update the PVC.
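For reference, the manual path amounts to bumping the PVC's storage request, after which the driver's existing expansion path grows the EBS volume and filesystem. A minimal client-go sketch of that step is below; the namespace, PVC name, and target size are purely illustrative:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (error handling kept minimal).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Equivalent to manually editing spec.resources.requests.storage on the PVC.
	patch := []byte(`{"spec":{"resources":{"requests":{"storage":"120Gi"}}}}`)
	_, err = clientset.CoreV1().PersistentVolumeClaims("monitoring").Patch(
		context.TODO(), "data-prometheus-0", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("PVC storage request updated; the CSI driver handles the volume expansion")
}
```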
**Additional context**
In dynamic cloud environments, supporting storage autoscaling would help optimize resource allocation and reduce costs by avoiding unnecessary over-provisioning.
/retitle [Feature Request] Volume Autoscaling (Resize EBS Disk Size)
/kind feature
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I came across this while looking up similar use cases: https://repost.aws/questions/QUf-VrsN50QkCTSiy97NHARg/need-help-automating-ebs-volume-scaling-at-80-capacity. It lets you alarm on volume utilization and then automates scaling using Elastic Volumes (EV). Would this solve your use case? Does it present any challenges?
Can you share more about your use case and scaling pattern?
While this approach can be used to resize volumes, it requires several additional services and permissions, as well as installing the CloudWatch Agent (since disk usage metrics are not available in default monitoring). Meanwhile, the EBS CSI driver already runs on every node and is capable of resizing disks and filesystems on its own.
Regarding scaling: we deployed an application that requires a persistent volume, but we don't know the exact volume size in advance. So, we provision a smaller volume and resize it multiple times until the required storage size is known.
It makes sense to have everything in one place. The metrics you listed in the original post are vended by the EBS CSI driver, and you should be able to scrape them via a Prometheus endpoint to set up alarms without requiring CloudWatch specifically.
For scaling, will you know the estimated maximum volume size in advance? For automated scaling, I imagine you will still need an upper bound to avoid incurring cost from runaway scripts consuming excessive storage by mistake. What type of application are you deploying?
We have configured an alert for this metric, which fires when used disk space exceeds 80% of total capacity. However, resizing still requires manually increasing the PVC's storage request.
Currently, we are using PersistentVolumes for Prometheus (metrics storage) and VictoriaLogs (logs storage). As incoming traffic increases to the newly created cluster, there is a growing demand for larger disk sizes.
At the moment, we do not have an upper bound limit. The same metric can also be used to detect underutilized volumes with excessive free space.
It would be nice if the upper bound and disk increase percentage could be configured via annotations, and the EBS CSI driver could use these settings to apply automatic storage scaling.
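Purely as a sketch of what such annotation-driven configuration might look like (the annotation keys below are hypothetical and not part of the driver today):

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Hypothetical PVC annotation keys; not implemented by the EBS CSI driver.
	annThreshold = "ebs.csi.aws.com/autoscale-threshold"   // e.g. "0.80"
	annIncrease  = "ebs.csi.aws.com/autoscale-increase"     // e.g. "0.20"
	annMaxSize   = "ebs.csi.aws.com/autoscale-max-size-gib" // e.g. "500"
)

// nextSizeGiB applies the configured increase percentage once the usage
// threshold is crossed, and caps the result at the configured upper bound.
// Error handling for malformed annotation values is omitted for brevity.
func nextSizeGiB(annotations map[string]string, currentGiB int64, usage float64) int64 {
	threshold, _ := strconv.ParseFloat(annotations[annThreshold], 64)
	increase, _ := strconv.ParseFloat(annotations[annIncrease], 64)
	maxGiB, _ := strconv.ParseInt(annotations[annMaxSize], 10, 64)

	if usage < threshold {
		return currentGiB
	}
	next := int64(float64(currentGiB) * (1 + increase))
	if next > maxGiB {
		next = maxGiB
	}
	return next
}

func main() {
	ann := map[string]string{
		annThreshold: "0.80",
		annIncrease:  "0.20",
		annMaxSize:   "500",
	}
	// 400Gi at 85% usage grows by 20% to 480Gi, still under the 500Gi cap.
	fmt.Println(nextSizeGiB(ann, 400, 0.85))
}
```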
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale