aws-ebs-csi-driver

Support RAID of multiple EBS volumes

Open sebgl opened this issue 3 years ago • 7 comments

One way to get better performance out of EBS volumes is to rely on RAIDs made of multiple EBS volumes, as described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html

A common use case would be to build a RAID array out of two st1 or sc1 volumes, which offers different performance/price characteristics than, for example, using gp3 volumes.

Unfortunately, that's not trivial in Kubernetes, which natively expects one mounted volume to map to a single PersistentVolume. It seems technically possible to do the following, for example in a privileged init container:

  • unmount the two EBS volumes that were mounted and formatted by the EBS CSI driver
  • assemble a RAID array with mdadm, and format the resulting file system

In the main container, the RAID volume can then be mounted through a hostPath (a rough sketch follows).
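For illustration only, here is a minimal sketch of what such a privileged init container could look like. Everything in it is an assumption: the image, device names, and host path are placeholders, and the unmounting of the driver-mounted filesystems is omitted.

initContainers:
  - name: assemble-raid
    image: amazonlinux:2               # assumed image; mdadm and xfsprogs must be installed first
    securityContext:
      privileged: true                 # needed to manipulate host block devices
    volumeMounts:
      - name: raid-host-dir
        mountPath: /mnt/raid
        mountPropagation: Bidirectional   # make the mount visible on the host and to other containers
    command: ["/bin/sh", "-c"]
    args:
      - |
        yum install -y mdadm xfsprogs
        # Device names are placeholders; on Nitro instances EBS volumes appear as /dev/nvme*
        mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
        mkfs.xfs /dev/md0
        mount /dev/md0 /mnt/raid
volumes:
  - name: raid-host-dir
    hostPath:
      path: /mnt/raid
      type: DirectoryOrCreate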

While this can technically work, it's not enough, because nothing manages the lifecycle of the resulting volume:

  • the RAID volume should be unmounted if the Pod is deleted
  • it should be possible to access the same RAID volume from another host
  • it should be possible to resize the RAID volume up

I think this should be better handled by the EBS CSI driver directly, and abstracted away from end-users. Users would for example declare the following storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc1-raid
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: sc1
  raid:
    type: 0
    volumeCount: 3

and the following PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sc1-raid
  resources:
    requests:
      storage: 300Gi

As a result, the driver would provision three sc1 volumes of 100Gi each (the PVC storage request divided by the storage class volume count, rounded up), build the RAID, and create a single PersistentVolume that can be bound to that claim. From the user's perspective there is a single 300Gi volume; how the RAID is assembled and managed is not their concern.
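One detail worth noting: StorageClass parameters are a flat map of string keys to string values, so the nested raid block above would likely need to be flattened into individual parameters. A hypothetical variant (the raidLevel and raidVolumeCount parameter names are made up for illustration and are not supported by the driver):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc1-raid
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: sc1
  raidLevel: "0"
  raidVolumeCount: "3"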

sebgl avatar Apr 28 '22 08:04 sebgl

/kind feature

rdpsin avatar Apr 28 '22 15:04 rdpsin

Not sure that AWS would be a fan. EBS is already backed by (distributed) RAID, yet the performance is pretty limited and AWS wants you to pay (a lot) more for better performance. This RAID-of-RAID gives performance for (literally) free despite costing AWS something. I don’t see AWS implementing this feature and making it effortless to avoid paying for performance. But I’d love seeing this in a fork.

almson avatar May 03 '22 14:05 almson

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 01 '22 14:08 k8s-triage-robot

I don't think this is suitable to implement in the driver:

  1. Internally, the driver is largely built around the idea that one Kubernetes volume maps to one EBS volume; it would take significant effort to overhaul this
  2. This breaks CSINode's Allocatable tracking, so kube-scheduler may schedule more volumes onto a node than can actually be attached, soft-locking the cluster (see the CSINode example below)
  3. What are the implications for other features such as resizing and snapshots?
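For context, the per-node attach limit lives in the CSINode object that the driver publishes; if one Kubernetes volume silently consumed several EBS attachments, the scheduler's count would be wrong. A rough example of the relevant object (the node name, instance ID, and count are illustrative):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-0-1.ec2.internal          # illustrative node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0         # illustrative instance ID
      allocatable:
        count: 25                         # maximum number of volumes the driver can attach to this node
      topologyKeys:
        - topology.ebs.csi.aws.com/zone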

The driver supports block volumes, I would suggest just passing in X of those and assembling the RAID in-pod.
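For anyone wanting to try that route, a rough sketch of what it could look like, assuming an ordinary EBS StorageClass named ebs-sc; the claim and device names are illustrative, and the RAID assembly itself still has to be done inside the pod (for example in a privileged init container as sketched earlier in the thread):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raid-member-0                # one claim per RAID member; repeat for member-1, member-2, ...
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                  # the driver attaches a raw device and does not create a filesystem
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 100Gi
---
# Pod fragment: each claim is exposed to the container as a raw device via volumeDevices
spec:
  containers:
    - name: app
      volumeDevices:
        - name: member-0
          devicePath: /dev/raid-member-0
  volumes:
    - name: member-0
      persistentVolumeClaim:
        claimName: raid-member-0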

ConnorJC3 avatar Aug 04 '22 19:08 ConnorJC3

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 03 '22 19:09 k8s-triage-robot

Not sure that AWS would be a fan. EBS is already backed by (distributed) RAID, yet the performance is pretty limited and AWS wants you to pay (a lot) more for better performance. This RAID-of-RAID gives performance for (literally) free despite costing AWS something. I don’t see AWS implementing this feature and making it effortless to avoid paying for performance. But I’d love seeing this in a fork.

@almson Well, AWS even documents it in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html:

Creating a RAID 0 array allows you to achieve a higher level of performance for a file system than you can provision on a single Amazon EBS volume

But things have probably changed now with gp3 where you can pay more for better performance.

pquentin avatar Sep 12 '22 10:09 pquentin

Another reason to support this is to allow for volumes that exceed the maximum size of a single EBS volume. Most volume types are limited to 16 TiB, for example, so it would be nice to be able to combine EBS volumes into larger Kubernetes volumes.

jscaltreto avatar Sep 16 '22 17:09 jscaltreto

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 16 '22 18:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the triage bot's /close not-planned command above:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 16 '22 18:10 k8s-ci-robot