aws-ebs-csi-driver

Support RAID of multiple EBS volumes

Open sebgl opened this issue 3 years ago • 7 comments

One way to get better performance out of EBS volumes is to rely on RAIDs made of multiple EBS volumes, as described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html

A common use case would be to build a RAID array out of two st1 or sc1 volumes, which offers different performance/price characteristics than, for example, using gp3 volumes.

Unfortunately, that's not trivial in Kubernetes, which natively expects one mounted volume to map to a single PersistentVolume. It seems technically possible to do the following, for example in a privileged init container:

  • unmount the two EBS volumes that were mounted and formatted by the EBS CSI driver
  • assemble a RAID array with mdadm, and format the resulting file system

In the main container, the RAID volume can then be mounted through a hostPath (a rough sketch follows).
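For illustration only, here is a minimal sketch of what such a privileged init container could look like. Everything in it is an assumption: the image, device names, and host path are placeholders, and the unmounting of the driver-mounted filesystems is omitted.

initContainers:
  - name: assemble-raid
    image: amazonlinux:2               # assumed image; mdadm and xfsprogs must be installed first
    securityContext:
      privileged: true                 # needed to manipulate host block devices
    volumeMounts:
      - name: raid-host-dir
        mountPath: /mnt/raid
        mountPropagation: Bidirectional   # make the mount visible on the host and to other containers
    command: ["/bin/sh", "-c"]
    args:
      - |
        yum install -y mdadm xfsprogs
        # Device names are placeholders; on Nitro instances EBS volumes appear as /dev/nvme*
        mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
        mkfs.xfs /dev/md0
        mount /dev/md0 /mnt/raid
volumes:
  - name: raid-host-dir
    hostPath:
      path: /mnt/raid
      type: DirectoryOrCreate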

While this can technically work, it's not enough, because nothing manages the lifecycle of the resulting volume:

  • the RAID volume should be unmounted if the Pod is deleted
  • it should be possible to access the same RAID volume from another host
  • it should be possible to resize the RAID volume up

I think this should be better handled by the EBS CSI driver directly, and abstracted away from end-users. Users would for example declare the following storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc1-raid
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: sc1
  raid:
    type: 0
    volumeCount: 3

and the following PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sc1-raid
  resources:
    requests:
      storage: 300Gi

As a result, the driver would provision three sc1 volumes of 100Gi each (the PVC storage request divided by the storage class volume count, rounded up), build the RAID, and create a single PersistentVolume that can be bound to that claim. From the user's perspective there is a single 300Gi volume; how the RAID is assembled and managed is not their concern.
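One detail worth noting: StorageClass parameters are a flat map of string keys to string values, so the nested raid block above would likely need to be flattened into individual parameters. A hypothetical variant (the raidLevel and raidVolumeCount parameter names are made up for illustration and are not supported by the driver):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc1-raid
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: sc1
  raidLevel: "0"
  raidVolumeCount: "3"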

sebgl avatar Apr 28 '22 08:04 sebgl

/kind feature

rdpsin avatar Apr 28 '22 15:04 rdpsin

Not sure that AWS would be a fan. EBS is already backed by (distributed) RAID, yet the performance is pretty limited and AWS wants you to pay (a lot) more for better performance. This RAID-of-RAID gives performance for (literally) free despite costing AWS something. I don’t see AWS implementing this feature and making it effortless to avoid paying for performance. But I’d love seeing this in a fork.

almson avatar May 03 '22 14:05 almson

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 01 '22 14:08 k8s-triage-robot

I don't think this is suitable to implement in the driver:

  1. Internally, the driver is largely built around the idea that one Kubernetes volume maps to one EBS volume; it would take significant effort to overhaul this
  2. This breaks CSINode's Allocatable tracking, so kube-scheduler may schedule more volumes onto a node than can actually be attached, soft-locking the cluster (see the CSINode example below)
  3. What are the implications for other features such as resizing and snapshots?
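For context, the per-node attach limit lives in the CSINode object that the driver publishes; if one Kubernetes volume silently consumed several EBS attachments, the scheduler's count would be wrong. A rough example of the relevant object (the node name, instance ID, and count are illustrative):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-0-1.ec2.internal          # illustrative node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0         # illustrative instance ID
      allocatable:
        count: 25                         # maximum number of volumes the driver can attach to this node
      topologyKeys:
        - topology.ebs.csi.aws.com/zone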

The driver supports block volumes, I would suggest just passing in X of those and assembling the RAID in-pod.
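For anyone wanting to try that route, a rough sketch of what it could look like, assuming an ordinary EBS StorageClass named ebs-sc; the claim and device names are illustrative, and the RAID assembly itself still has to be done inside the pod (for example in a privileged init container as sketched earlier in the thread):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raid-member-0                # one claim per RAID member; repeat for member-1, member-2, ...
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                  # the driver attaches a raw device and does not create a filesystem
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 100Gi
---
# Pod fragment: each claim is exposed to the container as a raw device via volumeDevices
spec:
  containers:
    - name: app
      volumeDevices:
        - name: member-0
          devicePath: /dev/raid-member-0
  volumes:
    - name: member-0
      persistentVolumeClaim:
        claimName: raid-member-0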

ConnorJC3 avatar Aug 04 '22 19:08 ConnorJC3

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 03 '22 19:09 k8s-triage-robot

Not sure that AWS would be a fan. EBS is already backed by (distributed) RAID, yet the performance is pretty limited and AWS wants you to pay (a lot) more for better performance. This RAID-of-RAID gives performance for (literally) free despite costing AWS something. I don’t see AWS implementing this feature and making it effortless to avoid paying for performance. But I’d love seeing this in a fork.

@almson Well, AWS even documents it in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html:

Creating a RAID 0 array allows you to achieve a higher level of performance for a file system than you can provision on a single Amazon EBS volume

But things have probably changed now with gp3 where you can pay more for better performance.

pquentin avatar Sep 12 '22 10:09 pquentin

Another reason to support this is to allow for volumes that exceed the maximum size of a single EBS volume. Most volume types are limited to 16 TiB, for example, so it would be nice to be able to combine EBS volumes into larger Kubernetes volumes.

jscaltreto avatar Sep 16 '22 17:09 jscaltreto

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 16 '22 18:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the triage bot's /close not-planned command above:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 16 '22 18:10 k8s-ci-robot