KEP-3476: Add Volume Group KEP

Open xing-yang opened this issue 5 years ago • 23 comments

This PR proposes a VolumeGroup CRD.

  • Enhancement issue: https://github.com/kubernetes/enhancements/issues/3476

xing-yang avatar Feb 13 '20 04:02 xing-yang

Consider a StatefulSet with 3 replicas and 2 claims in volumeClaimTemplates, which yields 6 PVCs in total. With this KEP, would it be possible to tell the storage controller that the 2 claims of each replica form one group, and that there are 3 such groups, so that the controller can take care of volume placement?

This is a common use case: the application provides high availability by storing data in such a way that it keeps running even if one pod goes down. But if the underlying storage for all 6 PVCs comes from a single failure domain, the application cannot run when that domain fails.

vishnuitta avatar Jun 18 '20 16:06 vishnuitta

I think that should be possible. For the 2 PVCs on the 1st replica, we can add the label replica-1; for the 2 PVCs on the 2nd replica, the label replica-2; and so on. We can then create 3 VolumeGroups, one for the PVCs of each replica. This is the Immutable VolumeGroup case with existing PVCs.
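The per-replica labeling described above could look roughly like the following Go sketch. It relies on the StatefulSet convention that PVCs are named `<claimTemplate>-<statefulSetName>-<ordinal>`; the label key `example.com/volume-group`, the function name `PVCGroupLabels`, and the sample names `web`, `data`, and `logs` are assumptions made for illustration and do not come from the KEP.

```go
package main

import "fmt"

// groupLabelKey is a hypothetical label key; the KEP text here does not fix one.
const groupLabelKey = "example.com/volume-group"

// PVCGroupLabels returns, for every PVC a StatefulSet would create, the label
// value that places it in a per-replica group. PVC names follow the
// StatefulSet convention <claimTemplate>-<statefulSetName>-<ordinal>.
func PVCGroupLabels(stsName string, claimTemplates []string, replicas int) map[string]string {
	labels := make(map[string]string, len(claimTemplates)*replicas)
	for ordinal := 0; ordinal < replicas; ordinal++ {
		// Ordinals start at 0, so the 1st replica gets the value "replica-1".
		group := fmt.Sprintf("replica-%d", ordinal+1)
		for _, tmpl := range claimTemplates {
			pvc := fmt.Sprintf("%s-%s-%d", tmpl, stsName, ordinal)
			labels[pvc] = group
		}
	}
	return labels
}

func main() {
	// 3 replicas x 2 claim templates = 6 PVCs in 3 groups.
	for pvc, group := range PVCGroupLabels("web", []string{"data", "logs"}, 3) {
		fmt.Printf("%s -> %s=%s\n", pvc, groupLabelKey, group)
	}
}
```

Each of the 3 VolumeGroups would then select the PVCs that carry one of the label values.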

Supporting a Mutable VolumeGroup with a StatefulSet, with the PVCs of each replica placed in a different group, would require enhancing the StatefulSet controller. If we always create a different group for the PVCs of each replica, we can modify the StatefulSet controller to add the pod name as a suffix to the group name. But if we sometimes need a different group per replica, sometimes just 1 group for all the PVCs in all replicas, and sometimes a group for 2 out of 3 replicas, it will be tricky.

xing-yang avatar Jun 26 '20 02:06 xing-yang

Notes from today’s meeting:

  1. Add a CSI capability INDIVIDUAL_SNAPSHOT_RESTORE to indicate whether a CSI driver can support creating a volume from an individual volume snapshot if the volume snapshot is part of a VolumeGroupSnapshot. Use cases: selective restore, advanced recovery.
  2. Can all drivers support creating a volume from a snapshot and adding that volume to a group? Maybe we have to support creating a VolumeGroup from a VolumeGroupSnapshot in one step. That would mean adding VolumeGroupSnapshot back as an optional Source in the VolumeGroup spec, then creating a new group and creating the individual volumes in it from the snapshots.
  3. Remove AddRemoveExistingPVC from VolumeGroupStatus? It is in the CSI capability already. Do we want to show the user that a group does not support add/remove?
  4. If a user requests to add an existing PVC to a consistency group, but the CSI driver cannot fulfill the request because the existing PVC is placed on a different storage pool from the consistency group, then the CSI driver should just return a failure.
  5. Add another boolean flag ConsistentGroupSnapshot in VolumeGroup spec to differentiate that from VolumeGroupSnapshot.
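
As a rough illustration of notes 3 and 4, the Go sketch below rejects a request to add an existing volume when the driver lacks the add/remove capability or when the volume sits on a different storage pool than the group. All type, field, and function names here (`Volume`, `Group`, `AddExisting`, `canAddRemoveExisting`) are hypothetical and are not taken from the KEP or the CSI spec.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors a driver-side check might return.
var (
	ErrAddNotSupported = errors.New("driver does not support adding existing volumes")
	ErrPoolMismatch    = errors.New("volume is on a different storage pool than the group")
)

// Volume and Group are simplified stand-ins for a PVC's backing volume
// and a consistency group; both record the storage pool they live on.
type Volume struct {
	Name string
	Pool string
}

type Group struct {
	Name    string
	Pool    string
	Volumes []string
}

// AddExisting adds v to g, or returns an error without changing g.
// canAddRemoveExisting stands in for the driver capability from note 3.
func AddExisting(g *Group, v Volume, canAddRemoveExisting bool) error {
	if !canAddRemoveExisting {
		return ErrAddNotSupported
	}
	if v.Pool != g.Pool {
		// Note 4: the driver simply fails the request.
		return fmt.Errorf("add %q to group %q: %w", v.Name, g.Name, ErrPoolMismatch)
	}
	g.Volumes = append(g.Volumes, v.Name)
	return nil
}

func main() {
	g := &Group{Name: "cg-1", Pool: "pool-a"}
	fmt.Println(AddExisting(g, Volume{Name: "pvc-1", Pool: "pool-a"}, true))
	fmt.Println(AddExisting(g, Volume{Name: "pvc-2", Pool: "pool-b"}, true))
}
```

Returning a wrapped sentinel error lets a caller distinguish a pool mismatch from a missing capability with `errors.Is`.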

Will continue to review API Definitions section in the next meeting.

xing-yang avatar Dec 11 '20 18:12 xing-yang

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar May 25 '21 22:05 fejta-bot

/remove-lifecycle stale

xing-yang avatar Jun 16 '21 14:06 xing-yang

A few open thoughts/questions I would like to bring up here:

  • [ ] The VolumeGroupSnapshot (vgs) status carries the list of volume snapshots in its status field:
type VolumeGroupSnapshotStatus struct {
	...
	// List of volume snapshots
	// +optional
	SnapshotList []VolumeSnapshot
}

However, some storage systems/backends (CephFS) do not keep individual snapshots of the underlying volumes when they take a group snapshot. How can we support those backends, and in the absence of individual snapshots, how would the controller update the snapshot list in the status field?

Also,

  • [ ] Are we planning to support a clone operation for a subvolumegroup?
  • [ ] What is the lifecycle of a volume group snapshot?

humblec avatar Jul 13 '21 06:07 humblec

/retest

xing-yang avatar Aug 12 '21 12:08 xing-yang

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 22 '21 15:12 k8s-triage-robot

/remove-lifecycle stale

xing-yang avatar Dec 23 '21 13:12 xing-yang

/lifecycle stale

k8s-triage-robot avatar Mar 23 '22 14:03 k8s-triage-robot

/remove-lifecycle stale

xing-yang avatar Mar 24 '22 03:03 xing-yang

Is volume group clone going to be supported?

ArbelNathan avatar Sep 13 '22 12:09 ArbelNathan

This can be a type of GroupDataSource which is in Phase 2. I'll add this.

xing-yang avatar Sep 13 '22 13:09 xing-yang

/assign @msau42 @jingxu97

xing-yang avatar Sep 14 '22 18:09 xing-yang

First pass review

msau42 avatar Sep 20 '22 04:09 msau42

@msau42 Addressed your comments. PTAL. Thanks.

xing-yang avatar Sep 21 '22 01:09 xing-yang

/assign @johnbelamaric

xing-yang avatar Oct 05 '22 01:10 xing-yang

@msau42 Addressed your comments. PTAL. Thanks.

xing-yang avatar Oct 06 '22 02:10 xing-yang

@johnbelamaric Addressed your comments. PTAL. Thanks.

xing-yang avatar Oct 06 '22 17:10 xing-yang

@johnbelamaric Addressed your latest comments. PTAL. Thanks.

xing-yang avatar Oct 06 '22 19:10 xing-yang

/approve

johnbelamaric avatar Oct 06 '22 19:10 johnbelamaric

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 06 '22 19:10 k8s-ci-robot

/unassign

sftim avatar Oct 29 '22 13:10 sftim

@msau42 I addressed your comments. PTAL. Thanks.

xing-yang avatar Feb 07 '23 03:02 xing-yang

/lgtm

jsafrane avatar Feb 08 '23 19:02 jsafrane