external-snapshotter
VolumeGroupSnapshots - how to rebuild/restore a VolumeGroupSnapshot?
What happened:
When creating a VolumeGroupSnapshot from multiple PVCs, it's a bit unclear how to do a restore.
It looks like you are required to individually restore each volumesnapshot in the group into a PVC - but is there an easy way to map each volumesnapshot back to the original PVC it was taken from?
The volumegroupsnapshot has a status that lists the volumesnapshots, but (as far as I can tell) it doesn't provide information linking them back to the original PVCs.
Example volumegroupsnapshot status:
status:
  boundVolumeGroupSnapshotContentName: groupsnapcontent-84656059-5c4b-4289-9d8f-464f4085b331
  creationTime: "2023-11-30T18:43:03Z"
  readyToUse: true
  volumeSnapshotRefList:
  - kind: VolumeSnapshots
    name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
    namespace: source
    uid: 6d0e0c03-696a-4f0b-a136-3a2c026f360e
  - kind: VolumeSnapshots
    name: snapshot-d01f061e29fb3f492579b031b9bcd6723e7ad4860c647e4397a89aea375ff116-2023-11-30-6.43.4
    namespace: source
    uid: c1ae59ad-8a36-4630-90c8-b45d4b9d42e2
There is some information about the PVs in the volumegroupsnapshotcontent object, but again I'm not sure how to map this to the individual snapshots.
What you expected to happen:
In order to restore a volumegroupsnapshot I need to be able to restore each snapshot to the proper PVC.
How to reproduce it:
Anything else we need to know?:
Environment:
- Driver version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
Some more context - I was testing with the CSI HostPath driver and the information is also not present in the individual snapshots, as spec.source.persistentVolumeClaimName is not set.
Example volumesnapshot that was created by the volumegroupsnapshot:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2023-11-30T18:43:04Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  generation: 1
  labels:
    volumeGroupSnapshotName: new-groupsnapshot-demo
  name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  namespace: source
  resourceVersion: "43592"
  uid: 6d0e0c03-696a-4f0b-a136-3a2c026f360e
spec:
  source:
    volumeSnapshotContentName: snapcontent-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
status:
  boundVolumeSnapshotContentName: snapcontent-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  creationTime: "2023-11-30T18:43:03Z"
  readyToUse: true
  restoreSize: 1Gi
This works as designed as we don't want each individual snapshot to be dynamically created.
Makes sense, thanks - I think it does mean I don't currently have any reliable way of determining which snapshot goes with which original pvc when I'm trying to restore?
Can you check VolumeGroupSnapshotContent?
Are you working on implementing this in your CSI driver? If so, here is a workaround. In CreateVolumeGroupSnapshotRequest, there is a repeated string source_volume_ids. In CreateVolumeGroupSnapshotResponse, there is a repeated Snapshot snapshots. When constructing the response, make sure the snapshots are appended in the same order as their source volumes in source_volume_ids. This way you can find mappings between PV and VolumeSnapshotContent in VolumeGroupSnapshotContent.
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotContent
metadata:
  creationTimestamp: "2023-12-05T21:56:48Z"
  finalizers:
  - groupsnapshot.storage.kubernetes.io/volumegroupsnapshotcontent-bound-protection
  generation: 1
  name: groupsnapcontent-dc63473c-b310-4ddc-8698-4e70442457dd
  resourceVersion: "96004"
  uid: b04e2744-5e34-4a40-9507-fff1cc7e3187
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    persistentVolumeNames:
    - pvc-e15ccefa-12a5-4eb1-965d-ed7f1b142f99
    - pvc-971d6c80-fe7f-405b-9fbd-ab7b80b9c4ed
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass
  volumeGroupSnapshotRef:
    apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
    kind: VolumeGroupSnapshot
    name: cluster-example-with-volume-snapshot-20231205215643
    namespace: default
    resourceVersion: "95980"
    uid: dc63473c-b310-4ddc-8698-4e70442457dd
status:
  creationTime: 1701813408401819395
  readyToUse: true
  volumeGroupSnapshotHandle: 3052a006-93b9-11ee-987f-5acef7aa0d0c
  volumeSnapshotContentRefList:
  - kind: VolumeSnapshotContent
    name: snapcontent-c91b95d44137bbecb906cbce13a2e2d1a19181a4db5b600644cbf9443c84000b-2023-12-05-9.56.49
  - kind: VolumeSnapshotContent
    name: snapcontent-650c86b012a3049f298e0e3a6e08d858687c0820cd55fbb8ba4e1db0ed3c2e5a-2023-12-05-9.56.49
cc @leonardoce
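To illustrate the ordering discipline described in the workaround above, here is a minimal, hypothetical sketch of a driver-side CreateVolumeGroupSnapshot handler. It assumes the Go bindings in github.com/container-storage-interface/spec/lib/go/csi (where, in the spec version I'm assuming, the snapshots list sits under the response's group_snapshot field), and takeBackendSnapshot is a made-up stand-in for the driver's own storage backend call.

```go
// Sketch only: shows how a driver can keep the snapshots list in the same
// order as source_volume_ids so that consumers can zip the two lists.
package groupsnapshot

import (
	"context"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// takeBackendSnapshot is a hypothetical placeholder for the driver's real
// backend snapshot call.
func takeBackendSnapshot(ctx context.Context, groupName, volumeID string) (string, error) {
	return fmt.Sprintf("%s-snap-of-%s", groupName, volumeID), nil
}

func CreateVolumeGroupSnapshot(ctx context.Context, req *csi.CreateVolumeGroupSnapshotRequest) (*csi.CreateVolumeGroupSnapshotResponse, error) {
	group := &csi.VolumeGroupSnapshot{
		GroupSnapshotId: req.GetName(),
		ReadyToUse:      true,
	}
	// Iterate source_volume_ids in order and append the resulting snapshots
	// in that same order, so index i of both lists refers to the same volume.
	for _, volID := range req.GetSourceVolumeIds() {
		snapID, err := takeBackendSnapshot(ctx, req.GetName(), volID)
		if err != nil {
			return nil, err
		}
		group.Snapshots = append(group.Snapshots, &csi.Snapshot{
			SnapshotId:     snapID,
			SourceVolumeId: volID, // the source volume is also recorded explicitly
			ReadyToUse:     true,
		})
	}
	return &csi.CreateVolumeGroupSnapshotResponse{GroupSnapshot: group}, nil
}
```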
We're using the same workaround @xing-yang is referring to in https://github.com/cloudnative-pg/cloudnative-pg/pull/3345, the PR adding VolumeGroupSnapshot support in CloudNative-PG. We develop a PostgreSQL operator and use VolumeGroupSnapshots to take consistent backups of the database.
We assume that .spec.source.persistentVolumeNames (a list of references to PVs) and .status.volumeSnapshotContentRefList (a list of references to VolumeSnapshotContents) are parallel, and we use this information to reconstruct the link between a VolumeSnapshot and the corresponding PVC.
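For consumers of the API (rather than driver authors), the same parallel-lists assumption can be written down as a small helper. This is only a sketch: MapPVToSnapshotContent is an illustrative name, and fetching the VolumeGroupSnapshotContent's spec.source.persistentVolumeNames and status.volumeSnapshotContentRefList (typed client, dynamic client, or kubectl) is left out.

```go
// Sketch only: zip the two parallel lists taken from a
// VolumeGroupSnapshotContent. From each VolumeSnapshotContent,
// spec.volumeSnapshotRef points to the namespaced VolumeSnapshot, and the
// PV's spec.claimRef points back to the original PVC.
package groupsnapshot

import "fmt"

func MapPVToSnapshotContent(pvNames, contentNames []string) (map[string]string, error) {
	if len(pvNames) != len(contentNames) {
		return nil, fmt.Errorf("list lengths differ: %d PVs vs %d VolumeSnapshotContents", len(pvNames), len(contentNames))
	}
	m := make(map[string]string, len(pvNames))
	for i, pv := range pvNames {
		m[pv] = contentNames[i]
	}
	return m, nil
}
```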
Thanks for this info @leonardoce and @xing-yang .
In my case I'm not developing a CSI driver, but looking at it from an end-user perspective.
I think this is workable, assuming every CSI driver implementation is going to keep persistentVolumeNames and volumeSnapshotContentRefList in the same order (hopefully we can assume this); however, I don't believe it's ideal.
A user currently would do this:
- Create their PVCs for their app, and label them
- Create a volumegroupsnapshot with the label selector
- Now when they want to restore, they need to look at the volumegroupsnapshot, then the volumegroupsnapshotcontent, then map the PV list to the volumesnapshotcontent list, and then find the volumesnapshots from the volumesnapshotcontents. This also assumes they're keeping track of the PVs that were associated with the original PVCs from step 1 (note that for steps 1 and 2 they are dealing with PVCs only, not the underlying PVs). At this point, when they're trying to restore their data, the original PVCs may also have been deleted.
- Now they can start up their app that uses the PVCs - if at this point the wrong volumesnapshot has been restored to the wrong PVC name, this would be very problematic.
The end user has created the volumegroupsnapshot while working only at the PVC level (as this is what they label). I think the information about which PVC is backed up to which volumesnapshot needs to live somewhere in the volumegroupsnapshot spec, since there will be no way of restoring the entire volumegroupsnapshot at once.
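As a reference for the restore step in the list above: once a VolumeSnapshot has been matched to its original PVC, restoring it goes through the ordinary PVC dataSource mechanism. Below is a hedged client-go sketch; restorePVCFromSnapshot and every name passed into it are placeholders, and the storage request is omitted with a note because its Go field type differs between client-go releases.

```go
// Sketch only: create a new PVC whose dataSource points at one of the
// VolumeSnapshots created by the VolumeGroupSnapshot.
package groupsnapshot

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func restorePVCFromSnapshot(ctx context.Context, cs kubernetes.Interface, ns, pvcName, snapshotName, storageClass string) error {
	apiGroup := "snapshot.storage.k8s.io"
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: pvcName, Namespace: ns},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &storageClass,
			// Point the new claim at the VolumeSnapshot chosen via the
			// PV -> VolumeSnapshotContent -> VolumeSnapshot mapping.
			DataSource: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshot",
				Name:     snapshotName,
			},
			// Note: spec.resources.requests.storage must also be set to at
			// least the snapshot's status.restoreSize; omitted here because
			// the Go field type differs across client-go versions.
		},
	}
	_, err := cs.CoreV1().PersistentVolumeClaims(ns).Create(ctx, pvc, metav1.CreateOptions{})
	return err
}
```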
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
When constructing the response, make sure the snapshots are appended in the same order as their source volumes in source_volume_ids. This way you can find mappings between PV and VolumeSnapshotContent in VolumeGroupSnapshotContent.
@xing-yang we cannot assume the order unless it's enforced in the CSI spec, can we? Because the CSI driver is developed by one party and the backup software by others, adding a PVC identifier to the volume snapshot could also help maintain clarity and avoid any ambiguity. Should we add the PVC name as an annotation when creating the volumesnapshots? (We would need to list all the PVs, check the volumeHandle, and add it; it's a heavy operation where we need to list the PVs and loop through each.)
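For what it's worth, the "heavy" lookup described above can be sketched as follows: list every PV, match spec.csi.volumeHandle against the snapshot's source volume handle, and follow spec.claimRef back to the PVC. pvcForVolumeHandle is an illustrative name, not existing tooling.

```go
// Sketch only: recover the original PVC for a given CSI volume handle by
// scanning all PersistentVolumes (an O(number of PVs) operation).
package groupsnapshot

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func pvcForVolumeHandle(ctx context.Context, cs kubernetes.Interface, handle string) (namespace, name string, err error) {
	pvs, err := cs.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return "", "", err
	}
	for _, pv := range pvs.Items {
		if pv.Spec.CSI != nil && pv.Spec.CSI.VolumeHandle == handle && pv.Spec.ClaimRef != nil {
			// claimRef records the PVC this PV is bound to.
			return pv.Spec.ClaimRef.Namespace, pv.Spec.ClaimRef.Name, nil
		}
	}
	return "", "", fmt.Errorf("no PV found with volumeHandle %q", handle)
}
```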
That's a temporary workaround. I'm thinking about making a change in the VolumeGroupSnapshot APIs. https://docs.google.com/document/d/1NdNwFD5Z64K2heQLYOnojt6Ogulg750BuYIJ6W9cEiM/edit?usp=sharing
Thank you @xing-yang