Need a way to skip backup of volumes based on type

Open gsadhani opened this issue 2 years ago • 20 comments

Describe the problem/challenge you have: When taking a backup, Velero tries to back up all available pod volumes. Some volumes may be too large, or may already be backed up by other solutions. It is desirable to skip the backup of volumes based on their type.

Describe the solution you'd like: A way to configure the backup to skip certain types of volumes, like NFS.

Anything else you would like to add: None

Environment:

  • Velero version (use velero version): 1.8.1
  • Kubernetes version (use kubectl version): 1.22
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: vSphere
  • OS (e.g. from /etc/os-release): Photon

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • :+1: for "The project would be better with this feature added"
  • :-1: for "This feature will not enhance the project in a meaningful way"

gsadhani avatar Jun 20 '22 10:06 gsadhani

@gsadhani Could you clarify why the opt-out approach can't solve the problem? Is it b/c you don't want the user to update their workload to skip the backup?

reasonerjt avatar Jun 21 '22 06:06 reasonerjt

@reasonerjt Yes, identifying each pod in the workload that contains the specific type of volume and annotating it could be cumbersome. It would be very convenient if we could skip volumes based on volume type, specified as a backup configuration parameter.

gsadhani avatar Jun 21 '22 08:06 gsadhani

There are many factors that could differentiate volume types: storage class, size, NFS, or specific drivers. Is it supported to add the label velero.io/exclude-from-backup=true to mark the PV as excluded?

euclidsun avatar Jun 22 '22 07:06 euclidsun

For this scenario we can consider only the volume types supported by K8S: https://kubernetes.io/docs/concepts/storage/volumes/. Also, we can start with an allow-list of skippable volume types that covers only a limited set of types. Usually some volume types can be backed up by the storage system itself (nfs, fc, iscsi), or are ephemeral/non-persistent volumes (emptyDir, hostPath, downwardAPI, local, etc.).

This will reduce the manual effort of annotating all pods. If autoscaling is defined for the pods, the user otherwise needs to find a way to add the annotation automatically.

pradeepkchaturvedi avatar Jun 22 '22 08:06 pradeepkchaturvedi

I found more skip-backup requirements. We should consider all cases and come up with a generic way to handle skipping:

  • https://github.com/vmware-tanzu/velero/issues/957 skip restic backup/restore: do not back up the PV data, but back up the cluster state.
  • https://github.com/vmware-tanzu/velero/issues/4129 Exclude folders from backup
  • https://github.com/vmware-tanzu/velero/issues/2413 Namespace skip (this could be handled as a separate enhancement)

In a scenario where DBS excludes the Pods with volumes from backup, those Pods will get stuck in the Pending state during the recovery phase. What is a better approach to handle this and avoid manual effort? DBS tries to reduce the risk of failure or delay in backup and restore caused by volume-related workloads. E.g. 80% of workloads run without volumes and 20% of Pods run with volumes. During restore, this 20% becomes a risk for the recovery of the other 80% of the workload. The current approach suggested by the account team is to create empty PVs manually: we exclude all of the PVs while taking the Velero backup and create empty PVs manually for the 20%, or otherwise opt in to back up the PVs selectively.

euclidsun avatar Jul 04 '22 15:07 euclidsun

So in summary, the skip requirements are:

  • option to skip all PV data
  • option to skip specified PV types (inline, nfs)
  • option to skip PVs of a specified size
  • option to skip folders (I think this could be lower priority than the others, given we will evolve to snapshot/CBT-based backup)
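
To make the first three requirements concrete, here is a minimal sketch of how such a skip filter could be evaluated. It is only an illustration and not Velero's API: `Volume`, `SkipPolicy`, and all field names are hypothetical, and treating the size option as "skip anything larger than a threshold" is an assumption, since the requirement does not say how size should be compared.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass(frozen=True)
class Volume:
    """Hypothetical description of a pod volume (not a Velero type)."""
    name: str
    volume_type: str                  # e.g. "nfs", "emptyDir", "persistentVolumeClaim"
    size_bytes: Optional[int] = None  # None when the size is unknown


@dataclass
class SkipPolicy:
    """Hypothetical backup-level filter covering the first three requirements."""
    skip_all: bool = False                  # skip all PV data
    skip_types: frozenset = frozenset()     # skip the listed volume types
    max_size_bytes: Optional[int] = None    # assumed meaning: skip larger volumes

    def should_skip(self, vol: Volume) -> bool:
        if self.skip_all:
            return True
        if vol.volume_type in self.skip_types:
            return True
        # A volume of unknown size is never skipped by the size rule.
        if (self.max_size_bytes is not None
                and vol.size_bytes is not None
                and vol.size_bytes > self.max_size_bytes):
            return True
        return False
```

With `SkipPolicy(skip_types=frozenset({"nfs"}), max_size_bytes=100 * GIB)`, an NFS volume and a 200 GiB PVC would both be skipped, while a 10 GiB PVC would still be backed up.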

euclidsun avatar Jul 04 '22 15:07 euclidsun

I understand the requirement: we may want to introduce a way for the user to skip the backup of certain PVs without having to update the spec of the pods.

However, I'm not sure that skipping PV backup based on type is a good idea. It may be handy for a particular user, but is it generic enough? Do users share a usage pattern in which some type of PV has a separate backup process and Velero does not need to handle it? Is it normal for a user to have a separate backup process for two of the NFS PVs but still need Velero's help for the other NFS PVs?

I think skipping PVs based on type is doable, but we should avoid adding features that are only useful to a very small group of users.

reasonerjt avatar Jul 05 '22 11:07 reasonerjt

It does not have to be a specific type defined here: https://kubernetes.io/docs/concepts/storage/volumes/

Typical user story:

  • As a platform operator, I know what backend storage I am using, so I want an easy way to keep a certain type of storage from being backed up by Velero.

Would it be better to use the name of the StorageClass? The platform operator should know well how many storage classes they offer to applications. That way, we do not need to worry about the specific type of PV, and we also give flexibility to the admin.

A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called "profiles" in other storage systems.

euclidsun avatar Jul 08 '22 10:07 euclidsun

@euclidsun In our case, some PVCs are only used for logging while other PVCs are for DBs and configuration. I think it's difficult to classify them by StorageClass alone. Personally, I think we may need a more flexible way from Velero's point of view, such as included/excluded arrays holding PVC/volume (in-tree) names, or a kind of "GVNR"-style resource selector.

Then the upper-layer application can configure or calculate the required volume lists and pass them to Velero.
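
As a rough illustration of the included/excluded arrays idea, the selection could look like the sketch below. This is not an existing Velero interface: `select_volumes` and its parameters are hypothetical, and two semantics are assumed here rather than taken from any design: an empty include list means "everything", and an exclude entry wins over an include entry.

```python
from typing import Iterable, List


def select_volumes(names: Iterable[str],
                   included: Iterable[str] = (),
                   excluded: Iterable[str] = ()) -> List[str]:
    """Return the PVC/volume names to back up, preserving input order.

    Assumed semantics (hypothetical, for illustration only):
      * an empty `included` list selects every name;
      * a name listed in `excluded` is dropped even if it is also included.
    """
    include_set = set(included)
    exclude_set = set(excluded)
    selected = []
    for name in names:
        if name in exclude_set:
            continue
        if include_set and name not in include_set:
            continue
        selected.append(name)
    return selected
```

For example, `select_volumes(["db-data", "app-config", "app-logs"], excluded=["app-logs"])` keeps the DB and configuration PVCs and drops the logging one, which matches the logging vs. DB split described above.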

jerry-jibu avatar Aug 11 '22 06:08 jerry-jibu

@blackpiglet Let's start the design in the 1.10 timeframe.

ywk253100 avatar Aug 17 '22 08:08 ywk253100

So far, I have thought of two scenarios for skipping volume backup.

  • A volume is mounted multiple times within a backup. It should be handled only once. I think this will reduce the work for opt-out volumes like NFS when the same volume is mounted many times.
  • "backup.velero.io/backup-volumes" and "backup.velero.io/backup-volumes-excludes" should be handled at the controller level, so that pods created by controller scaling are also handled.
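
The first scenario can be sketched as a simple de-duplication step. This is only an illustration under assumptions, not how Velero implements it: `unique_claims` is a hypothetical helper, and it assumes that a shared volume can be identified by its PVC claim name across pods.

```python
from typing import Dict, Iterable, List, Tuple


def unique_claims(pods: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Map each claim to the first pod that mounts it.

    `pods` maps a pod name to the claim names it mounts (hypothetical input
    shape). The result has one (claim, pod) entry per claim, in first-seen
    order, so a volume mounted by many pods is backed up only once.
    """
    seen = set()
    result = []
    for pod, claims in pods.items():
        for claim in claims:
            if claim in seen:
                continue  # already scheduled through an earlier pod
            seen.add(claim)
            result.append((claim, pod))
    return result
```

With two replicas that both mount `shared-nfs` plus their own data volume, `shared-nfs` appears once in the result, attributed to the first replica.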

blackpiglet avatar Sep 01 '22 15:09 blackpiglet

I think filtering volumes by type also has some value, but there may not be many suitable use cases. It is useful when the volume's source is not a PVC. For example, if a customer wants to skip all emptyDir volumes, this would work. For NFS volumes, I'm not sure whether customers use NFS volumes this way. Filtering by StorageClass name should be useful for the PVC case.

blackpiglet avatar Sep 01 '22 15:09 blackpiglet

Another scenario is stated here: https://github.com/vmware-tanzu/velero/issues/957#issuecomment-1231643187. Support backing up only one replica's volumes in the controller case.

blackpiglet avatar Sep 02 '22 01:09 blackpiglet

@gsadhani @pradeepkchaturvedi @reasonerjt What's your opinion on these proposals?

blackpiglet avatar Sep 02 '22 07:09 blackpiglet

I think the higher priority is to design the mechanism and format of a filter in the backup CR so that users can skip PVs during backup. We will also need to decide whether this filter applies to restic only or to both restic and snapshots. We should also consider the case where the filter conflicts with the opt-in/opt-out annotations.

A design proposal will be drafted in the v1.10 timeframe.

reasonerjt avatar Sep 07 '22 08:09 reasonerjt

@reasonerjt It was mentioned on yesterday's community call that we may want to also use the filter for deciding which PVs to back up with restic/kopia vs. snapshots.

sseago avatar Sep 07 '22 13:09 sseago

@sseago Does that mean this filter should cover PVs backed up by both the uploader and snapshots?

blackpiglet avatar Sep 08 '22 01:09 blackpiglet

@blackpiglet I'm not sure what design/UX is best here -- separate filters for "what to back up via kopia/restic" vs. "what to back up via snapshot", or one filter for "include these volumes" and another (depending on opt-in vs. opt-out for kopia) for "among the volumes backed up, these use kopia". If we're not careful, this can get very confusing for users. But it was brought up on the call this week that some users want a way to identify restic/kopia vs. csi/snapshot volumes without having to individually annotate one category or the other.

sseago avatar Sep 08 '22 13:09 sseago

@sseago Thanks for the clarification. I will try to unify both in the design.

blackpiglet avatar Sep 09 '22 01:09 blackpiglet

There are some more requirements related to this topic in #5340.

blackpiglet avatar Sep 20 '22 03:09 blackpiglet

Closing this, since the implementation PR has been merged.

qiuming-best avatar Mar 23 '23 06:03 qiuming-best