Add design for Extending VolumePolicies to support more actions
This PR adds design for https://github.com/vmware-tanzu/velero/issues/6640
Please indicate you've done the following:
- [x] Accepted the DCO. Commits without the DCO will delay acceptance.
- [x] Created a changelog file or added `/kind changelog-not-required` as a comment on this pull request.
- [ ] Updated the corresponding documentation in `site/content/docs/main`.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 61.88%. Comparing base (e65ef28) to head (bec3425). Report is 160 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #6956 +/- ##
==========================================
+ Coverage 61.76% 61.88% +0.12%
==========================================
Files 262 266 +4
Lines 28440 29456 +1016
==========================================
+ Hits 17567 18230 +663
- Misses 9642 9934 +292
- Partials 1231 1292 +61
It sounds like we need to completely step back. @shubham-pampattiwar has been designing for one set of requirements, but the other reviewers seem to have a completely different set of requirements in mind. So I think we have to step away from design for a bit and go back to requirements. I think we have a couple of options here:
1. Provide an alternative to the existing per-pod-volume annotations to determine when to use fs-backup. This is what the current design is based on -- workflow is unchanged from current annotations; we're just aiming to provide a more user-friendly means of selection.
2. Provide a completely different selection mechanism that relies 100% on VolumePolicies and doesn't interact with defaultVolumesToFsBackup or the annotations at all. This is what the other reviewers seem to want here.

1) will be the least risky to implement with the least amount of effort. 2) might result in a more flexible approach, but it's completely incompatible with the legacy methods, so we would need validation to ensure that only one approach is used.
To expand on the validation needs -- backup.Spec.DefaultVolumesToFsBackup is a bool pointer. If this is set to true or false, then we're using the legacy opt-in/opt-out approach, and any volume policies referencing fs backup or snapshots should result in a validation error -- if this is true or false, then existing workflows will be used. If this is nil (unspecified), then volume policies are used to determine what to snapshot and what to use fsbackup for. We'll need to modify install/server to make this value a bool pointer instead of a plain bool -- currently, not specifying this on install/server results in the backup getting a false rather than a nil value here.
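The validation described above could be sketched roughly as follows. This is a simplified illustration, not Velero's actual types or API: `BackupSpec`, `validateBackup`, and `VolumePolicyAction` are hypothetical stand-ins, with the bool-pointer semantics carried over from the comment (nil means "unspecified, use volume policies"):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified types for illustration only; the real Velero
// types live in the Velero API packages.
type VolumePolicyAction string

const (
	ActionFSBackup VolumePolicyAction = "fs-backup"
	ActionSnapshot VolumePolicyAction = "snapshot"
)

type BackupSpec struct {
	// nil means "unspecified": volume policies decide.
	DefaultVolumesToFsBackup *bool
	PolicyActions            []VolumePolicyAction
}

// validateBackup sketches the validation described above: when the legacy
// opt-in/opt-out field is explicitly set (true or false), volume policies
// that reference fs-backup or snapshot actions are rejected.
func validateBackup(spec BackupSpec) error {
	if spec.DefaultVolumesToFsBackup == nil {
		return nil // unspecified: volume policies are authoritative
	}
	for _, a := range spec.PolicyActions {
		if a == ActionFSBackup || a == ActionSnapshot {
			return errors.New("volume policies with fs-backup/snapshot actions conflict with defaultVolumesToFsBackup")
		}
	}
	return nil
}

func main() {
	optOut := true
	err := validateBackup(BackupSpec{
		DefaultVolumesToFsBackup: &optOut,
		PolicyActions:            []VolumePolicyAction{ActionFSBackup},
	})
	fmt.Println(err != nil) // legacy field set + policy action => validation error
}
```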
@shubham-pampattiwar what do you think?
@reasonerjt @qiuming-best @sseago I have re-done the design, please take another look, Thanks !
Some notes from latest community call: https://hackmd.io/Jq6F5zqZR7S80CeDWUklkA#Discussion-topics
My personal understanding from the call last night is that:
1. VolumePolicies take precedence over legacy opt-in/opt-out labeling approaches
2. If there is no matching VolumePolicy for a volume, Velero defers to the opt-in/opt-out approach
3. Finally, if there is no match for a volumePolicy and no match for the opt-in/opt-out method, the Velero server install default (FSBackupOrSnapShot) will be authoritative.

@shubham-pampattiwar If I'm off base I can edit or delete this comment.
Updated the PR based on the discussions in community call.
@weshayutin "3. Finally if there is no match for a volumePolicy and no match for opt-in/out method, the velero server install default (FSBackupOrSnapShot) will be authoritative."
This is not quite correct. The defaultVolumesToFsBackup field is part of the opt-in/opt-out approach described in 2, so your first two bullets are the entirety of it: "defaultVolumesToFsBackup=true" -> "opt-out", "defaultVolumesToFsBackup=false" -> "opt-in".
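The corrected precedence can be sketched like this. All names and types here are hypothetical illustrations, not Velero's actual code: a matching VolumePolicy wins, and otherwise the legacy opt-in/opt-out approach (of which defaultVolumesToFsBackup is a part) applies:

```go
package main

import "fmt"

// Hypothetical per-volume inputs for illustration.
type Volume struct {
	PolicyAction string // "" when no VolumePolicy matches this volume
	OptInLabel   bool   // per-pod-volume annotation present (opt-in)
	OptOutLabel  bool   // exclusion annotation present (opt-out)
}

// decideFSBackup sketches the discussed precedence: a matching
// VolumePolicy is authoritative; otherwise the legacy opt-in/opt-out
// approach (including defaultVolumesToFsBackup) decides.
func decideFSBackup(v Volume, defaultVolumesToFsBackup bool) bool {
	if v.PolicyAction != "" {
		return v.PolicyAction == "fs-backup"
	}
	if defaultVolumesToFsBackup { // opt-out mode: fs-backup unless excluded
		return !v.OptOutLabel
	}
	return v.OptInLabel // opt-in mode: fs-backup only if annotated
}

func main() {
	// Policy says snapshot, so fs-backup is not used even in opt-out mode.
	fmt.Println(decideFSBackup(Volume{PolicyAction: "snapshot"}, true))
}
```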
@Lyndon-Li @reasonerjt @sseago Updated design sections (High-level, Detailed Design and Implementation) to reflect changes needed to account for the snapshot action for CSI snapshot PVCs, PTAL! Thank you!
@reasonerjt Updated design implementation to use a `volumehelper` package, PTAL, Thanks!
@shubham-pampattiwar I see there are still some open comments, could you help to double check them? For the ones that are done, please resolve them; for the open ones, let's discuss in tomorrow's community meeting. Thanks!
@Lyndon-Li Done, updated the design and resolved the comments.
@reasonerjt Latest iteration of design updates the implementation parts and introduces a `VolumeHelper` interface and a `VolumePolicyHelper` struct which implements the `VolumeHelper` interface.
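A rough sketch of the shape being described: an interface answering, per volume, which action the matching policy selects, with a policy-backed implementation behind it. The method signatures and the toy storage-class matching below are illustrative assumptions, not the final Velero API:

```go
package main

import "fmt"

// Illustrative stand-ins for the real Kubernetes objects.
type PersistentVolume struct{ Name, StorageClass string }
type Pod struct{ Name string }

// VolumeHelper is a sketch of the interface described above: it answers,
// per volume, whether the matching VolumePolicy selects snapshot or
// fs-backup.
type VolumeHelper interface {
	ShouldPerformSnapshot(pv PersistentVolume) (bool, error)
	ShouldPerformFSBackup(pv PersistentVolume, pod Pod) (bool, error)
}

// volumePolicyHelper is a toy implementation keyed by storage class,
// standing in for real VolumePolicy matching.
type volumePolicyHelper struct {
	snapshotClasses map[string]bool
}

func (h *volumePolicyHelper) ShouldPerformSnapshot(pv PersistentVolume) (bool, error) {
	return h.snapshotClasses[pv.StorageClass], nil
}

func (h *volumePolicyHelper) ShouldPerformFSBackup(pv PersistentVolume, _ Pod) (bool, error) {
	return !h.snapshotClasses[pv.StorageClass], nil
}

func main() {
	var vh VolumeHelper = &volumePolicyHelper{
		snapshotClasses: map[string]bool{"fast-ssd": true},
	}
	snap, _ := vh.ShouldPerformSnapshot(PersistentVolume{Name: "pv-1", StorageClass: "fast-ssd"})
	fmt.Println(snap)
}
```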
Thanks for working on this (and Velero). I greatly appreciate the backups Velero is managing for me. I came here via the issue referenced above.
For many of my use cases the fs-backup feature is fine (and I appreciate that it's explicit and that the configuration is close to the workload configuration).
I use snapshot backups for some workloads where I need an "as if the computer crashed" consistency of the file system, and I use the data mover to move the data off the infrastructure the cluster is running in. This PR doesn't seem to support that use case.
The feature I'm currently missing (and can't figure out from the resource filtering documentation) is doing volume snapshots (and data mover) on some PVCs in a namespace instead of all.
For example, I have a namespace running ClickHouse databases and it's a mix of 1TB of data that gets replaced within a day and about 100GB of data that changes much less (and is more valuable). I'd like to do snapshots and data mover on the 100GB of data without also having to copy the other 1TB. (I can't easily divide them between namespaces, as ClickHouse or the CH operator has features that help me generate and update the small data set from the large data set within the cluster.)
@abh
Do you mean you want to specify data mover as the backup type for some of the volumes?
I think this has been discussed --- only `fs-backup` vs. `snapshot` could be specified per volume; then if `snapshot` is specified, you need to use the existing per-backup option `--snapshot-move-data` to make it go through the data mover.
For sure, this would miss one functionality --- in a backup, among the volumes going with `snapshot`, I want some volumes to go with the data mover and some not. This may be covered by a future extension of this design.
@shubham-pampattiwar Please correct me if my understanding is wrong.
@asaf-erlich If your goal is to back up with a mixture of fs-backup and datamover, that is absolutely supported by this design. Policies will identify volumes for fs-backup and snapshot, and setting `--snapshot-move-data=true` tells Velero to use the data mover for all snapshotted volumes. The only use case that's not supported in this design is if you want to use CSI snapshots without data movement for some volumes (leaving VolumeSnapshotContents in the cluster) and CSI snapshots with data movement for others, but from the use case you described, that doesn't seem relevant here.
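The interplay described here can be sketched as a small decision table. This is a simplified model, not Velero's actual code; `backupMethod` is hypothetical and `snapshotMoveData` mirrors the per-backup `--snapshot-move-data` flag:

```go
package main

import "fmt"

// backupMethod sketches the discussed behavior: policies choose between
// fs-backup and snapshot per volume, and the per-backup
// --snapshot-move-data flag upgrades every snapshot to a data-mover upload.
func backupMethod(policyAction string, snapshotMoveData bool) string {
	switch policyAction {
	case "fs-backup":
		return "fs-backup" // unaffected by the data-mover flag
	case "snapshot":
		if snapshotMoveData {
			return "csi-snapshot+datamover"
		}
		return "csi-snapshot" // VolumeSnapshotContents stay in-cluster
	default:
		return "legacy" // no matching policy; legacy opt-in/opt-out applies
	}
}

func main() {
	fmt.Println(backupMethod("snapshot", true))
	fmt.Println(backupMethod("fs-backup", true))
}
```

This is also why the one unsupported case above falls out naturally: `snapshotMoveData` is backup-wide, so all `snapshot` volumes in a backup either move data or all keep their snapshots in-cluster.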
@reasonerjt PTAL, Thanks !
Do you mean you want to specify data mover as the backup type for some of the volumes?
For me the important part is to be able to choose which volumes are being backed up. As I understand the current settings, I can only choose by namespace when I want snapshot+data-mover.
In my use case I want all the snapshots to be with data mover (and I don't think there are any scenarios where fs-backup is better if snapshot+data mover is possible?).
For my use cases, having the label on the pod (or on the PVC) is useful in many scenarios because it's so explicit. If the configuration is more complicated, I'd worry I'd mistype a glob match or a label selector and accidentally not have backups.
@reasonerjt Thanks for the review, need a re-ack, fixed CI errors related to codespell. @sseago Please take another look, Thank you !