Add design for only restore volume data.
Please indicate you've done the following:
- [x] Accepted the DCO. Commits without the DCO will delay acceptance.
- [ ] Created a changelog file or added `/kind changelog-not-required` as a comment on this pull request.
- [ ] Updated the corresponding documentation in `site/content/docs/main`.
@blackpiglet, can you please rebase the PR? It currently shows what appear to be many unrelated commits.
A few comments/questions:
- How are the new PVC names generated? It doesn't look like users can specify the target names, so I guess Velero will rename them in a certain way?
- Since "pods" is not included in the resource types, I guess this restore is not supported for FSB?
- Does the snapshot data mover restore work for PVCs without specifying pods as a resource type? I will verify it, but I guess it can work because a mover pod is started?
I personally think that in-place restore is going to be more useful, especially if files can be selected for restore. If the whole PVC is being created anyway, users will need to restart pods to attach the PVCs, so it is almost like restoring the pods themselves.
Can https://github.com/vmware-tanzu/velero/pull/6354 be used in a generic way to achieve this use case? CC: @kaovilai
So, a restore limiting resources to a labeled PV (or namespaced PVC) with a recreate flag?
@anshulahuja98 If an in-place restore is required, a recreate-style Existing Resource Policy cannot achieve that.
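For reference, here is a minimal sketch, assuming the current v1 `Restore` API fields (`IncludedResources`, `LabelSelector`, `ExistingResourcePolicy`) as I understand them, of what a selector-scoped restore looks like today; the `update` policy only patches resources that already exist, so it cannot delete and re-create a bound PVC in place:

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// selectorScopedRestore sketches a restore limited to labeled PVCs/PVs.
// The "update" existing-resource policy only attempts to patch resources
// that already exist; it never deletes and re-creates them, which is why
// it cannot provide an in-place data restore on its own.
func selectorScopedRestore(backupName string) *velerov1.Restore {
	return &velerov1.Restore{
		ObjectMeta: metav1.ObjectMeta{Namespace: "velero", Name: "pvc-only-restore"},
		Spec: velerov1.RestoreSpec{
			BackupName:        backupName,
			IncludedResources: []string{"persistentvolumeclaims", "persistentvolumes"},
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "demo"},
			},
			ExistingResourcePolicy: velerov1.PolicyTypeUpdate,
		},
	}
}
```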
@draghuram Thanks. I rebased the branch, but there is still some problem with the gRPC code generation.
First, could you share some thoughts on why the in-place restore is more useful? I recently also heard some requirements from GitOps and DevOps pipeline scenarios. In those cases, some tools guarantee that the workloads in k8s are always running as expected. The in-place restore is needed in those cases, but I don't know whether those are common cases for k8s usage.
Second, to answer your questions:

> How are the new PVC names generated? It doesn't look like users can specify the target names, so I guess Velero will rename them in a certain way?

Users cannot specify the generated name. I haven't decided how to build the name yet; maybe something like `rename-pvc-<uuid>`.

> Since "pods" is not included in the resource types, I guess this restore is not supported for FSB?

FSB restore is supported, because the generic backup/restore will first create an intermediate pod and PVC to mount the PV. After the backup/restore, the intermediate pod and PVC are deleted.

> Does the snapshot data mover restore work for PVCs without specifying pods as a resource type?

The snapshot data mover can work without the pod, for the same reason as the second item.
@blackpiglet, gitops is certainly one reason for in place restore. Another use case is the need to periodically update data in an alternate standby cluster. Finally, if some application files are deleted or corrupted, user may only want to restore those files.
@draghuram Thanks for the feedback. If we go the in-place data restore way, there will be some limitations on the usage scenarios: only the filesystem uploader backup is supported. For snapshot-based backups, achieving an in-place restore result is impossible.
Is it acceptable that the data-only restore cannot support snapshot-based backups? Involving more maintainers in this discussion: @anshulahuja98 @kaovilai @sseago
FSB only is acceptable.
Though I think restoring from a CSI snapshot to a new PVC and then patching the workload to use the new PVC could be considered in-place, with the caveat that it requires a pod restart.
Thanks for the quick response; I will follow the filesystem-only way for now. I got your point. There was a similar discussion in the Velero team: it does work, although it seems a bit inefficient, and we can continue that discussion. The filesystem data-only restore will also cause the pod to restart, because the PodVolumeRestore way is preferred, and that introduces a new InitContainer into the pod. The advantage is that the pod's data operations will not interrupt the data restore process.
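To make the last point concrete, here is a minimal sketch of the kind of `restore-wait` InitContainer the pod volume (FSB) restore path injects into the restored pod; the container name, image, and mount path are assumptions based on how the current restore helper behaves, not something specified by this design:

```go
package main

import corev1 "k8s.io/api/core/v1"

// restoreWaitInitContainer sketches the init container injected into a
// restored pod: it mounts the target volume and blocks the main containers
// from starting until the volume data restore has finished, so the
// application cannot write to the volume while data is being restored.
// The name, image, and mount path below are assumptions, not design decisions.
func restoreWaitInitContainer(volumeName string) corev1.Container {
	return corev1.Container{
		Name:  "restore-wait",
		Image: "velero/velero-restore-helper:latest",
		VolumeMounts: []corev1.VolumeMount{{
			Name:      volumeName,
			MountPath: "/restores/" + volumeName,
		}},
	}
}
```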
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 61.71%. Comparing base (`2518824`) to head (`156685a`).
Coverage Diff (#7481 vs. main):

|          | main   | #7481  | +/- |
|----------|--------|--------|-----|
| Coverage | 61.71% | 61.71% |     |
| Files    | 263    | 263    |     |
| Lines    | 28869  | 28869  |     |
| Hits     | 17816  | 17816  |     |
| Misses   | 9793   | 9793   |     |
| Partials | 1260   | 1260   |     |
I went further through the discussion, and I understand a pure in-place restore can't be achieved without pod downtime.
With that caveat, I want to push for CSI snapshot-based in-place restore. Here, in-place restore refers mainly to detaching the PVC from the workload, deleting the PVC, and re-creating the PVC.
This is useful for disaster recovery scenarios.
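A minimal sketch of that flavor of in-place restore, assuming a CSI VolumeSnapshot from the backup is available in the target namespace: the PVC is re-created under its original name with a `dataSource` pointing at the snapshot, and the workload reattaches to it after a pod restart. All names below are illustrative:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// pvcFromSnapshot sketches the "detach, delete, re-create" step: after the
// original PVC is deleted, a replacement with the same name is created whose
// data source is the CSI VolumeSnapshot taken at backup time.
// The storage request is omitted for brevity; it must be set to at least the
// snapshot's restore size before the object is actually created.
func pvcFromSnapshot(name, namespace, snapshotName, storageClass string) *corev1.PersistentVolumeClaim {
	apiGroup := "snapshot.storage.k8s.io"
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &storageClass,
			DataSource: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshot",
				Name:     snapshotName,
			},
		},
	}
}
```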
> With that caveat, I want to push for CSI snapshot-based in-place restore. Here, in-place restore refers mainly to detaching the PVC from the workload, deleting the PVC, and re-creating the PVC.
This proposal also cannot avoid pod downtime, so I suppose it aims to make the data-only restore support more types of backups, right?
> I understand a pure in-place restore can't be achieved without pod downtime.
If no downtime is a must-have feature, we can achieve that by ignoring the PodVolumeRestore's related-pod InitContainer check logic, although that would compromise data integrity, and data writes could fail due to conflicts.
I just noticed that this design doesn't cover the considerations for WaitForFirstConsumer volumes; see the discussion here. According to the design, we will create the PVC/PV for users. If the volume to be restored uses the WaitForFirstConsumer binding mode, there is no way to create the volume appropriately without knowing which nodes the pod that is going to consume the restored volume may be scheduled to. For a data-only restore, we cannot assume that users can create the pod beforehand, but users must know which nodes they want to schedule their pod to if they want to apply such constraints. Therefore, we need to add a new parameter for users to provide the candidate nodes for a specific volume. This is a must-have parameter.
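For illustration, a rough sketch of what such a per-volume parameter could look like; every field name here is hypothetical and not part of the design yet:

```go
package main

// VolumeRestoreTarget sketches a hypothetical per-volume restore parameter
// for WaitForFirstConsumer storage classes: without knowing which nodes the
// consuming pod may land on, the restored PV cannot be provisioned in an
// appropriate topology domain.
type VolumeRestoreTarget struct {
	// Namespace and Name identify the PVC whose data is being restored.
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
	// CandidateNodes lists the nodes the consuming pod may be scheduled to;
	// the intermediate pod that mounts the restored volume would be
	// constrained (for example via node affinity) to one of these nodes.
	CandidateNodes []string `json:"candidateNodes"`
}
```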