velero icon indicating copy to clipboard operation
velero copied to clipboard

Post Restore Hooks might attempt to start running before DataDownload operation has release the related PV

Open amastbau opened this issue 1 year ago • 6 comments

What steps did you take and what happened:

When we create a restore with the Velero data mover, The data download CRs are created which creates a PVC in velero namespace and starts restoring all the data to it. Meanwhile, velero starts restoring application pods. Application pod goes in pending status as the PV is already bound to the PVC created w/ datadownload reconcilation . Once the data download operation is completed it releases the PV and it becomes available for bound. In many cases, the restore attempts to run before the datamover is completed, causing it to timeout (waiting for the pod) We may increase the hook wait timeout, but then we're no longer restoring asynchronously -- meaning we won't start the next DD (for the next pod) until this one is done, so the volume restores won't happen in parallel.

Environment:

  • Velero version (use velero version): 1.12
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • :+1: for "I would like to see this bug fixed as soon as possible"
  • :-1: for "There are more important bugs to focus on right now"

amastbau avatar Nov 14 '23 13:11 amastbau

To fully fix this, we may need to handle hooks after async actions have completed. One possibility: attempt hook initially with small timeout for Pod readiness, and if pod is not yet ready (as will be the case when there are datamover PVCs), then create a list of deferred post restore hooks -- new struct with enough data to run them later -- i.e. pod, hook content, etc. This will need to be uploaded as a new restore file (similar to the item operation file). To do this we'd need to either add a finalize phase to Restore or just run it in the restore operations controller after all operations have completed.

There may be other ways to solve it as well. Easiest fix would be to wait at pod restore time until the pod is ready, but probably not recommended, as that would essentially turn datadownloads into a synchronous operation, losing the ability to handle downloads in parallel for multiple pods.

sseago avatar Nov 14 '23 13:11 sseago

I understand the post-hook may timeout waiting for the DataDownload to complete, but I don't get why that will make the volumes restore not work in parallel.

The DataDownloads are created, the restore process, according to the backup-included DataUploads resources. The DataUploadRetrieveAction action will create a ConfigMap per each DataUpload. Then the CSI plugin will create a DataDownload when restores the PVC. The process doesn't depend on the Pod resource.

blackpiglet avatar Nov 16 '23 03:11 blackpiglet

@blackpiglet "but I don't get why that will make the volumes restore not work in parallel." -- if the timeout for the hook is long enough to complete the download, then the next pod's download won't start until this one is done. That was mainly a comment around the fact that we don't want to increase this timeout to wait for the volume to mount before running hooks, since that would prevent parallel action.

sseago avatar Dec 04 '23 14:12 sseago

I agree the Restore Exec hook makes the restore not work in parallel ways at some points. https://velero.io/docs/v1.13/restore-hooks/#exec-restore-hooks

But I don't think the bottleneck is the DataDownloads cannot be created or handled parallelly. As I mentioned in the previous comment, the DataUploadResult ConfigMaps and the DataDownload CRs are created according to the DataUpload and the PVC. There is no dependence on the Pod resources.

After introducing the node-agent concurrency, I think there is no limitation to handling multiple DataDownloads at the same time.

The problem is the pod restoration will not complete before the restore Exec hook completes. That makes the pod restoration cannot work in a parallel way.

There are two options in my mind:

  • After this issue is resolved, https://github.com/vmware-tanzu/velero/issues/7183, there will be a new restore finalization phase. We can move the exec hook handle logic to there.
  • Introducing a new CRD, which will be similar to Kinister's Blueprints, Velero will handle them after the pods enter the Running state.

blackpiglet avatar Jan 30 '24 06:01 blackpiglet

If we can accept that Restore enters the finalizing state without running the post-restore hook, the first choice is better.

The plan is:

  • remove the restore post hook waiting logic for pod resource here: https://github.com/vmware-tanzu/velero/blob/1034d6aee0b32b6ec0dbe4203902cd1d465ec950/pkg/restore/restore.go#L1733-L1735
  • Instead, apply a label (e.g. wait-for-restore-post-hook) to the pods that need to run restore post hook.
  • After the backup enters the finalizing state, the restore finalize controller should query pods by the label (wait-for-restore-post-hook). Run the post hook for the pods parallelly, and the wait-for-restore-post-hook label should be removed after the post hooks are completed.

blackpiglet avatar Jan 30 '24 08:01 blackpiglet

Maybe we can start a goroutine to wait for the async operation complete and start the timer of the hook execution, and handle the result in the the finalizer controller.

reasonerjt avatar Feb 02 '24 09:02 reasonerjt

Currently in Velero, the restore process will remain in the "InProgress" phase until all post-restore hooks have finished executing. This might negatively impact parallelism when async operations are involved in the restore.

The reason is that pods containing hooks transition to "Running" state once their associated async tasks complete. So during the period where pods are waiting for async tasks, Velero will also be waiting in the "InProgress" phase.

A better approach would be to start go routines to handle the execution of hooks asynchronously in InProgress phase and wait for the result of hook executions in Finalizing phase where all the async operations are complete. This allows the restore process to continue without blocking on individual hook executions in InProgress phase.

allenxu404 avatar Mar 22 '24 10:03 allenxu404