velero
velero copied to clipboard
Design for Velero Backup performance Improvements and VolumeGroupSnapshot enablement
Describe the problem/challenge you have
Note that this issue is to track the design only. There will be two separate implementation phases to implement this design, which will be tracked by separate issues as implementation will end up in a future Velero release.
There are two different goals here, linked by a single primary missing feature in the Velero backup workflow. The first goal is to enhance backup performance by allowing the primary backup controller to run in multiple threads, enabling Velero to back up multiple items at the same time for a given backup. The second goal is to enable Velero to eventually support VolumeGroupSnapshots. For both of these goals, Velero needs a way to determine which items should be backed up together. This design proposal will include two development phases:
- Phase 1 will refactor the backup workflow to identify blocks of items that should be backed up together, and then coordinate backup hooks among items in the block.
- Phase 2 will add multiple multiple worker threads for backing up item blocks, so instead of backing up each block as it identified, the velero backup workflow will instead add the block to a channel and one of the workers will pick it up.
- Actual support for VolumeGroupSnapshots is out-of-scope here and will be handled in a future design proposal, but the item block refactor introduced in Phase 1 is a primary building block for this future proposal.
Background
Currently, during backup processing, the main Velero backup controller runs in a single thread, completely finishing the primary backup processing for one resource before moving on to the next one. We can improve the overall backup performance by backing up multiple items for a backup at the same time, but before we can do this we must first identify resources that need to be backed up together. As part of this initial refactoring, once these "Item Blocks" are identified, an additional change will be to move pod hook processing up to the ItemBlock level. If there are multiple pods in the ItemBlock, pre-hooks for all pods will be run before backing up the items, followed by post-hooks for all pods. This change to hook processing is another prerequisite for future VolumeGroupSnapshot support, since supporting this will require backing up the pods and volumes together for any volumes which belong to the same group. Once we are backing up items by block, the next step will be to create multiple worker threads to process and back up ItemBlocks, so that we can back up multiple ItemBlocks at the same time.
Goals
- Identify groups of items to back up together (ItemBlocks)
- Manage backup hooks at the ItemBlock level rather than per-item
- Using worker threads, back up ItemBlocks at the same time.
Non Goals
- Support VolumeGroupSnapshots: this is a future feature, although certain prerequisites for this enhancement are included in this proposal.
- Process multiple backups in parallel: this is a future feature, although certain prerequisites for this enhancement are included in this proposal.
Environment:
- Velero version (use
velero version
): - Kubernetes version (use
kubectl version
): - Kubernetes installer & version:
- Cloud provider or hardware configuration:
- OS (e.g. from
/etc/os-release
):
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
- :+1: for "The project would be better with this feature added"
- :-1: for "This feature will not enhance the project in a meaningful way"