Refactor Cluster Backup controllers
What this PR does / why we need it:
This PR significantly restructures the Cluster Backup-related code. The main reason is that the KKP UI/API currently creates ClusterBackupStorageLocations (CBSLs) on the master cluster, yet they are required on the seed clusters to allow proper reconciling.
Previously the feature consisted of two controllers, both running on the seed cluster:
- `pkg/ee/cluster-backup` is the main controller that installs Velero into a user cluster.
- `pkg/ee/cluster-backup/storage-location` is a separate controller, responsible for regularly checking the availability of the bucket referenced in a CBSL.
The first controller was moved into the usercluster-controller-manager, as its main purpose is managing resources inside the user cluster. The second one was moved to the master-controller-manager for two reasons:
- The status information is only used by the KKP UI, so it must be available on the master.
- Having each seed individually check the CBSL might be considered a "mild DDoS".
As long as the reconciling code doesn't need up-to-date, seed-specific status information, we can check the status centrally on the master cluster.
The new structure for the code is:
- `pkg/ee/cluster-backup/user-cluster/velero-controller` is the previous `clusterbackup` controller.
- `pkg/ee/cluster-backup/master/storage-location-controller` is the second controller from the list above.
Finally, I have added a new, third controller:
- `pkg/ee/cluster-backup/master/sync-controller` is a near 1:1 copy of the `project-synchronizer` controller and will simply sync (fan out) all CBSLs onto all seed clusters. In contrast to the `project-synchronizer`, this controller does not watch Seed resources, because it is managed by the `seedcontrollerlifecycle` controller and will be restarted anyway when a Seed changes.
Fixes #13799
What type of PR is this?
/kind design
/kind bug
Does this PR introduce a user-facing change? Then add your Release Note here:
* Fix ClusterBackupStorageLocations not being synchronized from the master to seed clusters.
* Refactor Cluster Backups: The controllers for this feature now run in the master-controller-manager and usercluster-controller-manager instead of the seed-controller-manager.
Documentation:
NONE
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
/test all
LGTM label has been added.
Looks great! I have a small concern though. Seed clusters are technically "remote resources"; we normally expect them to be alive all the time. However, we recently had a situation where a seed was expected to be down for extended periods due to network/power issues. Unfortunately, at the time our API code assumed seeds were always on, and that broke the UI and a few webhook calls, which in turn partly broke the API as well.
Should we take this into consideration here somehow? Or should we make Each() handle it implicitly? One could argue that CBSLs are not expected to change that often anyway, so we could even ignore it. I am not sure.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: xrstf
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [xrstf]
- ~~pkg/install/OWNERS~~ [xrstf]
- ~~pkg/resources/OWNERS~~ [xrstf]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/cherrypick release/v2.26
@xrstf: once the present PR merges, I will cherry-pick it on top of release/v2.26 in a new PR and assign it to you.
In response to this:
/cherrypick release/v2.26
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest
@xrstf: new pull request created: #13842