kubermatic icon indicating copy to clipboard operation
kubermatic copied to clipboard

Refactor Cluster Backup controllers

Open xrstf opened this issue 1 year ago • 5 comments

What this PR does / why we need it: This PR significantly restructures the Cluster Backup related code. The main reason for this is that currently the KKP UI/API create ClusterBackupStorageLocations (CBSLs) on the master cluster, yet they are required on the seed clusters to allow proper reconciling.

Previously the feature consisted of 2 controllers, both running on the seed cluster:

  • pkg/ee/cluster-backup is the main controller that installs Velero into a user-cluster.
  • pkg/ee/cluster-backup/storage-location is a different controller, responsible for regularly checking the availability of the bucket mentioned in a CBSL.

The first controller was moved into the user-cluster-ctrl-mgr, as its main purpose is managing resources inside the user cluster. The second one was moved to the master-ctrl-mgr, simply because

  1. The status information is only used by the KKP UI and so it must be available on the master.
  2. Having each seed individually check the CBSL might be considered a "mild DDoS".

As long as the reconciling code doesn't need up-to-date and seed-specific status infos, we can check the status centrally on the master cluster.

The new structure for the code is

  • pkg/ee/cluster-backup/user-cluster/velero-controller is the previous clusterbackup controller.
  • pkg/ee/cluster-backup/master/storage-location-controller is the 2nd one from the list above.

Finally, I have added a new, 3rd controller in

  • pkg/ee/cluster-backup/master/sync-controller is a near 1:1 copy of the project-synchronizer controller and will simply sync (fan out) all CBSLs onto all seed clusters. This controller, in contrast to the project-synchronizer does not watch Seed resources because it is managed by the seedcontrollerlifecycle controller and will be restarted anyway when a Seed changes.

Fixes #13799

What type of PR is this? /kind design /kind bug

Does this PR introduce a user-facing change? Then add your Release Note here:

* Fix ClusterBackupStorageLocations not being synchronized from the master to seed clusters.
* Refactor Cluster Backups: The controllers for this feature now run in the master-controller-manager and usercluster-controller-manager instead of the seed-controller-manager.

Documentation:

NONE

xrstf avatar Oct 11 '24 14:10 xrstf

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

kubermatic-bot avatar Oct 11 '24 14:10 kubermatic-bot

/test all

xrstf avatar Oct 11 '24 14:10 xrstf

LGTM label has been added.

Git tree hash: db98b2be26ea27c52fd0ce051bc02fc65f979a14

kubermatic-bot avatar Oct 25 '24 09:10 kubermatic-bot

Looks great! I have a small concern though. Seed clusters are technically "remote resources", we normally expect them to be alive all the time. However, recently we had a situation where a seed was expected to be down for extended periods due to network/power related issues. Unfortunately, at the time our API code expected seeds to be always on, and that broke the UI and a few webhook calls which kind of broke the API, too.

Should we take this into consideration here somehow? Or we should make Each() handle it implicitly? I can argue that CBSL's are not expected to change that often anyway so we can even ignore it. I am not sure..

moelsayed avatar Oct 25 '24 09:10 moelsayed

/approve

xrstf avatar Nov 04 '24 10:11 xrstf

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

kubermatic-bot avatar Nov 04 '24 10:11 kubermatic-bot

/cherrypick release/v2.26

xrstf avatar Nov 04 '24 10:11 xrstf

@xrstf: once the present PR merges, I will cherry-pick it on top of release/v2.26 in a new PR and assign it to you.

In response to this:

/cherrypick release/v2.26

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Nov 04 '24 10:11 kubermatic-bot

/retest

xrstf avatar Nov 04 '24 11:11 xrstf

@xrstf: new pull request created: #13842

In response to this:

/cherrypick release/v2.26

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Nov 04 '24 12:11 kubermatic-bot