
User cluster backup feature fails to find backup locations on non-master seeds

Open embik opened this issue 1 year ago • 0 comments

What happened?

We discovered that the user cluster backup feature has a design flaw: it is missing logic for multi-seed setups. When you create a cluster backup location through the KKP dashboard (the CRD is ClusterBackupStorageLocation), the object representing that location is created on the KKP master cluster. However, this object is referenced in the Cluster object, and the backup feature is reconciled by the seed-controller-manager, so the object also needs to be available on the seed.

This means that on seeds that are not a combined master/seed, the ClusterBackupStorageLocation object is not accessible to the seed-controller-manager, and installing the various components into the user cluster fails.

The problem is compounded by the fact that the failure is only logged at debug level and the error is not returned: https://github.com/kubermatic/kubermatic/blob/30c905bac915326f37951a8800268fea88aa4d20/pkg/ee/cluster-backup/controller.go#L186-L188

As a result, the failure is only visible if you turn on debug logging in the seed-controller-manager.

Expected behavior

The ClusterBackupStorageLocation objects should be synced from the master to all seed clusters so the seed-controller-manager can use them.
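A rough sketch of what such a sync could look like, using in-memory maps in place of real clusters. The `StorageLocation` type, the `Store` type and `syncToSeeds` are hypothetical and heavily simplified; they are not the KKP API and only illustrate the idea of treating the master as the source of truth and mirroring objects to every seed.

```go
package main

import "fmt"

// StorageLocation is a simplified, hypothetical stand-in for a
// ClusterBackupStorageLocation object.
type StorageLocation struct {
	Name   string
	Bucket string
}

// Store is an in-memory stand-in for the objects held by one cluster.
type Store map[string]StorageLocation

// syncToSeeds mirrors every location from the master into each seed and
// removes seed objects that no longer exist on the master. It returns the
// number of writes and deletes performed. Seed stores must be non-nil.
func syncToSeeds(master Store, seeds map[string]Store) int {
	changes := 0
	for _, seed := range seeds {
		// Create or update objects that are missing or outdated on the seed.
		for name, loc := range master {
			if existing, ok := seed[name]; !ok || existing != loc {
				seed[name] = loc
				changes++
			}
		}
		// Remove objects that were deleted on the master.
		for name := range seed {
			if _, ok := master[name]; !ok {
				delete(seed, name)
				changes++
			}
		}
	}
	return changes
}

func main() {
	master := Store{"s3-backups": {Name: "s3-backups", Bucket: "example-bucket"}}
	seeds := map[string]Store{"seed-a": {}, "seed-b": {}}

	fmt.Println("first pass changes: ", syncToSeeds(master, seeds))
	fmt.Println("second pass changes:", syncToSeeds(master, seeds))
}
```

The second pass performs no changes, which is the property a real synchronization controller would also need: repeated reconciliation should be a no-op once the seeds match the master.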

How to reproduce the issue?

  1. Set up KKP (EE) with at least two seeds.
  2. Enable the user cluster backup feature.
  3. Create a cluster backup location through the UI.
  4. Attempt to enable the feature on a user cluster that is located on a seed that is not the master.
  5. Observe that no Velero resources are created in the user cluster.

How is your environment configured?

  • KKP version: v2.25.11
  • Shared or separate master/seed clusters?: separate / additional seed

Provide your KKP manifest here (if applicable)

# paste manifest here

What cloud provider are you running on?

AWS

What operating system are you running in your user cluster?

N/A

Additional information

internal reference: INC-7381

embik, Oct 09 '24 15:10