backup-restore-operator
Migration from k3s local cluster to rke2 breaks with config restore
Rancher Server Setup
- Rancher version: 2.6.9
- Installation option (Docker install/Helm Chart): rancher-latest/rancher --version 2.6.9
- Kubernetes Version and Engine: v1.23.13+rke2r1
Describe the bug TL;DR - migrating RMS from a k3s local cluster to rke2 as the local cluster causes the operator to "register" the local rke2 cluster as k3s. This triggers the automated upgrade on the local cluster, which tries to "upgrade" rke2 to k3s in order to bring the local cluster back to the engine and version recorded in the backup. This breaks RMS.
Long story - We wanted to migrate our Rancher RMS from a k3s single-node cluster to an rke2 single-node cluster. We backed up the RMS configs from the k3s cluster (using the backup-restore-operator) and then applied the configs to the new rke2 cluster for migration. Once the configs were applied, Rancher suddenly decided the rke2 local cluster was a k3s cluster and tried to apply the "update strategy" to upgrade the "local" cluster from its current version (v1.23.13+rke2r1) to the version the local cluster was at when it was backed up (v1.24.6+k3s1). Obviously, the automated upgrade from rke2 to k3s fails, but it keeps the local cluster in "unschedulable" status until the failing upgrade completes, which never happens.
To Reproduce Steps to reproduce the behavior:
- Set up RMS on a local cluster running k3s
- Migrate RMS to a local cluster running rke2
- The local cluster running rke2 tries and fails to "upgrade" to k3s, keeping the local cluster in a cordoned state
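For reference, step 2 above was done by applying the backup-restore-operator's Restore resource on the new rke2 cluster. A minimal sketch of such a Restore (the name and backup filename here are placeholders, not the actual values from our setup):

```yaml
# Hypothetical Restore applied on the new rke2 local cluster;
# backupFilename is a placeholder for the archive produced on the k3s cluster.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: rancher-backup-from-k3s.tar.gz
  prune: false
```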
Expected behavior Migration should be allowed to move to a different local cluster; you should not be limited to the same Kubernetes engine as before.
Additional context Local cluster definition after applying the configs (cluster running rke2, but now "registered" as a k3s cluster): fleet-local.yaml.txt
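The mis-registration is visible in the restored cluster object itself. A hypothetical excerpt of what a fleet-local cluster definition like the attached one could contain after the restore (values illustrative, not copied from the attachment):

```yaml
# Hypothetical excerpt of the restored local cluster object.
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: local
  namespace: fleet-local
spec:
  # Carried over from the k3s backup, even though the node
  # is actually running v1.23.13+rke2r1.
  kubernetesVersion: v1.24.6+k3s1
```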
Moving this to the RKE2/K3S team as this does not appear to be a bug in the backup/restore utility.
How did we determine that? This certainly appears to be a Rancher issue rather than a distro issue; I don't know that there's anything we can do from the distro side.
This is not a distro team issue FWIW, though I'm not entirely sure where it needs to go TBH; taking myself off as assignee.
Just realized this issue is the same as the one on rancher/rancher that I've recently begun working on. So I'm assigning myself here to match that one: https://github.com/rancher/rancher/issues/42158