backup-restore-operator

Migration from k3s local cluster to rke2 breaks with config restore

Status: Open. fluzzykitten opened this issue 1 year ago • 4 comments

Rancher Server Setup

  • Rancher version: 2.6.9
  • Installation option (Docker install/Helm Chart): rancher-latest/rancher --version 2.6.9
  • Kubernetes Version and Engine: v1.23.13+rke2r1

Describe the bug

TL;DR: Migrating RMS from a k3s local cluster to an rke2 local cluster causes the operator to "register" the local rke2 cluster as k3s. This triggers the automated upgrade on the local cluster, which tries to "upgrade" rke2 to k3s so that the local cluster matches the engine and version it had when the backup was created. This breaks RMS.

Long story: We wanted to migrate our Rancher RMS from a k3s single-node cluster to an rke2 single-node cluster. We backed up the RMS configs on the k3s cluster using the backup-restore-operator, then applied them to the new rke2 cluster. As soon as we applied the configs, Rancher decided the rke2 local cluster was a k3s cluster and applied the "update strategy" to upgrade the "local" cluster from its actual version (v1.23.13+rke2r1) to the version it had when it was backed up (v1.24.6+k3s1). The automated upgrade from rke2 to k3s obviously fails, but it keeps the local cluster in "unschedulable" status until the upgrade completes, which it never does.
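One way to see why this cannot succeed: the two version strings above carry different engine suffixes (`+rke2r1` vs `+k3s1`), so moving between them is a distro swap, not an upgrade. Below is a minimal sketch of a guard that would refuse such a cross-engine "upgrade". It assumes versions always follow the `vMAJOR.MINOR.PATCH+<engine>...` pattern seen in this issue; `parse_version`, `can_upgrade`, and `K8sVersion` are hypothetical helpers for illustration only, not Rancher's actual upgrade logic.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for strings like "v1.23.13+rke2r1" or "v1.24.6+k3s1".
_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)\+(rke2|k3s)r?(\d+)$")


class K8sVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    engine: str   # "rke2" or "k3s"
    release: int  # distro release counter after the engine name


def parse_version(s: str) -> K8sVersion:
    """Split a distro version string into numeric parts and the engine name."""
    m = _VERSION_RE.match(s)
    if m is None:
        raise ValueError(f"unrecognized version string: {s!r}")
    major, minor, patch, engine, release = m.groups()
    return K8sVersion(int(major), int(minor), int(patch), engine, int(release))


def can_upgrade(current: str, desired: str) -> bool:
    """Hypothetical guard: allow only same-engine, non-downgrade targets."""
    cur, want = parse_version(current), parse_version(desired)
    if cur.engine != want.engine:
        # e.g. rke2 -> k3s is a different distribution, not an upgrade
        return False
    return (want.major, want.minor, want.patch, want.release) >= (
        cur.major, cur.minor, cur.patch, cur.release
    )
```

With this guard, `can_upgrade("v1.23.13+rke2r1", "v1.24.6+k3s1")` returns `False`, while `can_upgrade("v1.23.13+rke2r1", "v1.24.6+rke2r1")` returns `True`.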

To Reproduce

Steps to reproduce the behavior:

  1. Set up RMS on a local cluster running k3s.
  2. Back up RMS with the backup-restore-operator and migrate it to a local cluster running rke2.
  3. The local cluster running rke2 tries and fails to "upgrade" to k3s, leaving the local cluster in a cordoned state.

Expected behavior

Migration to a different local cluster should be allowed; you should not be limited to the same Kubernetes engine as before.

Additional context

Local cluster definition after applying the configs (the cluster runs rke2 but is now "registered" as a k3s cluster): fleet-local.yaml.txt

fluzzykitten, Aug 15 '23

Moving this to the RKE2/K3S team as this does not appear to be a bug in the backup/restore utility.

MKlimuszka, Sep 20 '23

How did we determine that? This certainly appears to be a Rancher issue rather than a distro issue. I don't know that there's anything we can do from the distro side.

cwayne18, Sep 20 '23

This is not a distro team issue, FWIW, though I'm not entirely sure where it needs to go, TBH. Taking myself off as assignee.

cwayne18, Nov 01 '23

Just realized this issue is the same as the one on rancher/rancher that I recently began working on, so I'm assigning myself here to match that one: https://github.com/rancher/rancher/issues/42158

mallardduck, Apr 29 '24