eks-anywhere
Upgrading a management cluster also triggers upgrading all the workload clusters it manages
What happened:
When upgrading a CloudStack management cluster with the latest v0.10.x CLI, the workload clusters associated with it also get upgraded. We found the issue on the CloudStack provider, but it can happen on any other provider we support.
This happens because, during a management cluster upgrade, EKS-A upgrades the following core components before upgrading the target cluster itself:
- Core CAPI controllers and CRDs
- CAPI provider controllers and CRDs
- Etcdadm controllers and CRDs
- Cilium CNI plugin
- Cert Manager
- EKS-A cluster controller and CRDs
Once the new EKS-A cluster controller is up and running, it reconciles the EKS-A spec, converts it into the latest version of the underlying CAPI templates using the new controller code, and applies them on the management cluster. The new CAPI templates cover both management and workload cluster resources. Where they differ from the old ones, they can trigger a rolling upgrade of the workload cluster if the etcdadm config or machine templates are changed or recreated.
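To illustrate the mechanism, here is a minimal Python sketch of the idea, not actual EKS-A or CAPI code. It assumes a simplified model in which each cluster's generated templates are plain dictionaries and any difference between the old and new output means a rollout. The names `spec_hash`, `needs_rollout`, and the template fields are hypothetical.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """Return a stable hash of a template spec (key order does not matter)."""
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rollout(old_templates: dict, new_templates: dict) -> list:
    """Return the sorted names of clusters whose generated templates changed."""
    changed = []
    for cluster, new_spec in new_templates.items():
        old_spec = old_templates.get(cluster)
        if old_spec is None or spec_hash(old_spec) != spec_hash(new_spec):
            changed.append(cluster)
    return sorted(changed)


# Hypothetical templates as generated by the old controller version.
old = {
    "mgmt": {"etcdadmConfig": {"format": "v1"}, "machineTemplate": "mgmt-cp-1"},
    "workload": {"etcdadmConfig": {"format": "v1"}, "machineTemplate": "wl-cp-1"},
}

# Hypothetical templates after the new controller regenerates everything,
# including resources that belong to the workload cluster.
new = {
    "mgmt": {"etcdadmConfig": {"format": "v2"}, "machineTemplate": "mgmt-cp-2"},
    "workload": {"etcdadmConfig": {"format": "v2"}, "machineTemplate": "wl-cp-2"},
}

affected = needs_rollout(old, new)
print(affected)
```

In this model the workload cluster ends up in `affected` even though only the management cluster upgrade was requested, which mirrors the behavior reported here.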
What you expected to happen:
Workload cluster upgrades should happen independently of the management cluster upgrade. We should not upgrade a workload cluster, or roll out and recreate its machines, unless the user explicitly triggers it.
How to reproduce it (as minimally and precisely as possible):
Create a management cluster and a workload cluster with the CloudStack provider using EKS-A CLI version v0.8.3. Upgrade the management cluster using EKS-A CLI version v0.10.x. The workload cluster is also upgraded: its control plane and unstacked etcd machines are rolled out and recreated.
Anything else we need to know?:
related issue: #2665
Environment:
- EKS Anywhere Release:
- EKS Distro Release:
Is there any proposed solution for this issue? Are we considering delaying the EKS-A controller upgrade until after the components have moved to the bootstrap cluster? Or somehow separating the shared resources the EKS-A controller applies, so that workload and management cluster resources are applied independently?
@jiayiwang7 @drewvanstone should this have v0.16 tag?
@jiayiwang7 Hasn't this been fixed in v0.17?
yes, closing