Fix operator upgrade issue
What type of PR is this? /kind bug
What this PR does / why we need it: In PR #6180, the podSelectors for Karmada components such as etcd have been modified. Since the selector field of Deployments/StatefulSets is immutable, when the karmada-operator is upgraded, it attempts to modify this field while reconciling existing Karmada Custom Resources (CRs). This causes the reconciliation process to fail.
I0624 06:45:40.221750 1 controller.go:70] "Finished syncing karmada" karmada="test/karmada-demo" duration="69.143972ms"
E0624 06:45:40.221821 1 controller.go:347] "Reconciler error" err="failed to install etcd component, err: error when creating Etcd statefulset, err: StatefulSet.apps \"karmada-demo-etcd\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden" controller="karmada-operator-controller" controllerGroup="operator.karmada.io" controllerKind="Karmada" Karmada="test/karmada-demo" namespace="test" name="karmada-demo" reconcileID="80c957ab-16fe-4d2f-b1c3-febe4e8e31dc"
Which issue(s) this PR fixes: Fixes #
Special notes for your reviewer: How to reproduce it
- Install karmada-operator with imagetag
v1.13.0 - Deploy a Karmada instance
kubectl apply -f - <<EOF
apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
name: karmada-demo
namespace: test
EOF
- Upgrade karmada-operator to
v1.14.0
kubectl set image --namespace karmada-system deployments/karmada-operator karmada-operator=docker.io/karmada/karmada-operator:v1.14.0
Does this PR introduce a user-facing change?:
`karmada-operator`: Fix the issue where upgrading the Karmada Operator causes Karmada instances to fail
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from zhzhuang-zju. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
cc @jabellard
FYI: this fix needs to be backported to release-1.14
:warning: Please install the to ensure uploads and comments are reliably processed by Codecov.
Codecov Report
:x: Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 48.97%. Comparing base (b20715e) to head (7be84f6).
:warning: Report is 424 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| operator/pkg/util/apiclient/idempotency.go | 0.00% | 12 Missing :warning: |
| :exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality. |
Additional details and impacted files
@@ Coverage Diff @@
## master #6483 +/- ##
==========================================
- Coverage 48.99% 48.97% -0.02%
==========================================
Files 687 687
Lines 56090 56101 +11
==========================================
- Hits 27479 27477 -2
- Misses 26831 26843 +12
- Partials 1780 1781 +1
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 48.97% <0.00%> (-0.02%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
/assign
@zhzhuang-zju , thanks for this work. Generally looks good to me. Can you confirm we can still run multiple Karmada instances in the same namespace without any issues?
@zhzhuang-zju , thanks for this work. Generally looks good to me. Can you confirm we can still run multiple Karmada instances in the same namespace without any issues?
Sure, I will do some tests and show you the test results
$ kubectl get karmadas.operator.karmada.io --namespace test
NAME READY AGE
karmada-demo True 3h43m
karmada-demo1 True 3h32m
karmada-demo2 True 3h4m
$ kubectl get pods --namespace test
karmada-demo-aggregated-apiserver-7899df7588-dsq9l 1/1 Running 0 3h28m
karmada-demo-apiserver-7df8997c74-dhrcs 1/1 Running 0 3h28m
karmada-demo-controller-manager-559fbc945c-gmpjf 1/1 Running 0 3h28m
karmada-demo-etcd-0 1/1 Running 0 3h28m
karmada-demo-kube-controller-manager-d647b787-5wnmn 1/1 Running 0 3h28m
karmada-demo-metrics-adapter-7b5677db75-8x5jn 1/1 Running 0 3h28m
karmada-demo-metrics-adapter-7b5677db75-vcc5k 1/1 Running 0 3h28m
karmada-demo-scheduler-c7dd6944-lgb4z 1/1 Running 0 3h28m
karmada-demo-webhook-5d86dcc959-wj4kw 1/1 Running 0 3h28m
karmada-demo1-aggregated-apiserver-69c659b64d-nv5nt 1/1 Running 2 (3h24m ago) 3h24m
karmada-demo1-apiserver-5859c9fcd-c66w5 1/1 Running 0 3h24m
karmada-demo1-controller-manager-75fbfdfdb9-d67pg 1/1 Running 0 3h24m
karmada-demo1-etcd-0 1/1 Running 0 3h24m
karmada-demo1-kube-controller-manager-7c894c75c8-r99b2 1/1 Running 0 3h24m
karmada-demo1-metrics-adapter-7cd9695769-bb9j4 1/1 Running 0 3h24m
karmada-demo1-metrics-adapter-7cd9695769-j45tp 1/1 Running 0 3h24m
karmada-demo1-scheduler-774c45f99d-xx9jz 1/1 Running 0 3h24m
karmada-demo1-webhook-694757d499-m888r 1/1 Running 0 3h24m
karmada-demo2-aggregated-apiserver-84775d48c7-zvvzc 1/1 Running 2 (176m ago) 176m
karmada-demo2-apiserver-774974ffcd-spvkm 1/1 Running 0 176m
karmada-demo2-controller-manager-5b4fd69c89-k27rc 1/1 Running 0 176m
karmada-demo2-etcd-0 1/1 Running 0 176m
karmada-demo2-kube-controller-manager-5b9b4fd5f6-x2j7s 1/1 Running 0 176m
karmada-demo2-metrics-adapter-7cfcd45b88-dwwp8 1/1 Running 0 176m
karmada-demo2-metrics-adapter-7cfcd45b88-kwf9d 1/1 Running 0 176m
karmada-demo2-scheduler-579c695575-nt8dg 1/1 Running 0 176m
karmada-demo2-webhook-6b44859c96-6j4bq 1/1 Running 0 176m
Three Karmada instances have been deployed under the namespace test. Among them, the karmada-demo instance and the karmada-operator were upgraded from v1.13.0 together, while the other two instances were deployed after the karmada-operator was upgraded.
$ kubectl --kubeconfig karmada-apiserver.config get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 0/2 0 0 170m
$ kubectl --kubeconfig karmada-apiserver1.config get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
my-nginx 0/2 0 0 3h23m
$ kubectl --kubeconfig karmada-apiserver2.config get deployments.apps
No resources found in default namespace.
@jabellard
When validating locally, I found a strange phenomenon. During the installation process, the component aggregated-apiserver will restart several times. The log is as follows:
I0625 06:55:49.520045 1 options.go:114] karmada-aggregated-apiserver version: version.Info{GitVersion:"v1.14.0", GitCommit:"42dfade3feef1d53928e6227c3d2a25a75b1a2b5", GitShortCommit:"42dfade3f", GitTreeState:"clean", BuildDate:"2025-05-30T12:36:11Z", GoVersion:"go1.23.8", Compiler:"gc", Platform:"linux/amd64"}
E0625 06:55:50.960763 1 run.go:72] "command failed" err="unable to load configmap based request-header-client-ca-file: Get \"https://172.18.0.6:32266/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication\": dial tcp 172.18.0.6:32266: connect: connection refused"
However, the same situation occurred when I used the karmada-operator of v1.14.0.
@jabellard Have you encountered this situation before?
Updated: I found the reason why karmada-aggregated-apiserver will restart during the installation process, and I'll fix it in another pr
Ref to #6492
Three Karmada instances have been deployed under the namespace
test. Among them, thekarmada-demoinstance and thekarmada-operatorwere upgraded fromv1.13.0together, while the other two instances were deployed after thekarmada-operatorwas upgraded.
@zhzhuang-zju , thanks for confirming.
When validating locally, I found a strange phenomenon. During the installation process, the component aggregated-apiserver will restart several times. The log is as follows:
I0625 06:55:49.520045 1 options.go:114] karmada-aggregated-apiserver version: version.Info{GitVersion:"v1.14.0", GitCommit:"42dfade3feef1d53928e6227c3d2a25a75b1a2b5", GitShortCommit:"42dfade3f", GitTreeState:"clean", BuildDate:"2025-05-30T12:36:11Z", GoVersion:"go1.23.8", Compiler:"gc", Platform:"linux/amd64"} E0625 06:55:50.960763 1 run.go:72] "command failed" err="unable to load configmap based request-header-client-ca-file: Get \"https://172.18.0.6:32266/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication\": dial tcp 172.18.0.6:32266: connect: connection refused"However, the same situation occurred when I used the karmada-operator of v1.14.0.
@jabellard Have you encountered this situation before?
Updated: I found the reason why karmada-aggregated-apiserver will restart during the installation process, and I'll fix it in another pr
Ref to #6492
Thanks. I'm going to take a look.
/lgtm
@RainbowMango , would you like to take another look?
@zhzhuang-zju Sorry for missing this. It seems we haven't reached an agreement about it, right? Based on discussion at https://github.com/karmada-io/karmada/pull/6483#discussion_r2174582152.
It seems we haven't reached an agreement about it, right? Based on discussion at https://github.com/karmada-io/karmada/pull/6483#discussion_r2174582152.
Yes, the main disagreement lies in how to handle this upgrade-incompatible change. Broadly speaking, we have the following options:
-
Option 1: Keep the selector unchanged during upgrade (i.e., as implemented in this PR).
Pros: Users are not impacted.
Cons: The deprecated label remains in place. -
Option 2: During upgrade, if the failure is due to an immutable field update, delete the old workload first and then create a new one. (automatically)
-
Option 3: When an upgrade fails, let the user manually recreate the workload. (manually)
Currently, I’m leaning toward Option 3. In this case, when an upgrade fails, Karmada CR’s conditions correctly reflect the failure reason.
conditions:
- lastTransitionTime: "2025-12-08T03:09:18Z"
message: 'failed to install etcd component, err: error when creating Etcd statefulset,
err: StatefulSet.apps "karmada-demo-etcd" is invalid: spec: Forbidden: updates
to statefulset spec for fields other than ''replicas'', ''ordinals'', ''template'',
''updateStrategy'', ''revisionHistoryLimit'', ''persistentVolumeClaimRetentionPolicy''
and ''minReadySeconds'' are forbidden'
reason: Failed
status: "False"
type: Ready
However, we should update our release notes to treat this as an urgent upgrade requirement, not merely a bug fix.
However, we should update our release notes to treat this as an urgent upgrade requirement, not merely a bug fix.
For example:
When upgrading `karmada-operator` from v1.13.0 to v1.14.0, the label selector used by the operator for Karmada control
plane components has changed—from `karmada-app` to the standard labels `app.kubernetes.io/instance` and `app.kubernetes.io/name`.
Since Kubernetes does not allow updating the `spec.selector` of existing resources, please manually delete the existing
Karmada control plane components (such as `karmada-apiserver`, `etcd`, etc.) after the upgrade. The operator will automatically
recreate them with the new label selectors.