karmada/karmada
fix kube-controller-manager restarting all the time
Signed-off-by: calvin0327 [email protected]
What type of PR is this? /kind bug
What this PR does / why we need it: When we install karmada with the Helm chart, the kube-controller-manager restarts all the time, and kubectl describe shows the liveness probe failure: Get "http://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused.

We need to open the secure port 10257 in the kube-controller-manager manifest:
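A minimal sketch of the kind of change described, assuming the flag is added to the container command in charts/karmada/templates/kube-controller-manager.yaml (illustrative only, not the exact diff in this PR; --secure-port is a real kube-controller-manager flag, the surrounding fields are abbreviated):

containers:
  - name: kube-controller-manager
    command:
      - kube-controller-manager
      # ... existing flags unchanged ...
      - --secure-port=10257   # serve /healthz on the port the liveness probe dials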
Which issue(s) this PR fixes: Fixes #2110 (https://github.com/karmada-io/karmada/issues/2110)
Special notes for your reviewer: Please correct me if I'm wrong.
Does this PR introduce a user-facing change?:
NONE
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign pidb after the PR has been reviewed.
You can assign the PR to them by writing /assign @pidb in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@Poor12 I reproduced the problem and resolved it. PTAL.
It doesn't seem to be solved yet, but the issue definitely exists. I'm continuing to work on it.
Hi, what is the status of this issue?
I have the same issue ("kube controller manager restart all the time") with karmada versions v1.3.0 and v1.3.1. My deployments (simple nginx webserver deployments) take a long time to be deployed/undeployed (around 5-6 min).
To try to debug/fix the issue, I manually added "--secure-port=10257" to "artifacts/deploy/kube-controller-manager.yaml" and "charts/karmada/templates/kube-controller-manager.yaml" before installing karmada with hack/remote-up-karmada.sh, but the issue persists.
However, if I remove the loopback address 127.0.0.1 from livenessProbe.httpGet.host (in "artifacts/deploy/kube-controller-manager.yaml"), then the issue (restarts + CrashLoopBackOff status) seems to disappear (no more restarts or CrashLoopBackOff of karmada-kube-controller-manager).
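For reference, a hedged sketch of the probe shape being described (values reconstructed from the connection-refused message in the PR description, not copied from the actual manifest):

livenessProbe:
  httpGet:
    host: 127.0.0.1     # removing this line is the workaround described above
    path: /healthz
    port: 10257
  # timing fields (initialDelaySeconds, periodSeconds, ...) omitted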
/assign Sorry I missed this PR. Will take a look later.

Ok, today we finally figured out the cause of the problem: if we set a value for the host field of the liveness/readiness probe, it refers to the host network rather than the container network, so the kubelet can never reach the /healthz API.
So we should not set any value for host; it defaults to the pod IP.
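In other words, the probe can simply omit host and let the kubelet dial the pod IP; a hedged sketch with the port and path taken from the error message earlier (scheme and timing fields are omitted here, and may need to match what the component actually serves):

livenessProbe:
  httpGet:
    path: /healthz      # no host field: the kubelet targets the pod IP by default
    port: 10257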
@RainbowMango @Poor12 @carlory @mrequena
Per the Kubernetes docs: if the container listens on 127.0.0.1 and the Pod's hostNetwork field is true, then host, under httpGet, should be set to 127.0.0.1:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#http-probes
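That documented exception would look roughly like this; a generic sketch, not a karmada manifest, with hypothetical names, image, and port:

apiVersion: v1
kind: Pod
metadata:
  name: loopback-listener          # hypothetical example pod
spec:
  hostNetwork: true                # the pod shares the node's network namespace
  containers:
    - name: app
      image: example.com/app:latest   # hypothetical image
      livenessProbe:
        httpGet:
          host: 127.0.0.1          # correct here, since the process listens only on loopback
          path: /healthz
          port: 8080               # hypothetical port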
Good job!
I have a question: why can't it be reproduced by hack/local-up-karmada.sh?
@calvin0327 Please add a release note. I think we need to cherry-pick this fix to the release branches (at least release-1.3 and release-1.2).
> Good job! I have a question: why can't it be reproduced by hack/local-up-karmada.sh?
Good question. I haven't observed the case after installing karmada with hack/local-up-karmada.sh, but I think it has the same problem.
I will test it tomorrow.
I'm asking because I can't see any restarts there:
# kubectl get pods -n karmada-system
NAME READY STATUS RESTARTS AGE
etcd-0 1/1 Running 0 18h
karmada-aggregated-apiserver-7d86576949-4npts 1/1 Running 0 18h
karmada-aggregated-apiserver-7d86576949-b7l7m 1/1 Running 0 18h
karmada-apiserver-547489884f-w9l54 1/1 Running 0 18h
karmada-controller-manager-666fc5456f-2v9km 1/1 Running 0 18h
karmada-controller-manager-666fc5456f-49476 1/1 Running 0 18h
karmada-descheduler-79ffc968d9-bgsn5 1/1 Running 0 18h
karmada-descheduler-79ffc968d9-nfv7z 1/1 Running 0 18h
karmada-kube-controller-manager-644d9fccf9-4tw5q 1/1 Running 0 18h
karmada-scheduler-64fcc487fd-8nkj9 1/1 Running 0 18h
karmada-scheduler-64fcc487fd-9p4kz 1/1 Running 0 18h
karmada-scheduler-estimator-member1-5fbdf58686-4ljnk 1/1 Running 0 18h
karmada-scheduler-estimator-member1-5fbdf58686-rcmls 1/1 Running 0 18h
karmada-scheduler-estimator-member2-6bc6d857b8-pksdl 1/1 Running 0 18h
karmada-scheduler-estimator-member2-6bc6d857b8-zwvbs 1/1 Running 0 18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-d7npp 1/1 Running 0 18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-gt2h7 1/1 Running 0 18h
karmada-search-58bc795c46-q6l8p 1/1 Running 0 18h
karmada-search-58bc795c46-s4lz8 1/1 Running 0 18h
karmada-webhook-7674f5cfdd-dz8mv 1/1 Running 0 18h
karmada-webhook-7674f5cfdd-tfv6g 1/1 Running 0 18h
@RainbowMango @calvin0327
When we use hack/local-up-karmada.sh to deploy karmada, the karmada components run on a cluster that has only one node. The kubelet does the health check for the kube-controller-manager deployed by this script and gets a healthy result; I guess the kubelet's request is actually answered by the kube-controller-manager of the host cluster.
Thanks @carlory , that makes sense.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [RainbowMango]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@RainbowMango OK, shall I cherry-pick to the release-1.2 and release-1.3 branches?
I checked the history, and only release-1.3 needs it. This issue was introduced by #1935, which is only in release-1.3.