karmada fix kube controller manager restart all the time

Signed-off-by: calvin0327 [email protected]

What type of PR is this? /kind bug

What this PR does / why we need it: When Karmada is installed with the Helm chart, kube-controller-manager restarts continuously. Describing the pod shows the liveness probe failing with: Get "http://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused.

We need to open the secure port 10257 in the kube-controller-manager manifest.

Which issue(s) this PR fixes: Fixes # https://github.com/karmada-io/karmada/issues/2110

Special notes for your reviewer: Please correct me if not.

Does this PR introduce a user-facing change?:

NONE

Jul 28 '22 15:07 calvin0327

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign pidb after the PR has been reviewed. You can assign the PR to them by writing /assign @pidb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

charts/OWNERS

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

Jul 28 '22 15:07 karmada-bot

@Poor12 I reproduced the problem and resolved it. PTAL

Jul 28 '22 15:07 calvin0327

It seems this does not solve it, but the issue definitely exists. I'm continuing to work on it.

Jul 30 '22 12:07 calvin0327

Hi, what is the status of this issue?

I have the same issue ("kube controller manager restart all the time") with Karmada v1.3.0 and v1.3.1. My deployments (simple nginx web server deployments) take a long time to be deployed/undeployed (around 5-6 min).

To try to debug/fix the issue, I manually added "--secure-port=10257" to the files "artifacts/deploy/kube-controller-manager.yaml" and "charts/karmada/templates/kube-controller-manager.yaml" before installing Karmada with hack/remote-up-karmada.sh, but the issue persists.

But if I remove the loopback address 127.0.0.1 from livenessProbe: httpGet: host: (in the file "artifacts/deploy/kube-controller-manager.yaml"), then the issue (restarts + CrashLoopBackOff status) seems to disappear (no more restarts or CrashLoopBackOff of karmada-kube-controller-manager).

Nov 11 '22 15:11 mrequena

/assign Sorry I missed this PR. Will take a look later.

Nov 12 '22 02:11 RainbowMango

/assign Sorry I missed this PR. will take a look later.

OK, today we finally figured out the cause of the problem: if we set a value for the host field of the probe (httpGet.host), the kubelet sends the request to that address on the host network rather than the container network, so with 127.0.0.1 the kubelet can never reach the /healthz endpoint.

So we should not set any value for host; it defaults to the pod IP.
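
A minimal sketch of what the probe might look like with host left unset. Only the path /healthz and port 10257 come from this thread; the scheme and the timing values are assumptions for illustration and may differ from the actual manifest:

```yaml
# Sketch only: everything except path and port is an assumption.
livenessProbe:
  httpGet:
    # No `host` field here, so the kubelet targets the pod IP by default.
    path: /healthz
    port: 10257
    scheme: HTTPS            # assumption: the secure port serves HTTPS
  initialDelaySeconds: 10    # illustrative value
  periodSeconds: 10          # illustrative value
  failureThreshold: 8        # illustrative value
```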

@RainbowMango @Poor12 @carlory @mrequena

Nov 14 '22 14:11 calvin0327

If the container listens on 127.0.0.1 and the Pod's hostNetwork field is true, then host, under httpGet, should be set to 127.0.0.1.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#http-probes
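
For contrast, a rough sketch of the case the linked documentation describes, where the pod shares the node's network namespace. The pod name, container name, and image are hypothetical placeholders, not taken from the Karmada manifests:

```yaml
# Sketch only: names and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: hostnetwork-probe-example
spec:
  hostNetwork: true                 # pod uses the node's network namespace
  containers:
    - name: app
      image: example.com/app:latest # placeholder image
      livenessProbe:
        httpGet:
          host: 127.0.0.1           # reachable only because hostNetwork is true
          path: /healthz
          port: 10257
```

Without hostNetwork: true, the same host value would point the kubelet at the node's own loopback interface instead of the container.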

Nov 14 '22 15:11 chaunceyjiang

Good job! I have a question, why it can't be reproduced by hack/local-up-karmada.sh?

Nov 15 '22 07:11 RainbowMango

@calvin0327 Please add a release note. I think we need to cherry-pick this fix to the release branches (at least release-1.3 and release-1.2).

Nov 15 '22 07:11 RainbowMango

Good job! I have a question, why it can't be reproduced by hack/local-up-karmada.sh?

Good question. I have not checked this case after installing Karmada with the script hack/local-up-karmada.sh. I think it has the same problem.

I will test it tomorrow.

Nov 15 '22 14:11 calvin0327

I'm asking because I can't see any restarts there:

# kubectl get pods -n karmada-system 
NAME                                                   READY   STATUS    RESTARTS   AGE
etcd-0                                                 1/1     Running   0          18h
karmada-aggregated-apiserver-7d86576949-4npts          1/1     Running   0          18h
karmada-aggregated-apiserver-7d86576949-b7l7m          1/1     Running   0          18h
karmada-apiserver-547489884f-w9l54                     1/1     Running   0          18h
karmada-controller-manager-666fc5456f-2v9km            1/1     Running   0          18h
karmada-controller-manager-666fc5456f-49476            1/1     Running   0          18h
karmada-descheduler-79ffc968d9-bgsn5                   1/1     Running   0          18h
karmada-descheduler-79ffc968d9-nfv7z                   1/1     Running   0          18h
karmada-kube-controller-manager-644d9fccf9-4tw5q       1/1     Running   0          18h
karmada-scheduler-64fcc487fd-8nkj9                     1/1     Running   0          18h
karmada-scheduler-64fcc487fd-9p4kz                     1/1     Running   0          18h
karmada-scheduler-estimator-member1-5fbdf58686-4ljnk   1/1     Running   0          18h
karmada-scheduler-estimator-member1-5fbdf58686-rcmls   1/1     Running   0          18h
karmada-scheduler-estimator-member2-6bc6d857b8-pksdl   1/1     Running   0          18h
karmada-scheduler-estimator-member2-6bc6d857b8-zwvbs   1/1     Running   0          18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-d7npp   1/1     Running   0          18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-gt2h7   1/1     Running   0          18h
karmada-search-58bc795c46-q6l8p                        1/1     Running   0          18h
karmada-search-58bc795c46-s4lz8                        1/1     Running   0          18h
karmada-webhook-7674f5cfdd-dz8mv                       1/1     Running   0          18h
karmada-webhook-7674f5cfdd-tfv6g                       1/1     Running   0          18h

Nov 16 '22 01:11 RainbowMango

@RainbowMango @calvin0327

When we use hack/local-up-karmada.sh to deploy Karmada, the Karmada components run on a cluster that has only one node. The kubelet health-checks the kube-controller-manager deployed by this script and gets a healthy result. I guess the kubelet is actually sending the request to the kube-controller-manager of the host cluster.

Nov 16 '22 09:11 carlory

Thanks @carlory , that makes sense.

Nov 16 '22 09:11 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [RainbowMango]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

Nov 16 '22 09:11 karmada-bot

@RainbowMango OK, cherry-pick to the v1.2.0 and v1.3.0 branches?

Nov 16 '22 11:11 calvin0327

I checked the history, and only release-1.3 needs it. This issue was introduced by #1935, which is only in release-1.3.

Nov 16 '22 12:11 RainbowMango
