fix kube controller manager restart all the time

Open calvin0327 opened this issue 2 years ago • 3 comments

Signed-off-by: calvin0327 [email protected]

What type of PR is this? /kind bug

What this PR does / why we need it: When Karmada is installed with the Helm chart, the kube-controller-manager restarts continuously. Describing the Pod shows the liveness probe failing with: Get "http://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused.

We need to open secure port 10257 in the kube-controller-manager manifest.

Which issue(s) this PR fixes: Fixes # https://github.com/karmada-io/karmada/issues/2110

Special notes for your reviewer: Please correct me if not.

Does this PR introduce a user-facing change?:

NONE

calvin0327 avatar Jul 28 '22 15:07 calvin0327

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign pidb after the PR has been reviewed. You can assign the PR to them by writing /assign @pidb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Jul 28 '22 15:07 karmada-bot

@Poor12 I reproduced the problem and resolved it. PTAL

calvin0327 avatar Jul 28 '22 15:07 calvin0327

It seems this is not solved yet, but the issue definitely exists. I'm continuing to work on it.

calvin0327 avatar Jul 30 '22 12:07 calvin0327

Hi, what is the status of this issue?

I have the same issue ("kube controller manager restart all the time") with Karmada v1.3.0 and v1.3.1. My deployments (simple nginx web server deployments) take a long time to be deployed or undeployed (around 5-6 min).

To try to debug/fix the issue, I manually added "--secure-port=10257" to the files "artifacts/deploy/kube-controller-manager.yaml" and "charts/karmada/templates/kube-controller-manager.yaml" before installing Karmada with hack/remote-up-karmada.sh, but the issue persists.

But if I remove the loopback address 127.0.0.1 from livenessProbe: httpGet: host: (in the file "artifacts/deploy/kube-controller-manager.yaml"), then the issue (restarts and CrashLoopBackOff status) seems to disappear (no more restarts or CrashLoopBackOff of karmada-kube-controller-manager).

mrequena avatar Nov 11 '22 15:11 mrequena

/assign Sorry I missed this PR. I will take a look later.

RainbowMango avatar Nov 12 '22 02:11 RainbowMango

OK, today we finally figured out the cause of the problem. If we set a value for the host field of the probe, the kubelet sends the request to that address from the node, that is, in the host network rather than the container network, so the kubelet can never reach the /healthz endpoint.

So we should not set any value for host; it defaults to the Pod IP.
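
As a rough sketch of the change described above, the probe could look like the following once host is dropped. The path and port come from the error message earlier in this thread; the scheme and the timing values are assumptions for illustration and are not copied from the Karmada manifests.

```yaml
# Hypothetical excerpt of a container spec; only path and port are taken from this thread.
livenessProbe:
  httpGet:
    # host is intentionally omitted, so the kubelet probes the Pod IP (the default).
    path: /healthz
    port: 10257
    scheme: HTTPS          # assumption: 10257 is the secure port; adjust if it serves plain HTTP
  initialDelaySeconds: 15  # illustrative value
  periodSeconds: 10        # illustrative value
  failureThreshold: 3      # illustrative value
```

With host unset, the request targets the Pod's own IP, so it reaches the container regardless of which node the kubelet runs on.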

@RainbowMango @Poor12 @carlory @mrequena

calvin0327 avatar Nov 14 '22 14:11 calvin0327

From the Kubernetes docs: suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#http-probes
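
For contrast, here is a minimal sketch of the case the docs describe, where setting host to 127.0.0.1 is appropriate because the Pod shares the node's network namespace. The Pod name, image, and port are placeholders made up for this example.

```yaml
# Hypothetical Pod illustrating the documented case; name, image, and port are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: host-network-example
spec:
  hostNetwork: true              # the Pod shares the node's network namespace
  containers:
    - name: app
      image: example.invalid/app:latest   # placeholder image
      livenessProbe:
        httpGet:
          host: 127.0.0.1        # reachable from the kubelet only because hostNetwork is true
          path: /healthz
          port: 8080             # placeholder port
```

Without hostNetwork: true, 127.0.0.1 in the probe refers to the node's loopback interface, not the container's.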

chaunceyjiang avatar Nov 14 '22 15:11 chaunceyjiang

Good job! I have a question: why can't it be reproduced with hack/local-up-karmada.sh?

RainbowMango avatar Nov 15 '22 07:11 RainbowMango

@calvin0327 Please add a release note. I think we need to cherry-pick this fix to the release branches (at least release-1.3 and release-1.2).

RainbowMango avatar Nov 15 '22 07:11 RainbowMango

Good job! I have a question, why it can't be reproduced by hack/local-up-karmada.sh?

Good question. I have not checked this case after installing Karmada with the script hack/local-up-karmada.sh. I think it has the same problem.

I will test it tomorrow.

calvin0327 avatar Nov 15 '22 14:11 calvin0327

I'm asking because I don't see any restarts with it:

# kubectl get pods -n karmada-system 
NAME                                                   READY   STATUS    RESTARTS   AGE
etcd-0                                                 1/1     Running   0          18h
karmada-aggregated-apiserver-7d86576949-4npts          1/1     Running   0          18h
karmada-aggregated-apiserver-7d86576949-b7l7m          1/1     Running   0          18h
karmada-apiserver-547489884f-w9l54                     1/1     Running   0          18h
karmada-controller-manager-666fc5456f-2v9km            1/1     Running   0          18h
karmada-controller-manager-666fc5456f-49476            1/1     Running   0          18h
karmada-descheduler-79ffc968d9-bgsn5                   1/1     Running   0          18h
karmada-descheduler-79ffc968d9-nfv7z                   1/1     Running   0          18h
karmada-kube-controller-manager-644d9fccf9-4tw5q       1/1     Running   0          18h
karmada-scheduler-64fcc487fd-8nkj9                     1/1     Running   0          18h
karmada-scheduler-64fcc487fd-9p4kz                     1/1     Running   0          18h
karmada-scheduler-estimator-member1-5fbdf58686-4ljnk   1/1     Running   0          18h
karmada-scheduler-estimator-member1-5fbdf58686-rcmls   1/1     Running   0          18h
karmada-scheduler-estimator-member2-6bc6d857b8-pksdl   1/1     Running   0          18h
karmada-scheduler-estimator-member2-6bc6d857b8-zwvbs   1/1     Running   0          18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-d7npp   1/1     Running   0          18h
karmada-scheduler-estimator-member3-7d9cdcbb8b-gt2h7   1/1     Running   0          18h
karmada-search-58bc795c46-q6l8p                        1/1     Running   0          18h
karmada-search-58bc795c46-s4lz8                        1/1     Running   0          18h
karmada-webhook-7674f5cfdd-dz8mv                       1/1     Running   0          18h
karmada-webhook-7674f5cfdd-tfv6g                       1/1     Running   0          18h

RainbowMango avatar Nov 16 '22 01:11 RainbowMango

@RainbowMango @calvin0327

When we use hack/local-up-karmada.sh to deploy Karmada, the Karmada components run on a cluster that has only one node. The kubelet health-checks the kube-controller-manager deployed by this script and gets a healthy result. I guess that the kubelet is actually sending the request to the kube-controller-manager of the host cluster.

carlory avatar Nov 16 '22 09:11 carlory

Thanks @carlory , that makes sense.

RainbowMango avatar Nov 16 '22 09:11 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Nov 16 '22 09:11 karmada-bot

@RainbowMango ok, cherry pick to branch v1.2.0 and v1.3.0?

calvin0327 avatar Nov 16 '22 11:11 calvin0327

I checked the history, and only release-1.3 needs it. This issue was introduced by #1935, which is only in release-1.3.

RainbowMango avatar Nov 16 '22 12:11 RainbowMango