
Member cluster health checking does not work

alex-wong123 opened this issue 2 years ago · 19 comments

Please provide an in-depth description of the question you have: After registering a member cluster to Karmada in Push mode, "kubectl get cluster" showed the cluster status as Ready. Then I disconnected the member cluster with a firewall; after more than 10 minutes the cluster status was still Ready and had not changed to a failed state. Are there any configurations needed for cluster health checking?

Environment:

  • Karmada version: 1.3.0
  • Kubernetes version: 1.23.4
  • Others:

alex-wong123 avatar Sep 23 '22 02:09 alex-wong123

Are there any configurations needed for cluster health checking?

No. Karmada takes care of the cluster status automatically, based on heartbeats from the member cluster.
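
For reference, you can inspect the Ready condition that the cluster status controller maintains:

  # show the Ready condition of a member cluster
  # (member1 is just an example cluster name, substitute your own)
  kubectl get cluster member1 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'

If I remember correctly, the detection interval is governed by karmada-controller-manager flags such as --cluster-status-update-frequency, but please verify against the flags of the version you run.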

@jwcesign Could you please help to confirm this? I guess we can try to reproduce it with the following steps:

  1. launch Karmada with the command: hack/local-up-karmada.sh
  2. wait for the cluster status to become Ready: kubectl get clusters
  3. delete cluster member1 with the command: kind delete cluster --name member1 (to simulate a network break)
  4. wait for the cluster status to change: watch kubectl get clusters

RainbowMango avatar Sep 23 '22 03:09 RainbowMango

I tested with the release-1.3 branch, and it appears to work:

jw@ecs-3fa1 [03:04:27 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % export KUBECONFIG=/home/jw/.kube/karmada.config
jw@ecs-3fa1 [03:04:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    48s
jw-member2-push   v1.23.4   Push   True    43s
jw-member3-pull   v1.23.4   Pull   True    33s
jw@ecs-3fa1 [03:04:39 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:05:07 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:05:21 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % watch kubectl get clusters
jw@ecs-3fa1 [03:05:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    112s
jw-member2-push   v1.23.4   Push   True    107s
jw-member3-pull   v1.23.4   Pull   True    97s
jw@ecs-3fa1 [03:05:43 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters --watch
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    115s
jw-member2-push   v1.23.4   Push   True    110s
jw-member3-pull   v1.23.4   Pull   True    100s
jw-member1-push   v1.23.4   Push   False   2m39s
jw-member1-push   v1.23.4   Push   False   2m39s
jw-member1-push   v1.23.4   Push   False   3m9s

cc @RainbowMango

jwcesign avatar Sep 23 '22 07:09 jwcesign

I had the same problem once. The cause was that the firewall did not close the already-established TCP connection.

After you enable the firewall, you can use the tcpkill command to close that TCP connection:

tcpkill -9  -i ens192 src host 10.70.4.241 and dst port 6443
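
tcpkill is part of the dsniff toolkit; the trailing expression is a standard pcap filter, and ens192 / 10.70.4.241 are just the values from my environment, substitute your own. To check whether such a lingering connection exists, something like this should work:

  # list established TCP connections to the member apiserver's address
  # (10.70.4.241 is the example address from above; look for remote port 6443)
  ss -tn dst 10.70.4.241

Depending on how the firewall is configured (for example, a rule that accepts ESTABLISHED traffic before dropping everything else), an already-open TCP session to the member apiserver can survive the firewall change, so the control plane's health checks keep succeeding over it until the connection is actually torn down.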

chaunceyjiang avatar Sep 23 '22 07:09 chaunceyjiang

@jwcesign Please use v1.3.0 and try again.

@alex-wong123 As far as I remember, we haven't changed the health-detection behavior since v1.3.0, which means @jwcesign's test is representative.

Thanks @chaunceyjiang for your information, that's probably the root cause :)

RainbowMango avatar Sep 23 '22 07:09 RainbowMango

Same result with v1.3.0, so I think @chaunceyjiang's answer is right.

To start using your karmada, run:
  export KUBECONFIG=/home/jw/.kube/karmada.config
Please use 'kubectl config use-context jw-karmada-host/karmada-apiserver' to switch the host and control plane cluster.

To manage your member clusters, run:
  export KUBECONFIG=/home/jw/.kube/members.config
Please use 'kubectl config use-context jw-member1-push/jw-member2-push/jw-member3-pull' to switch to the different member cluster.
jw@ecs-3fa1 [03:32:15 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    39s
jw-member2-push   v1.23.4   Push   True    34s
jw-member3-pull   v1.23.4   Pull   True    7s
jw@ecs-3fa1 [03:32:21 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:32:29 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:32:33 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters --watch
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    57s
jw-member2-push   v1.23.4   Push   True    52s
jw-member3-pull   v1.23.4   Pull   True    25s
jw-member1-push   v1.23.4   Push   False   117s
jw-member1-push   v1.23.4   Push   False   117s

jwcesign avatar Sep 23 '22 07:09 jwcesign

So @alex-wong123, can you try again following @chaunceyjiang's recommendation above?

RainbowMango avatar Sep 23 '22 08:09 RainbowMango

Thanks for all the replies. I'll try @chaunceyjiang's recommendation.

alex-wong123 avatar Sep 23 '22 08:09 alex-wong123

Thanks everyone, it works after following @chaunceyjiang's recommendation.

alex-wong123 avatar Sep 23 '22 09:09 alex-wong123

Hi @RainbowMango @alex-wong123, I suggest we incubate a Known Issues document, like the metrics-server KNOWN_ISSUES. The current issue is a good example.

chaunceyjiang avatar Sep 23 '22 09:09 chaunceyjiang

Yes!! Where should we put it, any suggestions?

By the way, what's the difference between FAQ and known-issues?

RainbowMango avatar Sep 23 '22 09:09 RainbowMango

@chaunceyjiang Good idea

alex-wong123 avatar Sep 23 '22 09:09 alex-wong123

cc @Poor12 for suggestions.

RainbowMango avatar Sep 23 '22 09:09 RainbowMango

Yes!! Where should we put it, any suggestions?

By the way, what's the difference between FAQ and known-issues?

My personal opinion is that an FAQ is generally about concepts, while known issues cover problems encountered in actual use.

alex-wong123 avatar Sep 23 '22 09:09 alex-wong123

My personal opinion is that an FAQ is generally about concepts, while known issues cover problems encountered in actual use.

+1

Maybe these are good references:

kind: https://kind.sigs.k8s.io/docs/user/known-issues/ https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/known-issues.md

metrics-server: https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md

metallb: https://metallb.universe.tf/configuration/calico/

chaunceyjiang avatar Sep 23 '22 09:09 chaunceyjiang

Yeah, I suggest putting it in https://karmada.io/docs/troubleshooting/. We can provide a list and record the corresponding workarounds, just as @chaunceyjiang mentioned.

Poor12 avatar Sep 23 '22 09:09 Poor12

Troubleshooting sounds good to me. @chaunceyjiang, what do you think? Would you like to send a PR for this?

RainbowMango avatar Sep 23 '22 09:09 RainbowMango

ok

chaunceyjiang avatar Sep 23 '22 09:09 chaunceyjiang

/reopen
/assign @chaunceyjiang
Thanks.

RainbowMango avatar Sep 24 '22 01:09 RainbowMango

@RainbowMango: Reopened this issue.

In response to this:

/reopen
/assign @chaunceyjiang
Thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

karmada-bot avatar Sep 24 '22 01:09 karmada-bot