Member cluster health checking does not work
Please provide an in-depth description of the question you have:
After registering a member cluster to Karmada in push mode, "kubectl get cluster" showed the cluster status as Ready. Then I disconnected the member cluster with a firewall; after more than 10 minutes the cluster status was still Ready and did not change to failed. Are there configurations needed for cluster health checking?
What do you think about this question?:
Environment:
- Karmada version: 1.3.0
- Kubernetes version: 1.23.4
- Others:
> Are there configurations needed for cluster health checking?
No, Karmada takes care of the cluster status based on heartbeats.
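For reference, you can see what the heartbeat-driven status reporting writes onto the Cluster object (a minimal sketch; the cluster name member1 is just an example, and the kubeconfig path is the one hack/local-up-karmada.sh prints, so adjust both to your setup):
export KUBECONFIG="$HOME/.kube/karmada.config"
# Show only the Ready condition maintained for the member cluster.
kubectl get cluster member1 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
# Or inspect the full status, including the lastTransitionTime of each condition.
kubectl get cluster member1 -o yaml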
@jwcesign Could you please help to confirm this? I guess we can try to reproduce it with the following steps (a consolidated sketch of the commands follows the list):
- launch Karmada by command: hack/local-up-karmada.sh
- wait for the cluster status to become Ready: kubectl get clusters
- delete cluster member1 by command: kind delete cluster --name member1 (to simulate the broken network)
- wait for the cluster status to change: watch kubectl get clusters
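A consolidated sketch of these steps (assuming the kind-based environment that hack/local-up-karmada.sh creates and the kubeconfig path it prints; cluster names may carry a prefix in your environment, as in the transcript below):
hack/local-up-karmada.sh
export KUBECONFIG="$HOME/.kube/karmada.config"
kubectl get clusters                 # wait until READY is True for all member clusters
kind delete cluster --name member1   # simulate the member becoming unreachable
watch kubectl get clusters           # member1 should turn False after a short grace period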
I tested with the release-1.3 branch, and it looks like it works:
jw@ecs-3fa1 [03:04:27 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % export KUBECONFIG=/home/jw/.kube/karmada.config
jw@ecs-3fa1 [03:04:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME VERSION MODE READY AGE
jw-member1-push v1.23.4 Push True 48s
jw-member2-push v1.23.4 Push True 43s
jw-member3-pull v1.23.4 Pull True 33s
jw@ecs-3fa1 [03:04:39 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:05:07 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:05:21 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % watch kubectl get clusters
jw@ecs-3fa1 [03:05:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME VERSION MODE READY AGE
jw-member1-push v1.23.4 Push True 112s
jw-member2-push v1.23.4 Push True 107s
jw-member3-pull v1.23.4 Pull True 97s
jw@ecs-3fa1 [03:05:43 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters --watch
NAME VERSION MODE READY AGE
jw-member1-push v1.23.4 Push True 115s
jw-member2-push v1.23.4 Push True 110s
jw-member3-pull v1.23.4 Pull True 100s
jw-member1-push v1.23.4 Push False 2m39s
jw-member1-push v1.23.4 Push False 2m39s
jw-member1-push v1.23.4 Push False 3m9s
cc @RainbowMango
I had the same problem once; the cause was that the firewall did not close the already existing TCP connections.
After you start the firewall, you can use the tcpkill command to close the TCP connection:
tcpkill -9 -i ens192 src host 10.70.4.241 and dst port 6443
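For completeness, a small sketch of how to check this on the node running karmada-controller-manager (the interface ens192 and the address 10.70.4.241 are taken from the command above and are only examples; tcpkill comes from the dsniff package, and conntrack from conntrack-tools):
# List established TCP connections to the member API server that survived the firewall rule.
ss -tn state established dst 10.70.4.241
# If the firewall relies on connection tracking, deleting the tracked entries stops subsequent
# packets from matching the ESTABLISHED rule, which also breaks the stale connection (run as root).
conntrack -D -d 10.70.4.241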
@jwcesign Please use v1.3.0 and try again.
@alex-wong123 As far as I remember, we haven't changed the health detection behavior since v1.3.0, which means @jwcesign's testing is credible.
Thanks @chaunceyjiang for your information, that's probably the truth :)
Same result, so I think @chaunceyjiang's answer is right.
To start using your karmada, run:
export KUBECONFIG=/home/jw/.kube/karmada.config
Please use 'kubectl config use-context jw-karmada-host/karmada-apiserver' to switch the host and control plane cluster.
To manage your member clusters, run:
export KUBECONFIG=/home/jw/.kube/members.config
Please use 'kubectl config use-context jw-member1-push/jw-member2-push/jw-member3-pull' to switch to the different member cluster.
jw@ecs-3fa1 [03:32:15 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters
NAME VERSION MODE READY AGE
jw-member1-push v1.23.4 Push True 39s
jw-member2-push v1.23.4 Push True 34s
jw-member3-pull v1.23.4 Pull True 7s
jw@ecs-3fa1 [03:32:21 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:32:29 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:32:33 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters --watch
NAME VERSION MODE READY AGE
jw-member1-push v1.23.4 Push True 57s
jw-member2-push v1.23.4 Push True 52s
jw-member3-pull v1.23.4 Pull True 25s
jw-member1-push v1.23.4 Push False 117s
jw-member1-push v1.23.4 Push False 117s
So @alex-wong123, can you try again following @chaunceyjiang's recommendation above?
Thanks for all the replies, I'll try according to @chaunceyjiang's recommendation.
Thanks everyone, it works according to @chaunceyjiang's recommendation.
Hi @RainbowMango @alex-wong123, I suggest that we maintain a Known Issues document, like metrics-server's KNOWN_ISSUES.
The current issue is a good example.
Yes!! Where should we put it, any suggestions?
By the way, what's the difference between FAQ and known-issues?
@chaunceyjiang Good idea
cc @Poor12 for suggestions.
> Yes!! Where should we put it, any suggestions?
> By the way, what's the difference between FAQ and known-issues?
My personal opinion is that FAQ is generally about concepts, while known-issues are about problems encountered in actual use.
> My personal opinion is that FAQ is generally about concepts, while known-issues are about problems encountered in actual use.
+1
Maybe these are a good reference.
- kind: https://kind.sigs.k8s.io/docs/user/known-issues/ https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/known-issues.md
- metrics-server: https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md
- metallb: https://metallb.universe.tf/configuration/calico/
Yeah, I suggest putting it in https://karmada.io/docs/troubleshooting/. We can provide a list and record the corresponding workarounds, just as @chaunceyjiang mentioned.
troubleshooting sounds good to me.
@chaunceyjiang what do you think? And would you like to send a PR for this?
ok
/reopen /assign @chaunceyjiang Thanks.
@RainbowMango: Reopened this issue.
In response to this:
/reopen /assign @chaunceyjiang Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.