
Error monitoring cluster health: no matches for kind "Cluster"

Open levkp opened this issue 1 year ago • 8 comments

What happened:

The controller manager's status becomes CrashLoopBackOff after installing Karmada with remote Helm chart.

What you expected to happen:

All pods in the karmada-system namespace have status Running.

How to reproduce it (as minimally and precisely as possible):

Install Karmada following the remote Helm chart method described in karmada/charts/karmada/README.md. I tried doing this in my personal environment (EKS cluster running on 5 t3.medium EC2 nodes), and in Killercoda. Both gave the same logs.
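For reference, the remote-chart installation from that README is roughly the following (a sketch only; the exact repo URL and flags should be taken from the README itself):

$ helm repo add karmada-charts https://raw.githubusercontent.com/karmada-io/karmada/master/charts
$ helm repo update
$ helm install karmada karmada-charts/karmada -n karmada-system --create-namespace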

Anything else we need to know?:

Here is (what I hope is all) the relevant output of kubectl logs karmada-controller-manager-77f9f77789-dlkt8 -n karmada-system in my personal Kubernetes environment:

I0515 13:56:08.021271       1 detector.go:217] Reconciling object: apiregistration.k8s.io/v1, kind=APIService, v1.autoscaling
I0515 13:56:08.021337       1 detector.go:353] Attempts to match cluster policy for resource(apiregistration.k8s.io/v1, kind=APIService, v1.autoscaling)
I0515 13:56:08.021352       1 detector.go:360] No clusterpropagationpolicy find.
I0515 13:56:08.021419       1 recorder.go:104] "events: No policy match for resource" type="Warning" object={Kind:APIService Namespace: Name:v1.autoscaling UID:22e4e2cf-394b-4c4d-b29b-f16565665433 APIVersion:apiregistration.k8s.io/v1 ResourceVersion:20 FieldPath:} reason="ApplyPolicyFailed"
E0515 13:56:08.037585       1 cluster_controller.go:189] Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"

and

E0515 13:56:13.137639       1 unified_auth_controller.go:277] Failed to list existing clusters, error: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"
I0515 13:56:14.838066       1 controller.go:219] "Starting workers" controller="resourcebinding" controllerGroup="work.karmada.io" controllerKind="ResourceBinding" worker count=5
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 415 [running]:
runtime/debug.Stack()
	/opt/hostedtoolcache/go/1.20.6/x64/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc0001f6440, {0x29b4e00, 0xc0018e9800}, {0x2632125, 0x3d}, {0xc0017ae040, 0x2, 0x2})
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:139 +0x68
github.com/go-logr/logr.Logger.Error({{0x29dd340?, 0xc0001f6440?}, 0xc000561660?}, {0x29b4e00, 0xc0018e9800}, {0x2632125, 0x3d}, {0xc0017ae040, 0x2, 0x2})
	/home/runner/work/karmada/karmada/vendor/github.com/go-logr/logr/logr.go:299 +0xda
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1({0x29d65a0?, 0xc000300b90?})
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:63 +0x265
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1(0xc0010d5e00?, {0x29d65a0?, 0xc000300b90?})
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:62 +0x5d
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x29d65a0, 0xc000300b90}, {0x29d3ad0?, 0xc000378680}, 0x1, 0x0, 0x22058a0?)
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:63 +0x205
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x29d65a0, 0xc000300b90}, 0xc0009204d8?, 0x0?, 0xc0006b6f08?)
	/home/runner/work/karmada/karmada/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:33 +0x5c
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1()
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:56 +0xfa
created by sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start
	/home/runner/work/karmada/karmada/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:48 +0x1e5

Environment:

  • Karmada version: 1.9.1
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
kubectl karmada version: version.Info{GitVersion:"v1.9.0", GitCommit:"a03aa846cfd2c1978b166660d6f592a1c10aeb3d", GitTreeState:"clean", BuildDate:"2024-02-29T08:15:22Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}
  • Others:

levkp avatar May 15 '24 14:05 levkp

Install Karmada following the remote Helm chart method described in karmada/charts/karmada/README.md. I tried doing this in my personal environment (EKS cluster running on 5 t3.medium EC2 nodes), and in Killercoda. Both gave the same logs.

@chaosi-zju Can you help to reproduce it on your side?

RainbowMango avatar May 16 '24 01:05 RainbowMango

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:

@levkp This log looks like a panic, but it isn't. It has been fixed on master (see #4855), and since it doesn't affect any functionality, we can ignore this log here.

RainbowMango avatar May 16 '24 01:05 RainbowMango

Hi @levkp, sorry for the late reply (((;꒪ꈊ꒪;))).

First, I want to confirm which version of Karmada you installed. As you said:

Environment:

Karmada version: 1.9.1

But maybe the latest version is 1.9.0:

$ helm search repo karmada                                                                                     
NAME                            CHART VERSION   APP VERSION     DESCRIPTION                      
karmada-charts/karmada          v1.9.0          latest          A Helm chart for karmada         
karmada-charts/karmada-operator v1.8.0          v1.1.0          A Helm chart for karmada-operator

Then, I too installed Karmada v1.9.0 following the remote Helm chart method described in karmada/charts/karmada/README.md, but I didn't encounter your problem. My installation succeeded:

$ kubectl get po -A   
NAMESPACE            NAME                                                 READY   STATUS    RESTARTS        AGE
karmada-system       etcd-0                                               1/1     Running   0               4m33s
karmada-system       karmada-aggregated-apiserver-6bf466fdc4-fv86h        1/1     Running   2 (4m27s ago)   4m33s
karmada-system       karmada-apiserver-756b559f84-qf2td                   1/1     Running   0               4m33s
karmada-system       karmada-controller-manager-7b9f6f5f5-v5bwp           1/1     Running   3 (4m16s ago)   4m33s
karmada-system       karmada-kube-controller-manager-7b6d45cbdf-5kk8d     1/1     Running   2 (4m27s ago)   4m33s
karmada-system       karmada-scheduler-64db5cf5d6-bgd85                   1/1     Running   0               4m33s
karmada-system       karmada-webhook-7b6fc7f575-chqjk                     1/1     Running   0               4m33s

Next, regarding your CrashLoopBackOff error, it is worth noting that karmada-controller-manager can only become ready and running if karmada-apiserver and etcd are already ready and running. How are these two components doing? Are there any clues in their logs if they fail?
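For example, a quick check might look like this (resource names assume the chart defaults in the karmada-system namespace):

$ kubectl get pods -n karmada-system
$ kubectl logs -n karmada-system deploy/karmada-apiserver --tail=100
$ kubectl logs -n karmada-system etcd-0 --tail=100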

Lastly, is it necessary for you to use the remote Helm method? Maybe you can also try downloading the chart and installing it locally. For an efficient installation, you may want to refer to and try the steps mentioned in https://github.com/karmada-io/karmada/issues/4963#issuecomment-2122121193

chaosi-zju avatar May 21 '24 11:05 chaosi-zju

A similar problem is in progress: #4917

chaosi-zju avatar May 21 '24 13:05 chaosi-zju

Hi @levkp, can you try installing Karmada in the karmada-system namespace? (Do not use a different namespace.)

chaosi-zju avatar May 21 '24 14:05 chaosi-zju

Hi @chaosi-zju!

sorry for the late reply

No problem, and thanks for looking into this.

First, I want to confirm which version of Karmada you installed.

I installed 1.9.0, I wrote 1.9.1 only by mistake.

As you recommended, I followed your installation steps in #4963 after cloning the repo:

$ helm install karmada -n karmada-system   --kubeconfig ~/.kube/config   --create-namespace   --dependency-update   --set apiServer.hostNetwork=true   ./charts/karmada
NAME: karmada
LAST DEPLOYED: Wed May 22 13:23:16 2024
NAMESPACE: karmada-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get secret -n karmada-system karmada-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d >~/.kube/karmada-apiserver.config
$ KARMADA_APISERVER_ADDR=$(kubectl get ep karmada-apiserver -n karmada-system | tail -n 1 | awk '{print $2}')
$ echo $KARMADA_APISERVER_ADDR
10.0.4.221:5443
$ kubectl get po -n karmada-system
NAME                                               READY   STATUS             RESTARTS        AGE
etcd-0                                             1/1     Running            0               7m18s
karmada-aggregated-apiserver-79f6bdb5b9-nh2g5      1/1     Running            2 (7m8s ago)    7m18s
karmada-apiserver-5bd55dfcff-k7kz9                 1/1     Running            0               7m18s
karmada-controller-manager-6965d94dc4-646sp        0/1     CrashLoopBackOff   4 (82s ago)     7m18s
karmada-kube-controller-manager-5d4795ff87-cxnlr   1/1     Running            2 (7m10s ago)   7m18s
karmada-scheduler-85bcf46665-7n6xw                 1/1     Running            0               7m18s
karmada-webhook-7bbb7ddb98-9xnlq                   1/1     Running            0               7m18s

Here are the logs again for the controller manager. I see three variations of the error I started this issue with:

E0522 11:32:05.026221       1 kind.go:63] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Cluster\" in version \"cluster.karmada.io/v1alpha1\"" logger="controller-runtime.source.EventHandler" kind="Cluster.cluster.karmada.io"
E0522 11:32:05.116317       1 unified_auth_controller.go:285] Failed to list existing clusters, error: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"
E0522 11:33:05.461726       1 cluster_controller.go:206] Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1"

As you suggested, I looked at the logs of karmada-apiserver and etcd:

$ kubectl logs karmada-apiserver-5bd55dfcff-k7kz9 -n karmada-system | grep E0
E0522 11:37:23.496829       1 controller.go:116] loading OpenAPI spec for "v1beta2.custom.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
E0522 11:37:23.497978       1 controller.go:113] loading OpenAPI spec for "v1beta2.custom.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:27.388740       1 available_controller.go:460] v1alpha1.cluster.karmada.io failed with: failing or missing response from https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1: Get "https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1": dial tcp 172.20.208.240:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
E0522 11:37:28.391880       1 controller.go:113] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:28.391967       1 controller.go:116] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
E0522 11:37:31.278693       1 available_controller.go:460] v1beta2.custom.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:31.278914       1 available_controller.go:460] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:31.279162       1 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1: Get "https://karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1": dial tcp: lookup karmada-metrics-adapter.karmada-system.svc.cluster.local on 172.20.0.10:53: no such host
E0522 11:37:32.399915       1 available_controller.go:460] v1alpha1.cluster.karmada.io failed with: failing or missing response from https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1: Get "https://karmada-aggregated-apiserver.karmada-system.svc.cluster.local:443/apis/cluster.karmada.io/v1alpha1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0522 11:37:33.401077       1 controller.go:113] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: Error, could not get list of group versions for APIService
E0522 11:37:33.401727       1 controller.go:116] loading OpenAPI spec for "v1alpha1.cluster.karmada.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable

I'm not sure where the address 172.20.0.10 is coming from. I have the CNI and CoreDNS plugins installed, so networking should be fine in my cluster. I'll investigate this further.
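A quick way to check the in-cluster DNS and the aggregated-apiserver Service that the errors above point at might be something like this (172.20.0.10 is typically the kube-dns Service IP on EKS; the test pod name and busybox image are just illustrative):

$ kubectl get svc -n kube-system kube-dns
$ kubectl get svc -n karmada-system karmada-aggregated-apiserver
$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup karmada-aggregated-apiserver.karmada-system.svc.cluster.local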

$ kubectl get nodes -o wide
NAME                                       STATUS   ROLES    AGE    VERSION               INTERNAL-IP   EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-3-210.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.3.210    xxxxx   Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
ip-10-0-4-204.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.4.204    xxxxx    Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
ip-10-0-4-221.eu-west-1.compute.internal   Ready    <none>   101m   v1.29.3-eks-ae9a62a   10.0.4.221    xxxxx   Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11

Logs for etcd:

$ kubectl logs etcd-0  -n karmada-system | grep -E 'warn|error'
{"level":"warn","ts":"2024-05-22T11:24:17.786349Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP=tcp://172.20.97.76:2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786904Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786923Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_HOST=172.20.97.76"}
{"level":"warn","ts":"2024-05-22T11:24:17.786933Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT_ETCD_CLIENT_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786945Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT=tcp://172.20.97.76:2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.786954Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_ADDR=172.20.97.76"}
{"level":"warn","ts":"2024-05-22T11:24:17.786964Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PROTO=tcp"}
{"level":"warn","ts":"2024-05-22T11:24:17.787049Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PORT=2379"}
{"level":"warn","ts":"2024-05-22T11:24:17.78712Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2024-05-22T11:24:17.787406Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2024-05-22T11:24:17.78821Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/etcd\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"warn","ts":"2024-05-22T11:24:17.811906Z","caller":"auth/store.go:1241","msg":"simple token is not cryptographically signed"}

can you try installing Karmada in the karmada-system namespace? (Do not use a different namespace.)

I always let Karmada create and use karmada-system.

levkp avatar May 22 '24 13:05 levkp

Thanks. I will continue to look into the Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1" error message, and I will reply to you as soon as I find something new.
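One quick thing to check in the meantime might be whether the Cluster API ever becomes discoverable in the Karmada control plane itself, for example (using the karmada-apiserver kubeconfig extracted above, and assuming the apiserver address in it is reachable from where kubectl runs):

$ kubectl --kubeconfig ~/.kube/karmada-apiserver.config api-resources --api-group=cluster.karmada.io
$ kubectl --kubeconfig ~/.kube/karmada-apiserver.config get apiservice v1alpha1.cluster.karmada.io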

chaosi-zju avatar May 23 '24 01:05 chaosi-zju

@chaosi-zju Because the Karmada CRDs are installed after karmada-controller-manager starts in our chart, karmada-controller-manager cannot set up informers for those resources. But the pod should return to Running after a few restarts.

This issue is not easy to solve. I have a possible solution here that runs karmada-controller-manager behind a post-install job. What do you think?

calvin0327 avatar May 23 '24 07:05 calvin0327

@chaosi-zju In the end, I successfully joined 3 clusters using the kubectl plugin: kubectl karmada init. So I ended up not using Helm. Unfortunately, I don't have time to further investigate the issue Error monitoring cluster health: no matches for kind "Cluster" in version "cluster.karmada.io/v1alpha1".
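For anyone landing here, that CLI-based path looks roughly like this (the member cluster name and kubeconfig paths are illustrative):

$ kubectl karmada init --kubeconfig ~/.kube/config
$ kubectl karmada join member1 --kubeconfig /etc/karmada/karmada-apiserver.config --cluster-kubeconfig ~/.kube/member1.config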

levkp avatar May 29 '24 07:05 levkp

Thanks @levkp for spotting this. This issue is also tracked by #4917. We can close this now. /close

RainbowMango avatar May 29 '24 09:05 RainbowMango

@RainbowMango: Closing this issue.

In response to this:

Thanks @levkp for spotting this. This issue is also tracked by #4917. We can close this now. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

karmada-bot avatar May 29 '24 09:05 karmada-bot