karmada icon indicating copy to clipboard operation
karmada copied to clipboard

custom.metrics.k8s.io not found in cluster

Open chaosi-zju opened this issue 10 months ago • 5 comments

What happened:

when I installed Karmada by local-up-karmada.sh, there are so many warning log in karmada-metrics-adapter, just like:

W0402 09:33:19.871185       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member3)
W0402 09:33:19.871786       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member2)
W0402 09:33:19.872300       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member3)
W0402 09:33:19.872441       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member3)
W0402 09:33:19.874165       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member3)
W0402 09:33:19.876792       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member2)
W0402 09:33:19.876940       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member2)
W0402 09:33:19.877613       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member2)
W0402 09:33:19.877877       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member2)
W0402 09:33:19.879804       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
W0402 09:33:19.880877       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
W0402 09:33:19.881422       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
W0402 09:33:19.882611       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
W0402 09:33:19.882814       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
W0402 09:33:19.883410       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(member1)
...

what does the log mean, and whether there is a problem?

What you expected to happen:

no so many warning log.

How to reproduce it (as minimally and precisely as possible):

just install Karmada by local-up-karmada.sh.

Anything else we need to know?:

Environment:

  • Karmada version: latest
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:

chaosi-zju avatar Apr 02 '24 09:04 chaosi-zju

Problem locating is in progress, here are some clue:


I stoped all deployments except karmada-apiserverkarmada-aggregated-apiserver and karmada-metrics-adapter:

$ kubectl get deploy -n karmada-system
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
karmada-aggregated-apiserver          1/1     1            1           29h
karmada-apiserver                     1/1     1            1           29h
karmada-controller-manager            0/0     0            0           29h
karmada-descheduler                   0/0     0            0           29h
karmada-kube-controller-manager       0/0     0            0           29h
karmada-metrics-adapter               1/1     1            1           29h
karmada-scheduler                     0/0     0            0           29h
karmada-scheduler-estimator-member1   0/0     0            0           29h
karmada-scheduler-estimator-member2   0/0     0            0           29h
karmada-scheduler-estimator-member3   0/0     0            0           29h
karmada-search                        0/0     0            0           29h
karmada-webhook                       0/0     0            0           29h

$ kubectl get pods -n karmada-system                                                            
NAME                                            READY   STATUS    RESTARTS   AGE
etcd-0                                          1/1     Running   0          29h
karmada-aggregated-apiserver-7d97d4dc66-7lkh9   1/1     Running   0          51s
karmada-apiserver-6ddf46ff5c-q7lhf              1/1     Running   0          29h
karmada-metrics-adapter-64bd5588ff-n6r2z        1/1     Running   0          3m47s

tip: karmada-metrics-adapter depends on Cluster object, so karmada-aggregated-apiserver is needed.

Here are still many such logs, 30 lines logs every 30 seconds:

kubectl logs -f karmada-metrics-adapter-64bd5588ff-n6r2z -n karmada-system
W0528 09:28:05.804862       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.808817       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.810095       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.816988       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.812497       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.813263       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.818458       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.820192       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.820830       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.822064       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.822224       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.822936       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.826700       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.829861       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.830414       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.830845       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.831248       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.831760       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.833759       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.837484       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:05.839309       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.839663       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.840396       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.841718       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.843612       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.843884       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.844970       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.845053       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:05.846675       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:05.847210       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)

W0528 09:28:35.802967       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.804402       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.810288       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.812237       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.816186       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.820015       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.820633       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.820859       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.822724       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.823100       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.824205       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.826565       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.827478       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.827829       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.828822       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.830739       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.831238       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.831790       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.833766       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:28:35.834467       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.835383       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.835551       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.837712       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.838446       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.839101       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.842752       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.843873       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:28:35.845648       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.847677       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:28:35.851251       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)

W0528 09:29:05.814110       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.817770       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.819412       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.820766       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.821774       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.822578       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.823109       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.823747       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.823973       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.824345       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.824407       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.825722       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.826174       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.826872       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.827654       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.828103       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.828747       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.829745       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.830147       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.830635       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.833722       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.835562       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.838407       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member2)
W0528 09:29:05.838717       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.840159       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.841614       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.841762       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)
W0528 09:29:05.843498       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.845239       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member1)
W0528 09:29:05.846175       1 custommetrics.go:307] custom.metrics.k8s.io not found in cluster(member3)

So, the request is send from karmada-apiserver to karmada-metrics-adapter.

The logs of karmada-apiserver in the corresponding time:

I0528 09:28:05.788710       1 available_controller.go:478] Updating v1beta2.custom.metrics.k8s.io
I0528 09:28:05.788847       1 available_controller.go:478] Updating v1beta1.custom.metrics.k8s.io
I0528 09:28:05.788900       1 available_controller.go:478] Updating v1beta1.metrics.k8s.io

I0528 09:28:35.787958       1 available_controller.go:478] Updating v1beta1.custom.metrics.k8s.io
I0528 09:28:35.788005       1 available_controller.go:478] Updating v1beta1.metrics.k8s.io
I0528 09:28:35.788119       1 available_controller.go:478] Updating v1beta2.custom.metrics.k8s.io

I0528 09:29:05.789519       1 available_controller.go:478] Updating v1beta2.custom.metrics.k8s.io
I0528 09:29:05.791161       1 available_controller.go:478] Updating v1beta1.custom.metrics.k8s.io
I0528 09:29:05.791543       1 available_controller.go:478] Updating v1beta1.metrics.k8s.io

chaosi-zju avatar May 28 '24 09:05 chaosi-zju

More clue added:

I add a line log in karmada-metrics-adapter to print out every request it receives in the format of:

klog.Infof("[DEBUG] %+v -> %s%s", req.Request.RemoteAddr, req.Request.Host, req.Request.URL)

every 30 seconds, I got such results:

I0606 07:09:41.601556       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1
I0606 07:09:41.603029       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1
I0606 07:09:41.603565       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2
I0606 07:09:41.612365       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1
I0606 07:09:41.605862       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1
I0606 07:09:41.607022       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2
I0606 07:09:41.608133       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1
I0606 07:09:41.619176       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1
I0606 07:09:41.620782       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2
I0606 07:09:41.622144       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1
I0606 07:09:41.609279       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1
I0606 07:09:41.609535       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/metrics.k8s.io/v1beta1
I0606 07:09:41.610455       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta1
I0606 07:09:41.604606       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2
I0606 07:09:41.613344       1 version.go:79] [DEBUG] 10.244.0.1:41073 -> karmada-metrics-adapter.karmada-system.svc.cluster.local:443/apis/custom.metrics.k8s.io/v1beta2

As you see, here are 3 kind of request: /apis/metrics.k8s.io/v1beta1/apis/custom.metrics.k8s.io/v1beta1 and /apis/custom.metrics.k8s.io/v1beta2, each request has 5 amount.

Then:

  • /apis/metrics.k8s.io/v1beta1 request is handled by staticLister.ListAPIResources (vendor/k8s.io/apiserver/pkg/endpoints/groupversion.go:148) and got reponse as:

    [{Name:nodes SingularName: Namespaced:false Group: Version: Kind:NodeMetrics Verbs:[get list] ShortNames:[] Categories:[] StorageVersionHash:} {Name:pods SingularName: Namespaced:true Group: Version: Kind:PodMetrics Verbs:[get list] ShortNames:[] Categories:[] StorageVersionHash:}]
    
  • /apis/custom.metrics.k8s.io/v1beta1 and /apis/custom.metrics.k8s.io/v1beta2 request are handled by customMetricsResourceLister.ListAPIResources (vendor/sigs.k8s.io/custom-metrics-apiserver/pkg/provider/resource_lister.go:42), since member cluster has no custom-metrics, they got empty response. However, in customMetricsResourceLister.ListAPIResources called CustomMetricsProvider.ListAllMetrics, this is the reason for why CustomMetricsProvider.ListAllMetrics has been called 10 times per 30 seconds.


So, the last question is why karmada-apiserver send /apis/custom.metrics.k8s.io/v1beta1 to karmada-metrics-adapter for 5 requests each time.

chaosi-zju avatar Jun 06 '24 07:06 chaosi-zju

So, the last question is why karmada-apiserver send /apis/custom.metrics.k8s.io/v1beta1 to karmada-metrics-adapter for 5 requests each time.

refer to: https://github.com/kubernetes/kube-aggregator/blob/b29cb30970faaa9cc09414f3510ca978d0180641/pkg/controllers/status/available_controller.go#L294-L375

when karmada-apiserver send /apis/custom.metrics.k8s.io/v1beta1 to karmada-metrics-adapter, it try 5 attempts. as you see:

attempts := 5
results := make(chan error, attempts)
for i := 0; i < attempts; i++ {
	go func() {
          ...
        }
  ...
}

If any one attempt succeed, it is regareded as success, as you see:

var lastError error
for i := 0; i < attempts; i++ {
	lastError = <-results
	// if we had at least one success, we are successful overall and we can return now
	if lastError == nil {
		break
	}
}

this is the existing mechanisms of kube-aggregator module in karmada-apiserver, and it is why each request is sent 5 times.

chaosi-zju avatar Jun 06 '24 14:06 chaosi-zju

@RainbowMango refer to above two clues, it can be summarized as follows:

  1. karmada-apiserver should know how many apiResources does karmada-metrics-adapter (essentially an aggregated-apiserver) manages, so it check and update this info every 30 seconds by send /apis/custom.metrics.k8s.io/v1beta1 and /apis/custom.metrics.k8s.io/v1beta2 to karmada-metrics-adapter since we defined following two apiservice:
$ kubectl get apiservice | grep custom.metrics       
v1beta1.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        7d6h
v1beta2.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        7d6h
  1. each request has 5 attempts, so karmada-metrics-adapter will be called 10 times every 30 seconds.

  2. That function ListAllMetrics is the handler of ListAPIResources API such as /apis/custom.metrics.k8s.io/v1beta1 in karmada-metrics-adapter, it will request 3 member clusters, so there are 30 lines warning logs every 30 seconds generated.

Above all, it is within expectations, while the log is not needed, so I suggest that we replace klog.Warningf("custom.metrics.k8s.io not found in cluster(%s)", clusterName) to klog.V(6).Infof("custom.metrics.k8s.io not found in cluster(%s)", clusterName).

chaosi-zju avatar Jun 06 '24 14:06 chaosi-zju

Seems the AvailableConditionController just wants to check all registered APIService available. But that will lead to karmada-metrics-adapter to list all metrics from member clusters, this is quite concerning.

I need some time to look into it.

RainbowMango avatar Jun 07 '24 03:06 RainbowMango