metrics-server
Metrics server not running
Hi,
I have deployed the metrics server, but it's not running.
I added the lines below after "args" in components.yaml:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
Then it worked. Below is a screenshot for reference.
Could you please add those lines, or put the instructions in the README.md file?
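For reference, the workaround described above corresponds to a fragment of the metrics-server container spec in components.yaml roughly like this (a sketch of the relevant lines only, not a complete manifest; note that --kubelet-insecure-tls disables kubelet certificate validation and is only appropriate for local or test clusters):

```yaml
# Relevant fragment of the metrics-server Deployment in components.yaml.
containers:
  - name: metrics-server
    command:
      - /metrics-server
      - --kubelet-insecure-tls                        # skip kubelet cert validation (test clusters only)
      - --kubelet-preferred-address-types=InternalIP  # contact the kubelet via its internal IP
```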
Maybe this is what you need:
Kubelet certificate needs to be signed by cluster Certificate Authority (or disable certificate validation by passing --kubelet-insecure-tls to Metrics Server)
https://github.com/kubernetes-sigs/metrics-server/blob/master/README.md#requirements
I'm having the same problem. I tried passing --kubelet-insecure-tls
in the args, but it didn't work. I also tried the suggested fix, with no luck.
Output of describing the metrics-server pod (note that --kubelet-insecure-tls
was passed):
Name: metrics-server-658867cdb7-9g2px
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: docker-desktop/192.168.65.4
Start Time: Thu, 25 Aug 2022 21:47:40 -0300
Labels: k8s-app=metrics-server
pod-template-hash=658867cdb7
Annotations: <none>
Status: Running
IP: 10.1.0.78
IPs:
IP: 10.1.0.78
Controlled By: ReplicaSet/metrics-server-658867cdb7
Containers:
metrics-server:
Container ID: docker://4c1d8989b9aa61a4d924f8cb6102084e5faadd7d93c661f186dfa3b99721f22e
Image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
Image ID: docker-pullable://k8s.gcr.io/metrics-server/metrics-server@sha256:5ddc6458eb95f5c70bd13fdab90cbd7d6ad1066e5b528ad1dcb28b76c5fb2f00
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
--kubelet-insecure-tls
State: Running
Started: Thu, 25 Aug 2022 21:47:41 -0300
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v5mcn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-v5mcn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30s default-scheduler Successfully assigned kube-system/metrics-server-658867cdb7-9g2px to docker-desktop
Normal Pulled 29s kubelet Container image "k8s.gcr.io/metrics-server/metrics-server:v0.6.1" already present on machine
Normal Created 29s kubelet Created container metrics-server
Normal Started 29s kubelet Started container metrics-server
Warning Unhealthy 0s kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Can you provide the logs of metrics-server?
kubectl logs -n kube-system pods/metrics-server-658867cdb7-9g2px
I0826 00:47:41.539717 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0826 00:47:41.898143 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0826 00:47:41.898160 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0826 00:47:41.898160 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0826 00:47:41.898168 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0826 00:47:41.898173 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0826 00:47:41.898181 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0826 00:47:41.898381 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0826 00:47:41.898427 1 secure_serving.go:266] Serving securely on [::]:4443
I0826 00:47:41.898483 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0826 00:47:41.898544 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0826 00:47:41.998317 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0826 00:47:41.998329 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0826 00:47:41.998364 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
E0826 00:47:55.396469 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
E0826 00:48:10.396404 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:48:10.428713 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:48:20.429191 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:48:25.396717 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:48:30.430000 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:48:40.396273 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:48:40.429237 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:48:50.428677 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:48:55.396038 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:49:00.429074 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:49:02.239233 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:49:10.397128 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:49:10.429253 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:49:20.429121 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:49:25.396276 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:49:30.428659 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:49:40.396563 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:49:40.429080 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:49:50.428520 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:49:55.396179 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:50:00.429383 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:50:10.395473 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:50:10.429282 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:50:20.429489 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:50:25.396494 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:50:30.428938 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:50:32.239825 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:50:40.396391 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:50:40.428628 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:50:50.430082 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:50:55.396338 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:51:00.427547 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:51:10.395791 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:51:10.428750 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:51:20.428033 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:51:25.395724 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:51:30.428932 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:51:38.240021 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:51:40.396141 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:51:40.429124 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:51:50.428346 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:51:55.396187 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:52:00.429772 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:52:10.396327 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:52:10.428712 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:52:20.428466 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:52:25.397218 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:52:30.429568 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:52:40.396537 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:52:40.429133 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:52:50.429106 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0826 00:52:55.396498 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
I0826 00:53:00.239568 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 00:53:00.429061 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
Maybe the problem is that the IP 192.168.65.4
is being used; on Windows the internal IP is not used, localhost is used instead.
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
docker-desktop Ready control-plane 23h v1.24.0 192.168.65.4 <none> Docker Desktop 5.10.102.1-microsoft-standard-WSL2 docker://20.10.14
"Failed to scrape node" err="Get \"https://192.168.65.4:10250/metrics/resource\": context deadline exceeded" node="docker-desktop"
Accessing the kubelet's /metrics/resource endpoint timed out.
There is a similar issue here; I don't know if it will help you: https://github.com/kubernetes-sigs/metrics-server/issues/907
Increasing the metric-resolution
made the pod ready, but afterwards executing kubectl get --raw /api/v1/nodes/docker-desktop/proxy/metrics/resource
returned: Error from server (NotFound): the server could not find the requested resource.
The problem is that before changing the value I was able to hit the endpoint (with a response time of ~20s, as described in the issue).
The other problem is that the HPA is complaining.
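For anyone trying the same workaround: the change from that issue amounts to raising --metric-resolution in the metrics-server args so that it comfortably exceeds the time the kubelet takes to answer (a sketch of the args list only; the 30s value is an illustrative assumption, the default manifest uses 15s):

```yaml
args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-insecure-tls
  - --metric-resolution=30s   # illustrative; was 15s in the manifest quoted above
```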
kubectl get all -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-6d4b75cb6d-8qzvf 1/1 Running 1 (7h10m ago) 24h
pod/coredns-6d4b75cb6d-km2lg 1/1 Running 1 (7h10m ago) 24h
pod/etcd-docker-desktop 1/1 Running 1 (7h10m ago) 24h
pod/kube-apiserver-docker-desktop 1/1 Running 1 (7h10m ago) 24h
pod/kube-controller-manager-docker-desktop 1/1 Running 1 (7h10m ago) 24h
pod/kube-proxy-mw4cv 1/1 Running 1 (7h10m ago) 24h
pod/kube-scheduler-docker-desktop 1/1 Running 1 (7h10m ago) 24h
pod/metrics-server-769cd769-j4xck 1/1 Running 0 15m
pod/storage-provisioner 1/1 Running 1 (7h10m ago) 24h
pod/vpnkit-controller 1/1 Running 45 (9m7s ago) 24h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 24h
service/metrics-server ClusterIP 10.110.212.220 <none> 443/TCP 15m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 24h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2/2 2 2 24h
deployment.apps/metrics-server 1/1 1 1 15m
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-6d4b75cb6d 2 2 2 24h
replicaset.apps/metrics-server-769cd769 1 1 1 15m
kubectl describe pod -n kube-system metrics-server-769cd769-j4xck
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned kube-system/metrics-server-769cd769-j4xck to docker-desktop
Normal Pulled 12m kubelet Container image "k8s.gcr.io/metrics-server/metrics-server:v0.6.1" already present on machine
Normal Created 12m kubelet Created container metrics-server
Normal Started 12m kubelet Started container metrics-server
Warning Unhealthy 11m (x6 over 12m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
kubectl logs -n kube-system pods/metrics-server-769cd769-j4xck
I0826 02:25:55.294316 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0826 02:25:55.636095 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0826 02:25:55.636127 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0826 02:25:55.636138 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0826 02:25:55.636145 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0826 02:25:55.636154 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0826 02:25:55.636146 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0826 02:25:55.636233 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0826 02:25:55.636274 1 secure_serving.go:266] Serving securely on [::]:4443
W0826 02:25:55.636337 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0826 02:25:55.636337 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0826 02:25:55.736986 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0826 02:25:55.737003 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0826 02:25:55.737291 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0826 02:26:24.131175 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 02:26:34.132103 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 02:26:44.131387 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 02:26:54.131096 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 02:27:00.239368 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0826 02:27:04.131346 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
kubectl describe hpa hpa-portal
Name: hpa-portal
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 25 Aug 2022 23:29:18 -0300
Reference: Deployment/deployment-portal
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 10
Deployment pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 54s (x11 over 10m) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 54s (x11 over 10m) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
If executing kubectl get --raw /api/v1/nodes/docker-desktop/proxy/metrics/resource
returned Error from server (NotFound): the server could not find the requested resource,
please check whether the kubelet is running normally, whether it is restarting repeatedly, etc.
/kind support
Hi, @Gab-Menezes, Has the issue been solved?
Sorry, I forgot about this issue, but no. I can't figure out how to check the kubelet status on Windows, but I would say it is running normally, since I don't have any problems with other pods/applications.
Hey @yangjunmyfm192085,
I have the same issue.
Getting failed to get cpu utilization: missing request for cpu
for the HPA.
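For context: missing request for cpu usually indicates that at least one container in the target Deployment declares no CPU request; an HPA that targets CPU utilization as a percentage needs every container in the pod to set one. A hypothetical fragment (the container name and 100m value are illustrative assumptions):

```yaml
# Hypothetical container fragment for the HPA's target Deployment:
# utilization-based HPAs need resources.requests.cpu on every container.
containers:
  - name: app
    resources:
      requests:
        cpu: 100m
```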
sk@mac ~ % kubectl version --output=yaml
clientVersion:
buildDate: "2022-01-19T17:51:12Z"
compiler: gc
gitCommit: b631974d68ac5045e076c86a5c66fba6f128dc72
gitTreeState: clean
gitVersion: v1.21.9
goVersion: go1.16.12
major: "1"
minor: "21"
platform: darwin/arm64
serverVersion:
buildDate: "2022-07-06T18:06:50Z"
compiler: gc
gitCommit: ac73613dfd25370c18cbbbc6bfc65449397b35c7
gitTreeState: clean
gitVersion: v1.21.14-eks-18ef993
goVersion: go1.16.15
major: "1"
minor: 21+
platform: linux/amd64
hpa debug
sk@mac ~ % kubectl describe hpa test-hpa -n bart
Name: test-hpa
Namespace: bart
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: test-bart
meta.helm.sh/release-namespace: bart
CreationTimestamp: Wed, 07 Sep 2022 18:17:15 +0300
Reference: Deployment/test-88
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 60%
Min replicas: 2
Max replicas: 20
Deployment pods: 2 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 2m45s (x4359 over 18h) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu
metrics-server logs
I0907 15:32:51.346245 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0907 15:32:51.742115 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0907 15:32:51.742137 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0907 15:32:51.742142 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0907 15:32:51.742151 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0907 15:32:51.742172 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0907 15:32:51.742179 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0907 15:32:51.742876 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0907 15:32:51.743033 1 secure_serving.go:266] Serving securely on [::]:4443
I0907 15:32:51.743072 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0907 15:32:51.743156 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0907 15:32:51.842769 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0907 15:32:51.842791 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0907 15:32:51.842810 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0907 15:33:19.126706 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:33:29.125385 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:33:39.126147 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:33:49.125540 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:33:59.125736 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:34:09.125897 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:34:19.127252 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0907 15:34:29.125685 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
Any ideas on how to debug or fix this?
Hi @serg4kostiuk, thanks for the feedback. You can refer to the known issues below to check whether the metrics obtained from the kubelet are incomplete:
https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md#kubelet-doesnt-report-metrics-for-all-or-subset-of-nodes
https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md#kubelet-doesnt-report-pod-metrics
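A quick way to run that check is to dump the node's /metrics/resource output to a file and grep it for the pod in question. The sketch below uses a small inline sample instead of a live cluster (the pod name and metric lines are copied from the output quoted later in this thread):

```shell
# In a real cluster, produce metrics.txt with:
#   kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/metrics/resource > metrics.txt
# Here an inline sample stands in for the live dump so the check itself is runnable.
cat > metrics.txt <<'EOF'
container_cpu_usage_seconds_total{container="test",namespace="voidnew",pod="test-5799b7b769-hncfb"} 951.648087092 1662638501570
container_memory_working_set_bytes{container="test",namespace="voidnew",pod="test-5799b7b769-hncfb"} 2.48608768e+09 1662638501570
EOF

POD="test-5799b7b769-hncfb"
if grep -q "pod=\"$POD\"" metrics.txt; then
  echo "kubelet reports metrics for $POD"
else
  echo "no kubelet metrics for $POD (see KNOWN_ISSUES.md)"
fi
```

If the pod is missing from the dump, the kubelet (not metrics-server) is the component to investigate, per the known-issues pages above.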
Thanks @yangjunmyfm192085, I already did that. The output:
sk@mac ~ % kubectl top nodes
W0908 14:58:55.031967 33507 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-100-100-174.ec2.internal 196m 2% 12382Mi 85%
ip-10-100-100-8.ec2.internal 218m 5% 3413Mi 51%
ip-10-100-101-169.ec2.internal 103m 1% 3165Mi 21%
ip-10-100-101-229.ec2.internal 121m 1% 7005Mi 48%
metrics
sk@mac ~ % NODE_NAME=ip-10-100-100-174.ec2.internal
kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/metrics/resource
# HELP container_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the container in core-seconds
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="agent",namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 418.524933302 1662638501027
container_cpu_usage_seconds_total{container="aws-node",namespace="kube-system",pod="aws-node-npt8s"} 312.666112546 1662638509159
container_cpu_usage_seconds_total{container="aws-node-termination-handler",namespace="kube-system",pod="aws-node-termination-handler-vskng"} 41.500486192 1662638510942
container_cpu_usage_seconds_total{container="kube-proxy",namespace="kube-system",pod="kube-proxy-5bb56"} 35.132398838 1662638498393
container_cpu_usage_seconds_total{container="kubelet",namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 809.146088806 1662638508346
container_cpu_usage_seconds_total{container="newrelic-logging",namespace="newrelic",pod="newrelic-newrelic-logging-8ztfv"} 666.912120419 1662638506883
container_cpu_usage_seconds_total{container="test",namespace="voidnew",pod="test-5799b7b769-hncfb"} 951.648087092 1662638501570
container_cpu_usage_seconds_total{container="test-admin",namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 15.241756556 1662638499313
container_cpu_usage_seconds_total{container="test-admin",namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 594.401490141 1662638502666
container_cpu_usage_seconds_total{container="test-nginx",namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 0.073064288 1662638507147
container_cpu_usage_seconds_total{container="test-nginx",namespace="voidnew",pod="test-5799b7b769-hncfb"} 20.366364436 1662638503771
container_cpu_usage_seconds_total{container="test-nginx",namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 4.621754426 1662638499878
container_cpu_usage_seconds_total{container="test-shoryuken",namespace="voidnew",pod="test-shoryuken-6bb9c696c6-lqdl4"} 45.107665199 1662638499624
container_cpu_usage_seconds_total{container="test-sidekiq-api-stats",namespace="bart",pod="test-sidekiq-api-stats-6c5b94756c-fchqw"} 104.431471208 1662638508859
container_cpu_usage_seconds_total{container="test-sidekiq-api-stats",namespace="voidnew",pod="test-sidekiq-api-stats-676d9bcf67-m2429"} 88.972119973 1662638505879
container_cpu_usage_seconds_total{container="test-sidekiq-default",namespace="bart",pod="test-sidekiq-default-6b5b648b85-nfv6v"} 400.918358602 1662638510012
container_cpu_usage_seconds_total{container="test-sidekiq-default",namespace="voidnew",pod="test-sidekiq-default-7f6556f77f-mr2dx"} 686.062540092 1662638503859
container_cpu_usage_seconds_total{container="test-sidekiq-export",namespace="voidnew",pod="test-sidekiq-export-557dd4cbf6-l955f"} 243.546512455 1662638511639
container_cpu_usage_seconds_total{container="test-sidekiq-mailer",namespace="bart",pod="test-sidekiq-mailer-56f59cdcc6-zhhqt"} 515.512617825 1662638497654
container_cpu_usage_seconds_total{container="test-sidekiq-mailer",namespace="voidnew",pod="test-sidekiq-mailer-8576d8fdd9-94bbt"} 449.133985296 1662638501244
container_cpu_usage_seconds_total{container="test-sidekiq-page-export",namespace="bart",pod="test-sidekiq-page-export-544fcb6bc-c4hpk"} 214.001744333 1662638509405
container_cpu_usage_seconds_total{container="test-sidekiq-page-export",namespace="voidnew",pod="test-sidekiq-page-export-7b9cdd66c9-d4dlg"} 226.575217255 1662638497931
container_cpu_usage_seconds_total{container="test-sidekiq-reporting",namespace="bart",pod="test-sidekiq-reporting-577864846-jfnwx"} 293.053909902 1662638506392
container_cpu_usage_seconds_total{container="test-sidekiq-reporting",namespace="voidnew",pod="test-sidekiq-reporting-76c8bd94fb-nh7ct"} 333.585208812 1662638511944
container_cpu_usage_seconds_total{container="test-sidekiq-support",namespace="bart",pod="test-sidekiq-support-6bdb69567d-gxnwv"} 152.612587484 1662638507206
container_cpu_usage_seconds_total{container="test-sidekiq-support",namespace="voidnew",pod="test-sidekiq-support-57dbb94478-t8x69"} 133.418589795 1662638511717
container_cpu_usage_seconds_total{container="test-sidekiq-webhook",namespace="bart",pod="test-sidekiq-webhook-c4b779697-khmcg"} 189.322319754 1662638503264
container_cpu_usage_seconds_total{container="test-sidekiq-webhook",namespace="voidnew",pod="test-sidekiq-webhook-54687bf855-45jj8"} 20.553327681 1662638507288
# HELP container_memory_working_set_bytes [ALPHA] Current working set of the container in bytes
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container="agent",namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 3.0031872e+07 1662638501027
container_memory_working_set_bytes{container="aws-node",namespace="kube-system",pod="aws-node-npt8s"} 5.1376128e+07 1662638509159
container_memory_working_set_bytes{container="aws-node-termination-handler",namespace="kube-system",pod="aws-node-termination-handler-vskng"} 1.3918208e+07 1662638510942
container_memory_working_set_bytes{container="kube-proxy",namespace="kube-system",pod="kube-proxy-5bb56"} 2.3269376e+07 1662638498393
container_memory_working_set_bytes{container="kubelet",namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 2.6693632e+07 1662638508346
container_memory_working_set_bytes{container="newrelic-logging",namespace="newrelic",pod="newrelic-newrelic-logging-8ztfv"} 3.442688e+07 1662638506883
container_memory_working_set_bytes{container="test",namespace="voidnew",pod="test-5799b7b769-hncfb"} 2.48608768e+09 1662638501570
container_memory_working_set_bytes{container="test-admin",namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 1.324457984e+09 1662638499313
container_memory_working_set_bytes{container="test-admin",namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 2.555981824e+09 1662638502666
container_memory_working_set_bytes{container="test-nginx",namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 3.473408e+06 1662638507147
container_memory_working_set_bytes{container="test-nginx",namespace="voidnew",pod="test-5799b7b769-hncfb"} 3.977216e+06 1662638503771
container_memory_working_set_bytes{container="test-nginx",namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 4.56704e+06 1662638499878
container_memory_working_set_bytes{container="test-shoryuken",namespace="voidnew",pod="test-shoryuken-6bb9c696c6-lqdl4"} 3.3134592e+08 1662638499624
container_memory_working_set_bytes{container="test-sidekiq-api-stats",namespace="bart",pod="test-sidekiq-api-stats-6c5b94756c-fchqw"} 3.41594112e+08 1662638508859
container_memory_working_set_bytes{container="test-sidekiq-api-stats",namespace="voidnew",pod="test-sidekiq-api-stats-676d9bcf67-m2429"} 3.31546624e+08 1662638505879
container_memory_working_set_bytes{container="test-sidekiq-default",namespace="bart",pod="test-sidekiq-default-6b5b648b85-nfv6v"} 3.4945024e+08 1662638510012
container_memory_working_set_bytes{container="test-sidekiq-default",namespace="voidnew",pod="test-sidekiq-default-7f6556f77f-mr2dx"} 6.06142464e+08 1662638503859
container_memory_working_set_bytes{container="test-sidekiq-export",namespace="voidnew",pod="test-sidekiq-export-557dd4cbf6-l955f"} 3.58084608e+08 1662638511639
container_memory_working_set_bytes{container="test-sidekiq-mailer",namespace="bart",pod="test-sidekiq-mailer-56f59cdcc6-zhhqt"} 3.61381888e+08 1662638497654
container_memory_working_set_bytes{container="test-sidekiq-mailer",namespace="voidnew",pod="test-sidekiq-mailer-8576d8fdd9-94bbt"} 3.6489216e+08 1662638501244
container_memory_working_set_bytes{container="test-sidekiq-page-export",namespace="bart",pod="test-sidekiq-page-export-544fcb6bc-c4hpk"} 3.70192384e+08 1662638509405
container_memory_working_set_bytes{container="test-sidekiq-page-export",namespace="voidnew",pod="test-sidekiq-page-export-7b9cdd66c9-d4dlg"} 3.95972608e+08 1662638497931
container_memory_working_set_bytes{container="test-sidekiq-reporting",namespace="bart",pod="test-sidekiq-reporting-577864846-jfnwx"} 3.48819456e+08 1662638506392
container_memory_working_set_bytes{container="test-sidekiq-reporting",namespace="voidnew",pod="test-sidekiq-reporting-76c8bd94fb-nh7ct"} 3.26725632e+08 1662638511944
container_memory_working_set_bytes{container="test-sidekiq-support",namespace="bart",pod="test-sidekiq-support-6bdb69567d-gxnwv"} 3.33975552e+08 1662638507206
container_memory_working_set_bytes{container="test-sidekiq-support",namespace="voidnew",pod="test-sidekiq-support-57dbb94478-t8x69"} 3.49630464e+08 1662638511717
container_memory_working_set_bytes{container="test-sidekiq-webhook",namespace="bart",pod="test-sidekiq-webhook-c4b779697-khmcg"} 3.43396352e+08 1662638503264
container_memory_working_set_bytes{container="test-sidekiq-webhook",namespace="voidnew",pod="test-sidekiq-webhook-54687bf855-45jj8"} 3.424256e+08 1662638507288
# HELP node_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the node in core-seconds
# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 22426.988613964 1662638508463
# HELP node_memory_working_set_bytes [ALPHA] Current working set of the node in bytes
# TYPE node_memory_working_set_bytes gauge
node_memory_working_set_bytes 1.2950503424e+10 1662638508463
# HELP pod_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the pod in core-seconds
# TYPE pod_cpu_usage_seconds_total counter
pod_cpu_usage_seconds_total{namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 15.499669223 1662638511093
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-api-stats-6c5b94756c-fchqw"} 104.654907175 1662638501088
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-default-6b5b648b85-nfv6v"} 401.04078175 1662638500826
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-mailer-56f59cdcc6-zhhqt"} 515.771736458 1662638507253
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-page-export-544fcb6bc-c4hpk"} 214.154657375 1662638506045
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-reporting-577864846-jfnwx"} 293.230341888 1662638495754
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-support-6bdb69567d-gxnwv"} 152.754955701 1662638500843
pod_cpu_usage_seconds_total{namespace="bart",pod="test-sidekiq-webhook-c4b779697-khmcg"} 189.489380615 1662638503680
pod_cpu_usage_seconds_total{namespace="kube-system",pod="aws-node-npt8s"} 312.706103434 1662638501756
pod_cpu_usage_seconds_total{namespace="kube-system",pod="aws-node-termination-handler-vskng"} 41.653263798 1662638501680
pod_cpu_usage_seconds_total{namespace="kube-system",pod="kube-proxy-5bb56"} 35.150185947 1662638505392
pod_cpu_usage_seconds_total{namespace="newrelic",pod="newrelic-newrelic-logging-8ztfv"} 667.08169285 1662638507505
pod_cpu_usage_seconds_total{namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 1227.605221778 1662638504474
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-5799b7b769-hncfb"} 972.175565237 1662638502790
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 599.177898867 1662638502685
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-shoryuken-6bb9c696c6-lqdl4"} 45.27940431 1662638505950
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-api-stats-676d9bcf67-m2429"} 105.305951433 1662638507380
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-default-7f6556f77f-mr2dx"} 686.245633742 1662638506965
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-export-557dd4cbf6-l955f"} 243.701396206 1662638504528
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-mailer-8576d8fdd9-94bbt"} 449.349414714 1662638502155
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-page-export-7b9cdd66c9-d4dlg"} 226.830506739 1662638509789
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-reporting-76c8bd94fb-nh7ct"} 333.764302034 1662638503290
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-support-57dbb94478-t8x69"} 133.610564259 1662638500707
pod_cpu_usage_seconds_total{namespace="voidnew",pod="test-sidekiq-webhook-54687bf855-45jj8"} 225.202388868 1662638507478
# HELP pod_memory_working_set_bytes [ALPHA] Current working set of the pod in bytes
# TYPE pod_memory_working_set_bytes gauge
pod_memory_working_set_bytes{namespace="bart",pod="test-admin-5b47b858f4-88cmt"} 1.329102848e+09 1662638511093
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-api-stats-6c5b94756c-fchqw"} 3.42798336e+08 1662638501088
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-default-6b5b648b85-nfv6v"} 3.48680192e+08 1662638500826
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-mailer-56f59cdcc6-zhhqt"} 3.6145152e+08 1662638507253
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-page-export-544fcb6bc-c4hpk"} 3.7117952e+08 1662638506045
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-reporting-577864846-jfnwx"} 3.5037184e+08 1662638495754
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-support-6bdb69567d-gxnwv"} 3.34823424e+08 1662638500843
pod_memory_working_set_bytes{namespace="bart",pod="test-sidekiq-webhook-c4b779697-khmcg"} 3.44469504e+08 1662638503680
pod_memory_working_set_bytes{namespace="kube-system",pod="aws-node-npt8s"} 5.255168e+07 1662638501756
pod_memory_working_set_bytes{namespace="kube-system",pod="aws-node-termination-handler-vskng"} 1.4897152e+07 1662638501680
pod_memory_working_set_bytes{namespace="kube-system",pod="kube-proxy-5bb56"} 2.4203264e+07 1662638505392
pod_memory_working_set_bytes{namespace="newrelic",pod="newrelic-newrelic-logging-8ztfv"} 3.5241984e+07 1662638507505
pod_memory_working_set_bytes{namespace="newrelic",pod="newrelic-nrk8s-kubelet-bvjsd"} 5.3981184e+07 1662638504474
pod_memory_working_set_bytes{namespace="voidnew",pod="test-5799b7b769-hncfb"} 2.491006976e+09 1662638502790
pod_memory_working_set_bytes{namespace="voidnew",pod="test-admin-659cf556cc-qclcz"} 2.561667072e+09 1662638502685
pod_memory_working_set_bytes{namespace="voidnew",pod="test-shoryuken-6bb9c696c6-lqdl4"} 3.32480512e+08 1662638505950
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-api-stats-676d9bcf67-m2429"} 3.3452032e+08 1662638507380
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-default-7f6556f77f-mr2dx"} 6.06650368e+08 1662638506965
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-export-557dd4cbf6-l955f"} 3.59264256e+08 1662638504528
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-mailer-8576d8fdd9-94bbt"} 3.66575616e+08 1662638502155
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-page-export-7b9cdd66c9-d4dlg"} 3.9723008e+08 1662638509789
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-reporting-76c8bd94fb-nh7ct"} 3.25435392e+08 1662638503290
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-support-57dbb94478-t8x69"} 3.50785536e+08 1662638500707
pod_memory_working_set_bytes{namespace="voidnew",pod="test-sidekiq-webhook-54687bf855-45jj8"} 3.43441408e+08 1662638507478
# HELP scrape_error [ALPHA] 1 if there was an error while getting container metrics, 0 otherwise
# TYPE scrape_error gauge
scrape_error 0
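The dump above is the kubelet's /metrics/resource endpoint in Prometheus exposition format (name{labels} value timestamp_ms, with the timestamp absent on scrape_error). To sanity-check such a dump offline, a minimal parser sketch — parse_metric_line is a hypothetical helper, not part of metrics-server, and assumes the simple quoted labels seen above:

```python
import re

# name{labels} value [timestamp_ms] -- the shape of the kubelet's
# /metrics/resource lines above; the timestamp is absent on scrape_error.
_LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)(?:\s+(\d+))?$')

def parse_metric_line(line):
    """Parse one exposition-format sample line; return None for
    comments (# HELP / # TYPE) or lines that do not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    name, raw_labels, value, ts = m.groups()
    labels = {}
    if raw_labels:
        # naive split: fine for the simple quoted labels above, not for
        # label values that themselves contain commas or escaped quotes
        for pair in raw_labels.split(","):
            key, val = pair.split("=", 1)
            labels[key] = val.strip('"')
    return {"name": name, "labels": labels,
            "value": float(value),
            "timestamp_ms": int(ts) if ts else None}
```

This is enough to, say, filter all working-set samples for one namespace from a saved dump before comparing them with what kubectl top reports.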
I'm actually using metrics-server version 0.6.1, but the output for 0.5.1 is as follows:
sk@mac ~ % kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/stats/summary | jq '{cpu: .node.cpu, memory: .node.memory}'
{
  "cpu": {
    "time": "2022-09-08T12:05:59Z",
    "usageNanoCores": 144794016,
    "usageCoreNanoSeconds": 22477182020253
  },
  "memory": {
    "time": "2022-09-08T12:05:59Z",
    "availableBytes": 3352186880,
    "usageBytes": 14265958400,
    "workingSetBytes": 12950142976,
    "rssBytes": 12782141440,
    "pageFaults": 3177570,
    "majorPageFaults": 924
  }
}
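For reference, kubectl top prints CPU in millicores and memory in Mi, so the summary figures above can be converted by hand. A small sketch — summary_to_top_units is a hypothetical helper, assuming 1 core = 1e9 nanocores = 1000m and 1 Mi = 1024 * 1024 bytes:

```python
def summary_to_top_units(usage_nano_cores, working_set_bytes):
    """Convert /stats/summary node figures into the units kubectl top
    prints: CPU in millicores (1 core = 1e9 nanocores = 1000m) and
    memory in mebibytes (1 Mi = 1024 * 1024 bytes)."""
    millicores = usage_nano_cores / 1_000_000
    mebibytes = working_set_bytes / (1024 * 1024)
    return millicores, mebibytes
```

Plugging in the numbers above, usageNanoCores 144794016 comes out to roughly 145m and workingSetBytes 12950142976 to roughly 12350Mi, which is the shape of output kubectl top nodes would show.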
and the cgroups check:
[root@ip-10-100-100-174 /]# mount | grep group
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
Hi, @serg4kostiuk
From the data obtained above, it looks normal. Does kubectl top pod -n bart display pod metrics correctly?
sure, @yangjunmyfm192085
sk@mac ~ % kubectl top pod -n bart
W0909 14:12:14.449355 4909 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) MEMORY(bytes)
aaa-web-88dc87987-m7p95 3m 1414Mi
aaa-web-88dc87987-sc25d 2m 1040Mi
admin-7c69c54b65-64lpx 3m 959Mi
admin-7c69c54b65-c2gbc 4m 975Mi
test-sidekiq-api-stats-6c5b94756c-cps8s 2m 335Mi
test-sidekiq-default-6b5b648b85-hd78g 4m 334Mi
test-sidekiq-export-777878665c-j7kh6 2m 345Mi
test-sidekiq-heavy-54c966b8fd-zclzs 2m 319Mi
test-sidekiq-mailer-56f59cdcc6-hnrvs 5m 347Mi
test-sidekiq-page-export-544fcb6bc-2rlrq 2m 322Mi
test-sidekiq-reporting-577864846-9gmks 3m 340Mi
test-sidekiq-support-6bdb69567d-q8nnv 2m 307Mi
test-sidekiq-webhook-c4b779697-cmws8 2m 324Mi
sk@mac ~ % kubectl top nodes
W0909 14:13:22.379838 4925 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-100-100-208.ec2.internal 131m 1% 3411Mi 23%
ip-10-100-100-8.ec2.internal 238m 6% 3846Mi 58%
ip-10-100-100-81.ec2.internal 156m 1% 7801Mi 53%
ip-10-100-101-138.ec2.internal 95m 1% 1766Mi 12%
ip-10-100-101-241.ec2.internal 615m 7% 9177Mi 63%
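As a side note, the CPU(cores) column above is a rate derived from cumulative counters like node_cpu_usage_seconds_total: two samples are taken and the delta in core-seconds is divided by elapsed time. A rough sketch of that arithmetic — cpu_rate_millicores is a hypothetical helper, not metrics-server code:

```python
def cpu_rate_millicores(prev_core_seconds, prev_ts_ms,
                        cur_core_seconds, cur_ts_ms):
    """Average CPU rate between two samples of a cumulative core-seconds
    counter (e.g. node_cpu_usage_seconds_total): delta core-seconds
    divided by delta wall-clock seconds, scaled to millicores."""
    elapsed_s = (cur_ts_ms - prev_ts_ms) / 1000.0
    return 1000.0 * (cur_core_seconds - prev_core_seconds) / elapsed_s
```

For example, two samples 15 s apart that differ by 2 core-seconds average out to about 133m.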
Plus a few new errors in the metrics-server pod:
http2: server connection error from 10.100.101.182:41370: connection error: PROTOCOL_ERROR
http2: server connection error from 10.100.101.182:41370: connection error: PROTOCOL_ERROR
http2: server connection error from 10.100.101.182:39542: connection error: PROTOCOL_ERROR
E0909 07:14:50.107099 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-100-100-15.ec2.internal"
E0909 07:30:33.603372 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-100-15.ec2.internal"
E0909 07:30:48.603828 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-100-15.ec2.internal"
E0909 07:31:03.603037 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-100-15.ec2.internal"
E0909 07:31:18.603641 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-100-15.ec2.internal"
E0909 07:31:33.603760 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.15:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-100-15.ec2.internal"
E0909 09:55:20.132726 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.100.81:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-100-100-81.ec2.internal"
E0909 10:01:05.127457 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.101.83:10250/metrics/resource\": dial tcp 10.100.101.83:10250: connect: connection refused" node="ip-10-100-101-83.ec2.internal"
E0909 10:01:33.603429 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.101.83:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-101-83.ec2.internal"
E0909 10:01:48.603599 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.101.83:10250/metrics/resource\": context deadline exceeded" node="ip-10-100-101-83.ec2.internal"
My metrics-server deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
      priorityClassName: system-cluster-critical
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
          imagePullPolicy: IfNotPresent
          args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls=true
          resources:
            requests:
              cpu: 200m
              memory: 300Mi
          ports:
            - name: https
              containerPort: 4443
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /readyz
              port: https
              scheme: HTTPS
            periodSeconds: 10
            failureThreshold: 3
            initialDelaySeconds: 20
          livenessProbe:
            httpGet:
              path: /livez
              port: https
              scheme: HTTPS
            failureThreshold: 3
            periodSeconds: 10
          securityContext:
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
Thanks for the feedback. metrics-server appears to be in a normal state. Is there still any problem with HPA?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
E0909 10:01:05.127457 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.100.101.83:10250/metrics/resource\": dial tcp 10.100.101.83:10250: connect: connection refused" node="ip-10-100-101-83.ec2.internal"
@serg4kostiuk Are you positive that port 10250 is exposed?
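A quick way to check that from a machine on the cluster network is a plain TCP connect to the kubelet port. A minimal sketch — port_open is a hypothetical helper; a refused or timed-out connect corresponds to the "connection refused" and "context deadline exceeded" scrape errors above:

```python
import socket

def port_open(host, port=10250, timeout=2.0):
    """Attempt a plain TCP connection to the kubelet port.
    True if the port accepts connections; False on refusal/timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ECONNREFUSED, timeout, unreachable, ...
        return False
```

Note this only proves the port is reachable; a TLS-level failure like "remote error: tls: internal error" happens after the TCP connect succeeds and usually points at the kubelet's serving certificate instead.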
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I am having a similar issue, but it seems to me that metrics-server is actually looking for the wrong (non-existent) nodes. Where is it supposed to take the list of nodes to scrape from? The kubelet service?
Hello there, I'm having kind of the same issue. I have these logs:
PS C:\windows\system32> kubectl logs -f metrics-server-58b7f877fc-67txx -n kube-system
I0315 10:08:23.309378 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0315 10:08:23.681150 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0315 10:08:23.681186 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0315 10:08:23.681160 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0315 10:08:23.681197 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0315 10:08:23.681161 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0315 10:08:23.681304 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0315 10:08:23.681454 1 secure_serving.go:266] Serving securely on [::]:10250
W0315 10:08:23.681489 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0315 10:08:23.681525 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0315 10:08:23.681535 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0315 10:08:23.781829 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0315 10:08:23.781854 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0315 10:08:23.781842 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0315 10:08:42.689261 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0315 10:08:52.689153 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 10:08:53.688761 1 scraper.go:140] "Failed to scrape node" err="Get "https://10.49.161.59:10250/metrics/resource": dial tcp 10.49.161.59:10250: i/o timeout" node="ip-10-49-161-59.eu-west-3.compute.internal"
E0315 10:08:53.691870 1 scraper.go:140] "Failed to scrape node" err="Get "https://10.49.161.177:10250/metrics/resource": dial tcp 10.49.161.177:10250: i/o timeout" node="ip-10-49-161-177.eu-west-3.compute.internal"
E0315 10:08:53.692942 1 scraper.go:140] "Failed to scrape node" err="Get "https://10.49.161.214:10250/metrics/resource": dial tcp 10.49.161.214:10250: i/o timeout" node="ip-10-49-161-214.eu-west-3.compute.internal"
E0315 10:08:53.708202 1 scraper.go:140] "Failed to scrape node" err="Get "https://10.49.161
metrics-server doesn't seem to see the other nodes in the cluster, although they all have the same configuration and all have port 10250/TCP allowed in the security group.
PS C:\windows\system32> kubectl top nodes -n kube-system
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-49-161-64.eu-west-3.compute.internal 55m 1% 1959Mi 13%
ip-10-49-161-125.eu-west-3.compute.internal
I just want to add that we're running metrics-server 0.6.1 and EKS 1.25. I've already applied all the hacks and workarounds mentioned, like --metric-resolution, --kubelet-preferred-address-types, and --kubelet-insecure-tls=true, and none of them solve the issue. Can anyone here help?
@sichiba try this:
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm install metrics-server \
metrics-server/metrics-server \
-n kube-system \
--version 3.9.0 \
-f metrics-server/values.yaml \
--wait
# Fixes for error:
# "couldn't get current server API group list: the server has asked for the client to provide credentials"
# inspired by https://stackoverflow.com/a/75266326
# related discussion https://github.com/kubernetes-sigs/metrics-server/issues/157
hostNetwork:
  enabled: true
args:
  - --kubelet-insecure-tls
at least this helped me.
I still get:
couldn't get resource list for metrics.k8s.io/v1beta1: Unauthorized
but it's not blocking.
also having the same issue where metrics server is trying to scrape old nodes which have been terminated. It occasionally scrapes the new nodes and receives a remote error: tls: internal error
Exactly the same for me. It seems to occur when a new node appears, then nothing