
EKS 1.24 metrics-server 503 error

Open afilonchuk opened this issue 1 year ago • 5 comments

What happened: Got an error: controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS 1.24

  • Container Network Setup (flannel, calico, etc.): aws vpc-cni

  • Kubernetes version (use kubectl version): Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:20:07Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"darwin/arm64"}

  • Metrics Server manifest: helm chart 3.12.0 (metrics-server:v0.7.0)

  • Metrics server logs:

spoiler for Metrics Server logs:

I0301 11:50:48.516799 1 serving.go:374] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0301 11:50:49.134213 1 handler.go:275] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I0301 11:50:49.265768 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0301 11:50:49.265807 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0301 11:50:49.265965 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0301 11:50:49.266000 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0301 11:50:49.266085 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0301 11:50:49.266132 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0301 11:50:49.266457 1 secure_serving.go:213] Serving securely on [::]:4443
I0301 11:50:49.266567 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0301 11:50:49.266936 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0301 11:50:49.365943 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0301 11:50:49.366054 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0301 11:50:49.366219 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

  • Status of Metrics API:

Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              app.kubernetes.io/version=0.7.0
              argocd.argoproj.io/instance=metrics-server
              helm.sh/chart=metrics-server-3.12.0
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2024-03-01T11:50:46Z
  Resource Version:    198063091
  UID:                 727329a3-9bb5-4e7c-9db5-111d17049c8d
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       metrics-server
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-03-01T11:51:16Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

[Screenshots: Screenshot 2024-03-01 at 15 08 39, Screenshot 2024-03-01 at 15 09 17, Screenshot 2024-03-01 at 15 09 57]
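To re-check the same status without reading the full describe output, the Available condition can be pulled from the APIService object as JSON. The following is a minimal Python sketch, assuming kubectl is on the PATH and already pointed at the cluster; the helper names available_condition and fetch_apiservice are hypothetical and not part of metrics-server or kubectl.

```python
import json
import subprocess
from typing import Optional

# APIService name taken from the status output in this issue.
APISERVICE = "v1beta1.metrics.k8s.io"


def available_condition(apiservice: dict) -> Optional[dict]:
    """Return the 'Available' condition of an APIService object, or None."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond
    return None


def fetch_apiservice(name: str = APISERVICE) -> dict:
    """Fetch the APIService as a dict via kubectl (assumes cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "apiservice", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling available_condition(fetch_apiservice()) returns the condition with its status, reason, and message fields, which can be compared over time against the 503 errors.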

/kind bug

afilonchuk, Mar 01 '24 12:03

/assign @yangjunmyfm192085
/triage accepted

dashpole, Mar 07 '24 17:03

Hi @afilonchuk, are there any other error logs from metrics-server? The logs provided so far are not enough to determine the cause.

I also recommend checking https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md#unable-to-work-properly-in-amazon-eks

yangjunmyfm192085, Mar 08 '24 09:03

@yangjunmyfm192085 Hello.

I just deployed the Helm chart from scratch. Here is all the information I have at the moment.

[Screenshots: Screenshot 2024-03-15 at 16 10 20, Screenshot 2024-03-15 at 16 10 48, Screenshot 2024-03-15 at 16 13 45, Screenshot 2024-03-15 at 16 18 04]

I also tried creating a ServiceAccount with an EKS OIDC-based role and adding it to the aws-auth ConfigMap, but no luck so far.

I also upgraded the EKS cluster from 1.24 to 1.25, and that did not help either.

tg-sys, Mar 15 '24 13:03

Hi tg-sys, it seems that metrics-server is not getting data from the kubelet as expected. You can check this in the following ways:

  1. Set the metrics-server log level to 6 and check whether metrics-server can scrape node data normally.
  2. Following https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md#kubelet-doesnt-report-metrics-for-all-or-subset-of-nodes, access the kubelet's API endpoint directly to see whether the data can be scraped normally.
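The second check can be sketched in Python as follows. This assumes kubectl is on the PATH with access to the cluster, and that the kubelet resource endpoint is reached through the API server node proxy path /api/v1/nodes/<node>/proxy/metrics/resource, as the linked known-issues page describes. The function names and the choice of the two node-level metric families to look for are assumptions made for illustration.

```python
import subprocess
from typing import List

# Node-level metric families expected in the kubelet's resource endpoint
# (an assumption for this sketch; adjust to what your kubelet exposes).
REQUIRED = ("node_cpu_usage_seconds_total", "node_memory_working_set_bytes")


def kubelet_metrics_path(node: str) -> str:
    """API server proxy path to one node's kubelet resource metrics."""
    return f"/api/v1/nodes/{node}/proxy/metrics/resource"


def missing_node_metrics(exposition: str) -> List[str]:
    """Return the REQUIRED metric names absent from Prometheus text output."""
    seen = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or whitespace.
        parts = line.split("{", 1)[0].split()
        if parts:
            seen.add(parts[0])
    return [name for name in REQUIRED if name not in seen]


def scrape_node(node: str) -> str:
    """Fetch the raw resource metrics for a node via kubectl get --raw."""
    return subprocess.run(
        ["kubectl", "get", "--raw", kubelet_metrics_path(node)],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

For a given node name, missing_node_metrics(scrape_node(node)) returns an empty list when both node-level families are present; a non-empty list or a kubectl error points at the kubelet side rather than at metrics-server.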

yangjunmyfm192085, Mar 19 '24 11:03

[Screenshot: Screenshot 2024-03-29 at 17 48 10]

@yangjunmyfm192085 I set the log level to 6, and it looks like the metrics were scraped. Log attached: metric_server_log_6.txt

tg-sys, Mar 29 '24 14:03