cluster-api-provider-azure Add Metrics server to clusters

/kind feature

Describe the solution you'd like:

After deploying a cluster, the commands kubectl top nodes and kubectl top pods don't work because metrics-server is missing. Features such as the Horizontal Pod Autoscaler (HPA) also require metrics-server.

 k top nodes
error: Metrics API not available
 k top pods
error: Metrics API not available
 k get pods -A
NAMESPACE     NAME                                                           READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-8f59968d4-v8clz                        1/1     Running   0          60m
kube-system   calico-node-clfb2                                              1/1     Running   0          60m
kube-system   calico-node-kdv67                                              1/1     Running   0          58m
kube-system   calico-node-n542l                                              1/1     Running   0          57m
kube-system   coredns-f9fd979d6-gf6hr                                        1/1     Running   0          60m
kube-system   coredns-f9fd979d6-h2m7m                                        1/1     Running   0          60m
kube-system   etcd-default-template-control-plane-9przl                      1/1     Running   0          60m
kube-system   kube-apiserver-default-template-control-plane-9przl            1/1     Running   0          60m
kube-system   kube-controller-manager-default-template-control-plane-9przl   1/1     Running   0          60m
kube-system   kube-proxy-4pb22                                               1/1     Running   0          60m
kube-system   kube-proxy-qf5v7                                               1/1     Running   0          57m
kube-system   kube-proxy-xqrvw                                               1/1     Running   0          58m
kube-system   kube-scheduler-default-template-control-plane-9przl            1/1     Running   0          60m

 kubectl describe hpa
Name:                                                  php-apache
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 15 Jan 2021 13:17:36 -0800
Reference:                                             Deployment/php-apache
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Warning  FailedGetResourceMetric       14s (x4 over 60s)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedComputeMetricsReplicas  14s (x4 over 60s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

Anything else you would like to add:

This could be applied via a ClusterResourceSet, similar to the way we handle CNIs: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/e7ebdf3bef11fcb5a6e001b2860ae8d6654e0202/Tiltfile#L199

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/templates/addons/calico-resource-set.yaml

Jan 15 '21 21:01 jsturtevant

We can do this easily on all dev clusters created by e2e and through the Makefile, and document how to do it for workload clusters: either run kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml on each cluster, or use a ClusterResourceSet (CRS) to apply it automatically to every workload cluster. However, as with CNIs, it's up to the user to create the CRS. We can provide documentation and examples, but clusterctl won't create the CRS automatically.
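As a rough sketch of the ClusterResourceSet option, modeled on the calico resource set linked above: the resource names, namespace, and cluster label below are illustrative assumptions, and the apiVersion should match whatever your Cluster API release serves. Depending on the Cluster API version, the ClusterResourceSet feature may also need to be enabled on the management cluster.

```yaml
# Hypothetical example; all names and the label are placeholders.
# The ConfigMap holding the manifest could be created on the management cluster with:
#   kubectl create configmap metrics-server-addon --from-file=components.yaml
apiVersion: addons.cluster.x-k8s.io/v1beta1   # assumption: adjust to your Cluster API version
kind: ClusterResourceSet
metadata:
  name: metrics-server
  namespace: default            # must match the namespace of the ConfigMap and the Cluster
spec:
  strategy: ApplyOnce
  clusterSelector:
    matchLabels:
      metrics-server: enabled   # hypothetical label to set on each workload Cluster
  resources:
    - name: metrics-server-addon   # ConfigMap containing components.yaml
      kind: ConfigMap
```

Only workload clusters carrying the matching label would receive the manifest, so users can opt in per cluster.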

Jan 15 '21 21:01 CecileRobertMichon

In my tests the vanilla metrics-server config doesn't work. I have gotten a v0.3.7 spec working correctly by passing the --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP command args to metrics-server and setting spec.insecureSkipTLSVerify: true on the APIService.
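For illustration, the changes described above might look roughly like the following excerpts. This is a sketch rather than the exact spec that was tested: the object names and the APIService fields other than insecureSkipTLSVerify are assumptions based on the usual upstream layout and should be checked against the components.yaml you deploy. Note that both settings turn off TLS verification.

```yaml
# Partial Deployment, showing only the fields that change.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server          # assumed upstream name
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          args:
            # Keep the args already present in the upstream manifest, then add:
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
---
# APIService that registers the resource metrics API with the aggregator.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100     # assumed value
  versionPriority: 100          # assumed value
  service:
    name: metrics-server        # assumed Service name
    namespace: kube-system
```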

Apr 01 '21 20:04 jackfrancis

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

Jun 30 '21 20:06 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

Jul 30 '21 21:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Aug 29 '21 21:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Aug 29 '21 21:08 k8s-ci-robot

/reopen

Dec 01 '21 20:12 jsturtevant

@jsturtevant: Reopened this issue.

Dec 01 '21 20:12 k8s-ci-robot

/assign

Dec 01 '21 21:12 jsturtevant

/reopen

This is only partially fixed for e2e, and the insecure flag needs to be addressed via updates to CAPI or manually by a customer who wishes to deploy it. For more details, see https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1903#issuecomment-989350818

Dec 11 '21 00:12 jsturtevant

@jsturtevant: Reopened this issue.

Dec 11 '21 00:12 k8s-ci-robot

/close

Jan 10 '22 01:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

Jan 10 '22 01:01 k8s-ci-robot

/reopen

Jan 10 '22 02:01 jsturtevant

@jsturtevant: Reopened this issue.

Jan 10 '22 02:01 k8s-ci-robot

/close

Feb 09 '22 02:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

Feb 09 '22 02:02 k8s-ci-robot

/reopen

This is only partially fixed for e2e, and the insecure flag needs to be addressed via updates to CAPI or manually by a customer who wishes to deploy it. For more details, see https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1903#issuecomment-989350818

Feb 09 '22 19:02 jsturtevant

@jsturtevant: Reopened this issue.

Feb 09 '22 19:02 k8s-ci-robot

/close

Mar 11 '22 20:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

Mar 11 '22 20:03 k8s-ci-robot

/reopen /remove-lifecycle rotten

Mar 11 '22 21:03 jsturtevant

@jsturtevant: Reopened this issue.

Mar 11 '22 21:03 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jun 09 '22 21:06 k8s-triage-robot

/remove-lifecycle stale

Using Helm to deploy might solve this problem.

Jun 10 '22 15:06 jsturtevant

/lifecycle stale

Sep 08 '22 16:09 k8s-triage-robot

/lifecycle rotten

Oct 08 '22 16:10 k8s-triage-robot

/remove-lifecycle rotten

Oct 14 '22 18:10 jackfrancis

/lifecycle stale

Jan 12 '23 18:01 k8s-triage-robot

/lifecycle rotten

Feb 11 '23 19:02 k8s-triage-robot
