[kube-prometheus-stack] kube-scheduler and kube-controller-manager monitor not working

Open bmgante opened this issue 2 years ago • 12 comments

Hi,

I'm running EKS 1.25 and cannot get metrics from kube-scheduler and kube-controller-manager. Below is the values.yaml for kube-scheduler (similar for kube-controller-manager).

## Component scraping kube scheduler
##
kubeScheduler:
  enabled: true

  ## If your kube scheduler is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## If using kubeScheduler.endpoints only the port and targetPort are used
  ##
  service:
    enabled: true
    ## If null or unset, the value is determined dynamically based on target Kubernetes version due to change
    ## of default port in Kubernetes 1.23.
    ##
    port: null
    targetPort: null
    # selector:
    #   component: kube-scheduler

  serviceMonitor:
    enabled: true
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""

    ## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
    ##
    sampleLimit: 0

    ## TargetLimit defines a limit on the number of scraped targets that will be accepted.
    ##
    targetLimit: 0

    ## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelLimit: 0

    ## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelNameLengthLimit: 0

    ## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelValueLengthLimit: 0

    ## proxyUrl: URL of a proxy that should be used for scraping.
    ##
    proxyUrl: ""
    ## Enable scraping kube-scheduler over https.
    ## Requires proper certs (not self-signed) and delegated authentication/authorization checks.
    ## If null or unset, the value is determined dynamically based on target Kubernetes version.
    ##
    https: null

    ## Skip TLS certificate validation when scraping
    insecureSkipVerify: null

    ## Name of the server to use when validating TLS certificate
    serverName: null

    ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    ##
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]

    ## RelabelConfigs to apply to samples before scraping
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace

    ## Additional labels
    ##
    additionalLabels: {}
    #  foo: bar

ServiceMonitor created by the Helm chart:

% kubectl get servicemonitor prometheus-kube-prometheus-kube-scheduler -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2023-04-10T17:02:45Z"
  generation: 1
  labels:
    app: kube-prometheus-stack-kube-scheduler
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 45.7.1
    chart: kube-prometheus-stack-45.7.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-kube-scheduler
  namespace: monitoring
  resourceVersion: "6339940"
  uid: 85c428a8-dee8-4a29-a122-4770d2498099
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-prometheus-stack-kube-scheduler
      release: prometheus

Service created by the Helm chart:

% kubectl get svc prometheus-kube-prometheus-kube-scheduler -n kube-system
NAME                                        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
prometheus-kube-prometheus-kube-scheduler   ClusterIP   None         <none>        10259/TCP   30d

bmgante avatar May 11 '23 15:05 bmgante

I guess the problem is the endpoints, which were empty because kube-scheduler and kube-controller-manager do not run as pods. I then tried to specify the IPs of the EKS instances, but Prometheus scraping was failing. I also tried pointing the kube-scheduler endpoint at the lease holder 10.0.105.9, but the scrape fails as well with "Get "https://10.0.105.9:10259/metrics": context deadline exceeded".

# kubectl get endpoints -n kube-system
....
prometheus-kube-prometheus-kube-controller-manager   <none>                                                                     30d
prometheus-kube-prometheus-kube-etcd                 <none>                                                                     30d
prometheus-kube-prometheus-kube-scheduler            10.0.105.9:10259                                                           9m9s
...

When setting the endpoints to the IPs of the EKS worker nodes, the error is Get "https://x.x.x.x:10259/metrics": dial tcp 172.27.172.254:10259: connect: connection refused.
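
For reference, the override I tried was roughly of this shape (the IP is a placeholder for the node / lease-holder address):

kubeScheduler:
  enabled: true
  ## kube-scheduler is not deployed as a pod on EKS, so point the chart at an IP directly
  endpoints:
    - 10.0.105.9
  service:
    enabled: true
    port: 10259
    targetPort: 10259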

Any idea how to address this, or is it simply not possible to monitor these components, in which case I should just disable them in values.yaml?

bmgante avatar May 11 '23 17:05 bmgante

@bmgante can you access the scheduler metrics endpoint from a container in the cluster (create a container in any namespace and try a curl)?
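
For example, something along these lines (placeholder IP; use the address from your Endpoints object):

kubectl run curl-test -n kube-system --rm -it --restart=Never \
  --image=curlimages/curl --command -- curl -k https://10.0.105.9:10259/metrics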

QuentinBisson avatar May 23 '23 20:05 QuentinBisson

Managed Kubernetes services generally do not make the control plane's metrics endpoints accessible to customers, except for kube-apiserver. This is also true for EKS (to provide at least some important scheduler metrics, EKS planned to make them available through CloudWatch).

zeritti avatar May 25 '23 21:05 zeritti

Ok, thanks. I've just disabled that monitoring in values.yaml to avoid the alerts.

bmgante avatar May 25 '23 21:05 bmgante

@bmgante Could you share the change you had to make to values.yaml to disable those 2 alerts? I tried using this:

defaultRules:
  disabled:
    Watchdog: true
    KubeControllerManagerDown: true
    KubeSchedulerDown: true

but it failed with this when I tried to apply that update:

Error: error validating "": error validating data: ValidationError(PrometheusRule.spec.groups[0]): missing required field "rules" in com.coreos.monitoring.v1.PrometheusRule.spec.groups

Thanks!

diego-ojeda-binbash avatar Jun 28 '23 02:06 diego-ojeda-binbash

Hi @diego-ojeda-binbash, I think it was just this:

## Component scraping kube scheduler
##
kubeScheduler:
  enabled: false

## Component scraping kube controller manager
##
kubeControllerManager:
  enabled: false
## Create default rules for monitoring the cluster
##
defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: true
    configReloaders: true
    general: true
    k8s: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: false
    kubelet: true
    kubeProxy: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeSchedulerAlerting: false
    kubeSchedulerRecording: false
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true
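
Applied with the usual upgrade, e.g. (assuming the chart is installed from the standard prometheus-community repo alias):

helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring -f values.yaml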

bmgante avatar Jun 28 '23 15:06 bmgante

I assume the service selector does not match, maybe because of an old version of Kubernetes:

selector:
    component: kube-scheduler

while the real label assigned to the scheduler pod is k8s-app=kube-scheduler.
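
If that is the case, overriding the service selector in values.yaml should help; roughly (untested):

kubeScheduler:
  service:
    enabled: true
    selector:
      k8s-app: kube-scheduler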

admitriev-ppro avatar Jul 05 '23 15:07 admitriev-ppro

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 07 '23 05:08 stale[bot]

This workaround should normally solve the problem if you still want to monitor kube-scheduler and kube-controller-manager: https://github.com/prometheus-community/helm-charts/issues/3368#issuecomment-1563510980

JulesLalu avatar Sep 14 '23 08:09 JulesLalu

## Create default rules for monitoring the cluster
##
defaultRules:
  create: true
  rules:
    ...

Any idea where the documentation for each of these rules is? I can see they are all being used here https://github.com/prometheus-community/helm-charts/blob/11127a45423d6cf468e476e9ee5a800b7a6c29af/charts/kube-prometheus-stack/hack/sync_prometheus_rules.py but I can't figure out the meaning of some of them.
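
In the meantime, one way to see what each toggle actually produces is to inspect the PrometheusRule objects the chart generates (the resource name below is a placeholder):

kubectl get prometheusrules -n monitoring
kubectl get prometheusrule <rule-name> -n monitoring -o yaml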

rudolfbyker avatar Dec 04 '23 10:12 rudolfbyker

This should really be included in the documentation. I had to dig through issues to find this.

Daemoen avatar Jul 01 '24 00:07 Daemoen

My setup with microk8s had the kube-scheduler, kube-controller-manager, and kube-proxy alerts firing. I had to disable them via these Helm chart values:

values:
  kubeControllerManager:
    enabled: false
  kubeScheduler:
    enabled: false
  kubeProxy:
    enabled: false

I tried setting the endpoint values as described in the microk8s docs but it didn't work.
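
For context, that attempt was roughly of this shape (placeholder node IP), before I fell back to disabling the scrapes:

kubeScheduler:
  enabled: true
  endpoints:
    - 10.0.0.10
kubeControllerManager:
  enabled: true
  endpoints:
    - 10.0.0.10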

cristianrgreco avatar Aug 16 '24 14:08 cristianrgreco

FYI, for EKS the metrics are exposed even though the scheduler and controller manager are in the EKS control plane and not in pods. Ref: https://docs.aws.amazon.com/eks/latest/userguide/view-raw-metrics.html#deploy-prometheus-scraper

I need to dig into this more. When I enabled the EKS metrics, I ended up with duplicate metrics coming from "service" kube-prometheus-stack-kubelet, so I'm guessing that kube-prometheus is exposing the EKS metrics another way.

The metrics coming back from EKS are much richer, with lots of details about each node; I doubt most of those are useful from a metrics perspective.

From EKS:

kubelet_running_pods{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_instance_type="m5.xlarge", beta_kubernetes_io_os="linux", eks_amazonaws_com_capacityType="ON_DEMAND", eks_amazonaws_com_nodegroup="cpu-node-group-plt-20250218143836696000000001", eks_amazonaws_com_nodegroup_image="ami-070ee37f2c1386fd6", eks_amazonaws_com_sourceLaunchTemplateId="lt-070c446cb737f34e7", eks_amazonaws_com_sourceLaunchTemplateVersion="13", failure_domain_beta_kubernetes_io_region="us-east-1", failure_domain_beta_kubernetes_io_zone="us-east-1c", instance="ip-100-96-5-234.ec2.internal", job="kubernetes-nodes", k8s_io_cloud_provider_aws="e1302bd65772c17c5fbf3344a12c2066", kubernetes_io_arch="amd64", kubernetes_io_hostname="ip-100-96-5-234.ec2.internal", kubernetes_io_os="linux", node_kubernetes_io_instance_type="m5.xlarge", topology_ebs_csi_aws_com_zone="us-east-1c", topology_k8s_aws_zone_id="use1-az1", topology_kubernetes_io_region="us-east-1", topology_kubernetes_io_zone="us-east-1c"} | 26

From kube-prometheus-stack-kubelet:

kubelet_running_pods{endpoint="https-metrics", instance="100.96.0.10:10250", job="kubelet", metrics_path="/metrics", namespace="kube-system", node="ip-100-96-0-10.ec2.internal", service="kube-prometheus-stack-kubelet"} | 25
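
A quick way to confirm the duplication is to count the series per job for the same metric, e.g.:

count by (job) (kubelet_running_pods)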

PT-GD avatar Feb 22 '25 02:02 PT-GD

So is there any other solution apart from:

  • disabling the metrics, as mentioned above
  • changing the bind address to 0.0.0.0

The latter has security concerns and is not recommended. I believe there is a way, either with supplied certs similar to the etcd approach (or a privileged role via a service account, or something like that), or through a proxy like HAProxy; in my case, since I am using Rancher, it needs to be done through PushProx.
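
For reference, the etcd-style approach in this chart looks roughly like this (secret name and file names are placeholders; the secret has to be mounted via prometheus.prometheusSpec.secrets):

kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/client.key

prometheus:
  prometheusSpec:
    secrets:
      - etcd-client-cert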

xakaitetoia avatar Mar 05 '25 20:03 xakaitetoia

I just deployed the chart on my k3s cluster, and I think the problem is that the Endpoints object is actually created in the currently set namespace, which is commonly "monitoring", while the namespaceSelector on the ServiceMonitor always points to kube-system. The Endpoints and Service are supposed to be created in the kube-system namespace. But since I use Kustomize and set the namespace to monitoring, all of the namespace values get rewritten to monitoring.
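
To illustrate, a kustomization along these lines (file name hypothetical) is what causes the rewrite:

# kustomization.yaml (simplified)
# the namespace transformer rewrites the namespace on every namespaced resource,
# including the Service/Endpoints the chart intends for kube-system
namespace: monitoring
resources:
  - kube-prometheus-stack-rendered.yaml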

rulim34 avatar Mar 07 '25 16:03 rulim34

@rulim34, that's not why. The scheduler's metrics endpoint is bound to localhost, so other pods can't reach it. To resolve the issue in clusters like Kind, where you can exec into the control plane node, edit the scheduler's or controller-manager's static pod manifest.

For example, for the scheduler, do this.

vi /etc/kubernetes/manifests/kube-scheduler.yaml

spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0  # Allow listening on all interfaces

After saving the file, Prometheus should be able to scrape the metrics.

Normally you wouldn't do this, unless you are running a test against your control plane, e.g., a performance test using a tool like KWOK.
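
To double-check that the flag was picked up (assuming the usual kubeadm component=kube-scheduler label on the static pod):

kubectl -n kube-system get pod -l component=kube-scheduler -o yaml | grep bind-address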

network-charles avatar Mar 11 '25 21:03 network-charles

Thank you, this approach helped me with scraping the scheduler and controller manager as well.

berezinsn avatar Apr 15 '25 08:04 berezinsn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jun 26 '25 23:06 stale[bot]

This issue is being automatically closed due to inactivity.

stale[bot] avatar Jul 18 '25 21:07 stale[bot]