[kube-prometheus-stack] kube-scheduler and kube-controller-manager monitor not working
Hi,
I'm running EKS 1.25 and cannot get metrics from kube-scheduler and kube-controller-manager. Below is the values.yaml section for kube-scheduler (the kube-controller-manager one is similar).
## Component scraping kube scheduler
##
kubeScheduler:
  enabled: true

  ## If your kube scheduler is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## If using kubeScheduler.endpoints only the port and targetPort are used
  ##
  service:
    enabled: true
    ## If null or unset, the value is determined dynamically based on target Kubernetes version due to change
    ## of default port in Kubernetes 1.23.
    ##
    port: null
    targetPort: null
    # selector:
    #   component: kube-scheduler

  serviceMonitor:
    enabled: true
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""

    ## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
    ##
    sampleLimit: 0

    ## TargetLimit defines a limit on the number of scraped targets that will be accepted.
    ##
    targetLimit: 0

    ## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelLimit: 0

    ## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelNameLengthLimit: 0

    ## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
    ##
    labelValueLengthLimit: 0

    ## proxyUrl: URL of a proxy that should be used for scraping.
    ##
    proxyUrl: ""

    ## Enable scraping kube-scheduler over https.
    ## Requires proper certs (not self-signed) and delegated authentication/authorization checks.
    ## If null or unset, the value is determined dynamically based on target Kubernetes version.
    ##
    https: null

    ## Skip TLS certificate validation when scraping
    insecureSkipVerify: null

    ## Name of the server to use when validating TLS certificate
    serverName: null

    ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    ##
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]

    ## RelabelConfigs to apply to samples before scraping
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace

    ## Additional labels
    ##
    additionalLabels: {}
    # foo: bar
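For reference, if the scheduler endpoints were actually reachable, the override would look roughly like this (just a sketch with placeholder IPs; 10259 is the scheduler's default secure port since 1.23, and on EKS these addresses turn out not to be reachable anyway, as described below):

kubeScheduler:
  enabled: true
  ## placeholder node IPs, not real addresses
  endpoints:
    - 10.141.4.22
    - 10.141.4.23
  service:
    enabled: true
    port: 10259
    targetPort: 10259
  serviceMonitor:
    enabled: true
    https: true
    insecureSkipVerify: true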
ServiceMonitor created by the Helm chart:
% kubectl get servicemonitor prometheus-kube-prometheus-kube-scheduler -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2023-04-10T17:02:45Z"
  generation: 1
  labels:
    app: kube-prometheus-stack-kube-scheduler
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 45.7.1
    chart: kube-prometheus-stack-45.7.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-kube-scheduler
  namespace: monitoring
  resourceVersion: "6339940"
  uid: 85c428a8-dee8-4a29-a122-4770d2498099
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-prometheus-stack-kube-scheduler
      release: prometheus
Service created by the Helm chart:
% kubectl get svc prometheus-kube-prometheus-kube-scheduler -n kube-system
NAME                                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
prometheus-kube-prometheus-kube-scheduler    ClusterIP   None         <none>        10259/TCP   30d
I guess the problem is the endpoints, which were empty because kube-scheduler and kube-controller-manager are not pods. I then tried to specify the IPs of the EKS instances, but Prometheus scraping was failing. I also tried changing the kube-scheduler endpoint to the lease holder 10.0.105.9, but the scrape fails as well with "Get "https://10.0.105.9:10259/metrics": context deadline exceeded".
# kubectl get endpoints -n kube-system
...
prometheus-kube-prometheus-kube-controller-manager   <none>             30d
prometheus-kube-prometheus-kube-etcd                 <none>             30d
prometheus-kube-prometheus-kube-scheduler            10.0.105.9:10259   9m9s
...
When setting endpoints to the IPs of the EKS worker nodes, the error is Get "https://x.x.x.x:10259/metrics": dial tcp 172.27.172.254:10259: connect: connection refused.
Any idea how to address this, or is it not possible to monitor these services at all, in which case I should just disable them in values.yaml?
@bmgante can you access the scheduler metrics endpoint from a container in the cluster (create a container in any namespace and try a curl)?
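For example, something like this (a minimal sketch; the pod name and image are arbitrary, and the IP/port are the lease-holder endpoint from your output above; any HTTP response, even 401/403, means the endpoint is reachable, while a timeout means it is not):

# run a throwaway pod and try to reach the scheduler's secure metrics port
kubectl run curl-test -n monitoring --rm -it --restart=Never --image=curlimages/curl -- \
  curl -kv https://10.0.105.9:10259/metrics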
Managed Kubernetes services do not generally make the control plane's metrics endpoints accessible to customers, except for the kube-apiserver. This is also true for EKS (to provide at least some important scheduler metrics, EKS planned to make them available through CloudWatch).
OK, thanks. I've just disabled that monitoring in values.yaml to avoid having alerts.
@bmgante Could you share the change you had to make to values.yaml to disable those 2 alerts? I tried using this:
defaultRules:
  disabled:
    Watchdog: true
    KubeControllerManagerDown: true
    KubeSchedulerDown: true
but it failed with this when I tried to apply that update:
Error: error validating "": error validating data: ValidationError(PrometheusRule.spec.groups[0]): missing required field "rules" in com.coreos.monitoring.v1.PrometheusRule.spec.groups
Thanks!
Hi @diego-ojeda-binbash I think it was just this:
## Component scraping kube scheduler
##
kubeScheduler:
  enabled: false
## Create default rules for monitoring the cluster
##
defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: true
    configReloaders: true
    general: true
    k8s: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: false
    kubelet: true
    kubeProxy: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeSchedulerAlerting: false
    kubeSchedulerRecording: false
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true
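Applying the override would look something like this (a sketch; release name, namespace, and chart version are the ones from this thread, and custom-values.yaml is just a placeholder file name):

# upgrade the existing release with the overridden values
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --version 45.7.1 \
  -f custom-values.yaml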
I assume the service selector does not match, maybe because of an old Kubernetes version:

selector:
  component: kube-scheduler

whereas the actual label assigned to the scheduler pod is k8s-app=kube-scheduler.
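If that is the case, the chart exposes a selector override under kubeScheduler.service (shown commented out in the values above), so a sketch of a fix would be:

kubeScheduler:
  service:
    enabled: true
    ## override the default selector to match the label actually set on the scheduler pod
    selector:
      k8s-app: kube-scheduler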
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This workaround should normally solve the problem if you still want to monitor kube-scheduler and kube-controller-manager: https://github.com/prometheus-community/helm-charts/issues/3368#issuecomment-1563510980
## Create default rules for monitoring the cluster
##
defaultRules:
  create: true
  rules:
    ...
Any idea where the documentation for each of these rules is? I can see they are all being used here https://github.com/prometheus-community/helm-charts/blob/11127a45423d6cf468e476e9ee5a800b7a6c29af/charts/kube-prometheus-stack/hack/sync_prometheus_rules.py but I can't figure out the meaning of some of them.
This should actually be included in the documentation. I had to dig through several issues to find this.
My setup with microk8s had the kube-scheduler, kube-controller-manager, and kube-proxy alerts firing. I had to disable them via these Helm chart values:
values:
  kubeControllerManager:
    enabled: false
  kubeScheduler:
    enabled: false
  kubeProxy:
    enabled: false
I tried setting the endpoint values as described in the microk8s docs but it didn't work.
FYI, for EKS the metrics are exposed even though the scheduler and controller manager are in the EKS control plane and not in pods. Ref: https://docs.aws.amazon.com/eks/latest/userguide/view-raw-metrics.html#deploy-prometheus-scraper
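As I read the linked doc, the basic way to see the raw control-plane metrics exposed through the Kubernetes API is:

# prints the API server's raw Prometheus metrics; a quick sanity check of what the
# control plane exposes to the cluster
kubectl get --raw /metrics | head -n 20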
I need to dig into this more. When I enabled the EKS metrics, I ended up with duplicate metrics, with the second copy coming from the kube-prometheus-stack-kubelet service, so I'm guessing kube-prometheus-stack is already exposing the EKS metrics another way.
The metrics coming back from EKS are much richer, with lots of details about each node; I doubt most of those are useful from a metrics perspective.
From EKS:
kubelet_running_pods{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_instance_type="m5.xlarge", beta_kubernetes_io_os="linux", eks_amazonaws_com_capacityType="ON_DEMAND", eks_amazonaws_com_nodegroup="cpu-node-group-plt-20250218143836696000000001", eks_amazonaws_com_nodegroup_image="ami-070ee37f2c1386fd6", eks_amazonaws_com_sourceLaunchTemplateId="lt-070c446cb737f34e7", eks_amazonaws_com_sourceLaunchTemplateVersion="13", failure_domain_beta_kubernetes_io_region="us-east-1", failure_domain_beta_kubernetes_io_zone="us-east-1c", instance="ip-100-96-5-234.ec2.internal", job="kubernetes-nodes", k8s_io_cloud_provider_aws="e1302bd65772c17c5fbf3344a12c2066", kubernetes_io_arch="amd64", kubernetes_io_hostname="ip-100-96-5-234.ec2.internal", kubernetes_io_os="linux", node_kubernetes_io_instance_type="m5.xlarge", topology_ebs_csi_aws_com_zone="us-east-1c", topology_k8s_aws_zone_id="use1-az1", topology_kubernetes_io_region="us-east-1", topology_kubernetes_io_zone="us-east-1c"} | 26
From kube-prometheus-stack-kubelet:
kubelet_running_pods{endpoint="https-metrics", instance="100.96.0.10:10250", job="kubelet", metrics_path="/metrics", namespace="kube-system", node="ip-100-96-0-10.ec2.internal", service="kube-prometheus-stack-kubelet"} | 25
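If the duplication becomes a problem, one option (just a sketch, assuming the EKS-side scrape job is the "kubernetes-nodes" job shown above and is defined outside the chart) would be to drop the kubelet series from that job and keep only the chart's kubelet ServiceMonitor:

# inside the "kubernetes-nodes" scrape job's configuration
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'kubelet_.*'
    action: drop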
So is there any other solution apart from:
- disabling the metrics as mentioned above
- changing the bind address to 0.0.0.0

There are security concerns with the latter, and it is not recommended. I believe there should be a way, either by supplying certs (similar to the etcd approach), or via a privileged role with a service account, or through a proxy like HAProxy; in my case, since I am using Rancher, it needs to be done through PushProx.
I just deployed the chart on my k3s cluster, and I think the problem is that the Endpoints object is actually created in the currently set namespace, commonly "monitoring", while the namespaceSelector on the ServiceMonitor always points to kube-system. The Endpoints and Service are supposed to be created in the kube-system namespace, but since I use kustomize and set the namespace to monitoring, all of the namespace values get replaced with monitoring.
@rulim34, that's not why. The component binds to localhost on the node, so other pods can't reach it. To resolve the issue in clusters like kind, where you can exec into the control plane node, edit the scheduler's or controller-manager's static pod manifest.
For example, for the scheduler:
vi /etc/kubernetes/manifests/kube-scheduler.yaml

spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0   # Allow listening on all interfaces
After saving the file, Prometheus should be able to scrape the metrics.
Normally you wouldn't do this unless you are testing your control plane, e.g. running a performance test with a tool like [KWOK](https://github.com/kubernetes-sigs/kwok).
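For kind specifically, an alternative to editing the static pod manifests by hand is to set the bind addresses at cluster creation time via kubeadm config patches (a sketch based on the commonly used pattern; adjust to your setup):

# kind cluster config: make scheduler and controller-manager listen on all interfaces from the start
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    scheduler:
      extraArgs:
        bind-address: "0.0.0.0"
    controllerManager:
      extraArgs:
        bind-address: "0.0.0.0"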
Thank you, this approach helped me with scraping the scheduler and controller-manager as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.