k8s-cluster-api-provider
Make metrics endpoints on control plane and worker nodes available for monitoring
The metrics endpoints of kube-controller-manager, kube-proxy and kube-scheduler cannot be reached from Prometheus (running on the worker nodes within the same cluster) because they bind only to localhost by default.
The easiest way might be to change the bind-addresses similar to what teutonet did:
- https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/files/kube-proxy.config.yaml#L4
- https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/templates/management-cluster/clusterClass/kubeadmnControlPlaneTemplate/_kubeadmControlPlaneTemplateSpec.yaml#L39
- https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/templates/management-cluster/clusterClass/kubeadmnControlPlaneTemplate/_kubeadmControlPlaneTemplateSpec.yaml#L28
Additionally, some security group rules need to be added so that the metrics endpoints on all worker and control plane nodes are reachable for Prometheus (running on the worker nodes).
Control plane security group, source worker security group, destination ports:
- 10249 (kube-proxy)
- 10250 (cadvisor)
- 10257 (kube-controller-manager)
- 10259 (kube-scheduler)
- 2379 (etcd https see here)
- 6443 (apiserver)
- 9100 (node-exporter)
Worker security group, source worker security group, destination ports:
- 10249 (kube-proxy)
- 10250 (cadvisor)
- 9100 (node-exporter)
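Once such rules exist, reachability could be verified from a worker node or from a pod. The following is only a minimal sketch: it tests plain TCP connectivity (no TLS, no authentication, no scraping), the function names `port_open` and `unreachable` are made up for illustration, and the port tables simply repeat the lists above.

```python
import socket

# Ports copied from the two lists above.
CONTROL_PLANE_PORTS = {
    10249: "kube-proxy",
    10250: "cadvisor",
    10257: "kube-controller-manager",
    10259: "kube-scheduler",
    2379: "etcd",
    6443: "apiserver",
    9100: "node-exporter",
}
WORKER_PORTS = {
    10249: "kube-proxy",
    10250: "cadvisor",
    9100: "node-exporter",
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def unreachable(host, ports, timeout=2.0):
    """Return the {port: name} entries that could not be connected to."""
    return {p: n for p, n in ports.items() if not port_open(host, p, timeout)}
```

For example, `unreachable("10.8.0.10", CONTROL_PLANE_PORTS)` (the address is a placeholder for a control plane node IP) would return the ports that still need attention. A refused connection usually suggests that nothing is listening on the node address (for instance a component still bound to localhost), whereas a timeout more likely points to a missing security group rule; the sketch does not distinguish the two.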
Apparently the node exporter (port 9100) and kube-proxy (port 10249) endpoints are using plain http by default (using kube-prometheus-stack with prometheus-operator).
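To illustrate what this means when probing the endpoints by hand, here is a small sketch that picks the URL scheme per port. It assumes that every port other than 9100 and 10249 is served over https, and that `/metrics` is the path to query; `metrics_url` and `PLAIN_HTTP_PORTS` are hypothetical names, not part of any tool mentioned here.

```python
# Ports reported above as serving plain http by default.
PLAIN_HTTP_PORTS = {9100, 10249}


def metrics_url(host, port, path="/metrics"):
    """Build a metrics URL, choosing http or https based on the port."""
    # Assumption: all ports not listed in PLAIN_HTTP_PORTS use https.
    scheme = "http" if port in PLAIN_HTTP_PORTS else "https"
    return f"{scheme}://{host}:{port}{path}"
```

With a placeholder node address, `metrics_url("10.8.0.10", 9100)` yields an `http://` URL, while `metrics_url("10.8.0.10", 10257)` yields an `https://` one; the https endpoints will typically also require a token or client certificate.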
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment, or this will be closed in 60 days.
This issue was closed because it has been stalled for 60 days with no activity.