k8s-cluster-api-provider

Make metrics endpoints on control plane and worker nodes available for monitoring

Open Nils98Ar opened this issue 1 year ago • 2 comments

The metrics endpoints of kube-controller-manager, kube-proxy and kube-scheduler cannot be reached from Prometheus (running on the worker nodes within the same cluster) because they bind only to localhost by default.

The easiest way might be to change the bind addresses, similar to what teutonet did:

  • https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/files/kube-proxy.config.yaml#L4
  • https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/templates/management-cluster/clusterClass/kubeadmnControlPlaneTemplate/_kubeadmControlPlaneTemplateSpec.yaml#L39
  • https://github.com/teutonet/teutonet-helm-charts/blob/f195a8434b8fbad15d0d57b926e8a3ea571d13b1/charts/t8s-cluster/templates/management-cluster/clusterClass/kubeadmnControlPlaneTemplate/_kubeadmControlPlaneTemplateSpec.yaml#L28
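
To make the idea concrete, here is a minimal sketch of what such overrides could look like. It assumes the kubeadm v1beta3 `ClusterConfiguration` layout (where `extraArgs` is a plain string map) and the `metricsBindAddress` field of `KubeProxyConfiguration`. The function names are hypothetical, and the output is not copied from the teutonet charts or from this project's templates.

```python
import json

# Assumption: kubeadm v1beta3 ClusterConfiguration, where extraArgs is a
# simple key/value map. The helper names below are illustrative only.


def control_plane_overrides(bind_address: str = "0.0.0.0") -> dict:
    """Return a ClusterConfiguration fragment that makes the
    kube-controller-manager and kube-scheduler listen beyond localhost."""
    return {
        "controllerManager": {"extraArgs": {"bind-address": bind_address}},
        "scheduler": {"extraArgs": {"bind-address": bind_address}},
    }


def kube_proxy_overrides(bind_address: str = "0.0.0.0", port: int = 10249) -> dict:
    """Return a KubeProxyConfiguration fragment that opens the metrics listener."""
    return {
        "apiVersion": "kubeproxy.config.k8s.io/v1alpha1",
        "kind": "KubeProxyConfiguration",
        "metricsBindAddress": f"{bind_address}:{port}",
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be pasted into a manifest for review.
    print(json.dumps(control_plane_overrides(), indent=2))
    print(json.dumps(kube_proxy_overrides(), indent=2))
```

Binding to `0.0.0.0` exposes the endpoints on every interface, which is why the security group rules discussed in this issue matter.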

Additionally, some security group rules need to be added so that the metrics endpoints on all worker and control plane nodes are reachable for Prometheus (running on the worker nodes).

Control plane security group, source worker security group, destination ports:

  • 10249 (kube-proxy)
  • 10250 (cadvisor)
  • 10257 (kube-controller-manager)
  • 10259 (kube-scheduler)
  • 2379 (etcd, https, see here)
  • 6443 (apiserver)
  • 9100 (node-exporter)

Worker security group, source worker security group, destination ports:

  • 10249 (kube-proxy)
  • 10250 (cadvisor)
  • 9100 (node-exporter)

Apparently, the node-exporter (port 9100) and kube-proxy (port 10249) endpoints use plain HTTP by default (using kube-prometheus-stack with prometheus-operator).
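
When checking reachability by hand, the scheme therefore depends on the port. A small sketch of that mapping follows. It assumes that every other port named in this issue is served over HTTPS, which the issue implies but does not state. The function name and the example address are hypothetical.

```python
# Ports reported in the issue as plain HTTP by default.
PLAIN_HTTP_PORTS = {9100, 10249}


def metrics_url(host: str, port: int, path: str = "/metrics") -> str:
    """Build a scrape URL, picking http or https from the port.

    Assumption: any port not in PLAIN_HTTP_PORTS is served over HTTPS.
    """
    scheme = "http" if port in PLAIN_HTTP_PORTS else "https"
    return f"{scheme}://{host}:{port}{path}"


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder node address.
    print(metrics_url("10.0.0.5", 9100))   # http://10.0.0.5:9100/metrics
    print(metrics_url("10.0.0.5", 10257))  # https://10.0.0.5:10257/metrics
```

The HTTPS endpoints generally also require authentication, so a successful TCP connection alone does not mean a scrape will succeed.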

Nils98Ar, Nov 21 '23 16:11

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment, or this will be closed in 60 days.

github-actions[bot], Jan 13 '24 02:01

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment, or this will be closed in 60 days.

github-actions[bot], Apr 12 '24 02:04

This issue was closed because it has been stalled for 60 days with no activity.

github-actions[bot], Jun 11 '24 02:06