Autodiscovery of Kubernetes control-plane components not working on GKE Enterprise (Anthos)
I'm installing the Datadog Agent on GKE Enterprise (Anthos), where the control-plane nodes are accessible.
Google suffixes all container images used for the control plane with "-amd64", and this suffix is not present in the "ad_identifiers" of the corresponding integrations, so Autodiscovery does not match them:
- etcd-amd64
- kube-apiserver-amd64
- kube-controller-manager-amd64
- kube-scheduler-amd64
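To illustrate why the match fails, here is a minimal sketch. It assumes, as the Autodiscovery documentation describes, that `ad_identifiers` are compared against the container's short image name (the image reference with registry path, tag and digest stripped). The registry path and tag in the example are hypothetical, and `short_image` / `matching_configs` are illustrative helpers, not Agent code.

```python
# Sketch: how a suffixed image name misses the default ad_identifiers.
# Assumption: Autodiscovery matches ad_identifiers against the short image name.

def short_image(image: str) -> str:
    """Strip digest, registry/repository path and tag from an image reference."""
    name = image.split("@", 1)[0]      # drop "@sha256:..." if present
    last = name.rsplit("/", 1)[-1]     # drop "registry/path/"
    return last.split(":", 1)[0]       # drop ":tag"

# Default identifiers of the integrations (before any override).
DEFAULT_AD_IDENTIFIERS = {
    "etcd.yaml": ["etcd"],
    "kube_apiserver_metrics.yaml": ["kube-apiserver"],
    "kube_controller_manager.yaml": ["kube-controller-manager"],
    "kube_scheduler.yaml": ["kube-scheduler"],
}

def matching_configs(image: str, ad_identifiers: dict) -> list:
    """Return the config files whose ad_identifiers contain the short image name."""
    short = short_image(image)
    return [conf for conf, ids in ad_identifiers.items() if short in ids]

if __name__ == "__main__":
    # Hypothetical image reference for a GKE Enterprise scheduler container.
    image = "registry.example.com/gke/kube-scheduler-amd64:v0.0.0"
    print(short_image(image))
    print(matching_configs(image, DEFAULT_AD_IDENTIFIERS))  # no match
    # Workaround: also list the "-amd64" variant of each identifier.
    patched = {k: v + [v[0] + "-amd64"] for k, v in DEFAULT_AD_IDENTIFIERS.items()}
    print(matching_configs(image, patched))  # matches kube_scheduler.yaml
```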
For now, my workaround is to override the default configuration of these integrations in the Helm chart values:
confd:
  etcd.yaml: |
    ad_identifiers:
      - etcd
      - etcd-amd64
    instances:
      - prometheus_url: http://localhost:2379/metrics
        possible_prometheus_urls:
          - https://%%host%%:2379/metrics
          - http://%%host%%:2379/metrics
        ssl_verify: false
  kube_apiserver_metrics.yaml: |
    ad_identifiers:
      - kube-apiserver
      - kube-apiserver-amd64
    instances:
      - possible_prometheus_urls:
          - https://%%host%%:6443/metrics
          - https://%%host%%:8443/metrics
        bearer_token_auth: tls_only
        tags:
          - apiserver:%%host%%
  kube_controller_manager.yaml: |
    ad_identifiers:
      - kube-controller-manager
      - kube-controller-manager-amd64
    instances:
      - possible_prometheus_urls:
          - https://%%host%%:10257/metrics
          - https://localhost:10257/metrics
          - http://%%host%%:10252/metrics
          - http://localhost:10252/metrics
        bearer_token_auth: tls_only
        ssl_verify: false
  kube_scheduler.yaml: |
    ad_identifiers:
      - kube-scheduler
      - kube-scheduler-amd64
    instances:
      - possible_prometheus_urls:
          - https://%%host%%:10259/metrics
          - https://localhost:10259/metrics
          - http://%%host%%:10251/metrics
          - http://localhost:10251/metrics
        bearer_token_auth: tls_only
        ssl_verify: false
There is also a problem with the corresponding overview dashboards: some widgets filter on the "short_image" tag using the unsuffixed image name, which does not match the "-amd64" images. For instance: "query": "sum:kubernetes.memory.usage{$cluster,$scope,short_image:kube-scheduler} by {pod_name}"
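One way to adjust a cloned dashboard is to rewrite its queries in bulk. The sketch below assumes you have the clone's JSON definition as a Python object and that queries live under keys named `query` or `q` (an assumption about the JSON layout, not a documented contract). It appends "-amd64" to `short_image:<component>` filters for the four components listed above and leaves already-suffixed values alone. The helper names are hypothetical.

```python
import json
import re

# Control-plane components whose images carry the "-amd64" suffix on GKE Enterprise.
COMPONENTS = ["etcd", "kube-apiserver", "kube-controller-manager", "kube-scheduler"]
SUFFIX = "-amd64"

# Match short_image:<component> only when the tag value ends there, so an
# already rewritten "short_image:kube-scheduler-amd64" is not suffixed twice.
_PATTERN = re.compile(
    r"short_image:(" + "|".join(map(re.escape, COMPONENTS)) + r")(?![\w.-])"
)

def rewrite_query(query: str) -> str:
    """Append the architecture suffix to matching short_image filters."""
    return _PATTERN.sub(lambda m: f"short_image:{m.group(1)}{SUFFIX}", query)

def rewrite_dashboard(node):
    """Recursively rewrite every string stored under a 'query' or 'q' key."""
    if isinstance(node, dict):
        return {
            k: (rewrite_query(v) if k in ("query", "q") and isinstance(v, str)
                else rewrite_dashboard(v))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [rewrite_dashboard(item) for item in node]
    return node

if __name__ == "__main__":
    # Hypothetical widget fragment using the query quoted in the issue.
    widget = {"definition": {"requests": [{"query": (
        "sum:kubernetes.memory.usage{$cluster,$scope,short_image:kube-scheduler}"
        " by {pod_name}")}]}}
    print(json.dumps(rewrite_dashboard(widget), indent=2))
```

The rewritten JSON can then be imported as a new dashboard, keeping the out-of-the-box one untouched.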
The default configuration files we provide, along with the out-of-the-box overview dashboards, are meant to be adjusted to your environment. Adding your image names to the ad_identifiers in the configuration files, and cloning the dashboards to change their queries, is the expected approach.