integrations-core

Autodiscovery of Kubernetes control-plane components not working on GKE Enterprise (Anthos)

Open sebastien-prudhomme opened this issue 1 year ago • 1 comments

I'm installing the Datadog Agent on GKE Enterprise (Anthos), where the control-plane nodes are accessible.

Google suffixes all container images used for the control plane with "-amd64", and that suffix is not present in the "ad_identifiers" of the corresponding integrations:

  • etcd-amd64
  • kube-apiserver-amd64
  • kube-controller-manager-amd64
  • kube-scheduler-amd64
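
To illustrate why the suffix matters, here is a minimal sketch of the matching behavior described above. It is a simplified model, not the Agent's actual code: it assumes each entry in "ad_identifiers" is compared for exact equality with the container's short image name (the image reference without registry, path, tag, or digest). The helper names and the image reference in the usage note are hypothetical.

```python
def short_image(image: str) -> str:
    """Return the short image name: no registry/path, no tag, no digest."""
    name = image.rsplit("/", 1)[-1]  # drop registry and repository path
    name = name.split("@", 1)[0]     # drop a digest such as @sha256:...
    name = name.split(":", 1)[0]     # drop a tag such as :v1.2.3
    return name


def matches(ad_identifiers: list[str], image: str) -> bool:
    """Assumed rule: an identifier matches only on exact equality."""
    return short_image(image) in ad_identifiers
```

Under this assumption, `matches(["kube-scheduler"], "registry.example/kube-scheduler-amd64:v1")` is false, while adding "kube-scheduler-amd64" to the list makes it true, which is what the workaround below does.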

For now, my workaround is to override the default configuration of these integrations in the Helm chart values:

  confd:
    etcd.yaml: |
      ad_identifiers:
        - etcd
        - etcd-amd64
      instances:
        - prometheus_url: http://localhost:2379/metrics
          possible_prometheus_urls:
            - https://%%host%%:2379/metrics
            - http://%%host%%:2379/metrics
          ssl_verify: false
    kube_apiserver_metrics.yaml: |
      ad_identifiers:
        - kube-apiserver
        - kube-apiserver-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:6443/metrics
            - https://%%host%%:8443/metrics
          bearer_token_auth: tls_only
          tags:
            - apiserver:%%host%%
    kube_controller_manager.yaml: |
      ad_identifiers:
        - kube-controller-manager
        - kube-controller-manager-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:10257/metrics
            - https://localhost:10257/metrics
            - http://%%host%%:10252/metrics
            - http://localhost:10252/metrics
          bearer_token_auth: tls_only
          ssl_verify: false
    kube_scheduler.yaml: |
      ad_identifiers:
        - kube-scheduler
        - kube-scheduler-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:10259/metrics
            - https://localhost:10259/metrics
            - http://%%host%%:10251/metrics
            - http://localhost:10251/metrics
          bearer_token_auth: tls_only
          ssl_verify: false

sebastien-prudhomme avatar Oct 10 '24 20:10 sebastien-prudhomme

There is also a problem in the corresponding overview dashboards, because some widgets use the "short_image" field in their queries, for instance: "query": "sum:kubernetes.memory.usage{$cluster,$scope,short_image:kube-scheduler} by {pod_name}"

sebastien-prudhomme avatar Oct 14 '24 15:10 sebastien-prudhomme

The default config files we provide, along with the out-of-the-box overview dashboard, are meant to be updated and adjusted to your environment. So adding image names to the ad_identifiers in the config files, and cloning the dashboard and changing its queries, is expected.

HadhemiDD avatar Oct 25 '24 09:10 HadhemiDD
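
For the dashboard side of that suggestion, the query changes on a cloned dashboard could be scripted rather than edited by hand. The following is a minimal sketch, assuming the cloned dashboard has been exported as JSON and that the queries live in string values under a "query" key, as in the example quoted earlier in the thread. The function name is hypothetical, and nothing here calls a Datadog API; it only transforms an already loaded JSON structure.

```python
import re


def rewrite_queries(node, old: str, new: str):
    """Return a copy of a JSON-like structure with `old` replaced by `new`
    in every string stored under a "query" key."""
    # Only match `old` when it is not followed by more tag-value characters,
    # so an already rewritten value is left alone on a second run.
    pattern = re.compile(re.escape(old) + r"(?![\w./:-])")
    if isinstance(node, dict):
        return {
            key: pattern.sub(lambda _m: new, value)
            if key == "query" and isinstance(value, str)
            else rewrite_queries(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_queries(item, old, new) for item in node]
    return node
```

For example, calling `rewrite_queries(dashboard, "short_image:kube-scheduler", "short_image:kube-scheduler-amd64")` on the loaded export would update the scheduler widgets; the same call with the other three image names covers the remaining control-plane components.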