Karpenter integration is missing 'auto_conf.yaml' file
Without an auto_conf.yaml file at karpenter/datadog_checks/karpenter/data/auto_conf.yaml (/etc/datadog-agent/conf.d/karpenter.d/ on the agent), the Karpenter integration (which isn't even documented here) cannot be ignored and causes errors such as:
/var/log/datadog/agent.log:2025-01-31 10:16:51 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:143 in LogMessage) | karpenter:955504ddc7914148 | (base.py:74) | There was an error scraping endpoint http://cluster-karpenter.karpenter.svc:8000/metrics: HTTPConnectionPool(host='cluster-karpenter.karpenter.svc', port=8000): Max retries exceeded with url: /metrics (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f1f03c2d2e0>: Failed to resolve 'cluster-karpenter.karpenter.svc' ([Errno -2] Name or service not known)"))
/var/log/datadog/agent.log:2025-01-31 10:16:51 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:71 in Error) | check:karpenter | Error running check: [{"message":"There was an error scraping endpoint http://cluster-karpenter.karpenter.svc:8000/metrics: HTTPConnectionPool(host='cluster-karpenter.karpenter.svc', port=8000): Max retries exceeded with url: /metrics (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0x7f1f03c2d2e0>: Failed to resolve 'cluster-karpenter.karpenter.svc' ([Errno -2] Name or service not known)\"))","traceback":"Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/base.py\", line 1290, in run\n self.check(instance)\n File \"/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/v2/base.py\", line 75, in check\n raise type(e)(\"There was an error scraping endpoint {}: {}\".format(endpoint, e)) from None\nrequests.exceptions.ConnectionError: There was an error scraping endpoint http://cluster-karpenter.karpenter.svc:8000/metrics: HTTPConnectionPool(host='cluster-karpenter.karpenter.svc', port=8000): Max retries exceeded with url: /metrics (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0x7f1f03c2d2e0>: Failed to resolve 'cluster-karpenter.karpenter.svc' ([Errno -2] Name or service not known)\"))\n"}]
I could possibly configure the integration to ensure it points towards the correct place, but we currently have no use for Karpenter's metrics and would rather disable it. Given the logic introduced here, which checks for the presence of an auto_conf.yaml file, any integrations enabled by default should also then have this file in order to make sure they can be ignored using something like DD_IGNORE_AUTOCONF: "redisdb karpenter". This isn't the case at the moment:
$ root@datadog-agent-hwqtr:/# agent version
Agent 7.60.0 - Commit: 799e2984e8 - Serialization version: v5.0.134 - Go version: go1.22.8
$ root@datadog-agent-hwqtr:/# ls -la /etc/datadog-agent/conf.d/karpenter.d/
total 40
drwxr-xr-x 2 root root 31 Jan 31 09:40 .
drwxr-xr-x 218 root root 8192 Jan 31 09:40 ..
-rw-r--r-- 1 root root 24978 Jan 31 09:40 conf.yaml.example
Whereas, for example, redisdb is fine and can be ignored:
$ root@datadog-agent-hwqtr:/# ls -la /etc/datadog-agent/conf.d/redisdb.d/
total 24
drwxr-xr-x 2 root root 53 Jan 31 09:40 .
drwxr-xr-x 218 root root 8192 Jan 31 09:40 ..
-rw-r--r-- 1 root root 662 Jan 31 09:40 auto_conf.yaml
-rw-r--r-- 1 root root 6666 Jan 31 09:40 conf.yaml.example
$ root@datadog-agent-hwqtr:/# grep -ir "redisdb" /var/log/*
/var/log/datadog/agent.log:2025-01-31 09:40:58 UTC | CORE | INFO | (comp/core/autodiscovery/providers/config_reader.go:248 in collectEntry) | Skipping 'auto_conf.yaml' for integration 'redisdb'
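To make the dependency explicit, here is a minimal Python sketch of the skip behaviour as I understand it from the log line above. It is not the Agent's actual Go code, and `should_skip` and its parameters are hypothetical names. The point is that the ignore list only ever applies to a file named auto_conf.yaml, so it has nothing to act on for an integration that does not ship one.

```python
def should_skip(integration: str, filename: str, ignore_autoconf: set[str]) -> bool:
    """Return True if this config file would be skipped (hypothetical helper)."""
    # Assumption: only files literally named auto_conf.yaml are subject to the ignore list.
    return filename == "auto_conf.yaml" and integration in ignore_autoconf


# Mirrors DD_IGNORE_AUTOCONF: "redisdb karpenter" (space-separated list).
ignored = set("redisdb karpenter".split())

print(should_skip("redisdb", "auto_conf.yaml", ignored))       # True
print(should_skip("karpenter", "conf.yaml.example", ignored))  # False
```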
There are tons of other integrations as well that do not have this file:
$ find /etc/datadog-agent/conf.d/ -type d -exec test ! -e {}/auto_conf.yaml \; -print
/etc/datadog-agent/conf.d/
/etc/datadog-agent/conf.d/argocd.d
/etc/datadog-agent/conf.d/kyverno.d
/etc/datadog-agent/conf.d/vsphere.d
/etc/datadog-agent/conf.d/avi_vantage.d
/etc/datadog-agent/conf.d/lighttpd.d
/etc/datadog-agent/conf.d/weaviate.d
/etc/datadog-agent/conf.d/aws_neuron.d
/etc/datadog-agent/conf.d/linkerd.d
/etc/datadog-agent/conf.d/weblogic.d
/etc/datadog-agent/conf.d/linux_proc_extras.d
/etc/datadog-agent/conf.d/yarn.d
/etc/datadog-agent/conf.d/azure_iot_edge.d
/etc/datadog-agent/conf.d/boundary.d
/etc/datadog-agent/conf.d/load.d
/etc/datadog-agent/conf.d/zeek.d
/etc/datadog-agent/conf.d/btrfs.d
/etc/datadog-agent/conf.d/mapr.d
/etc/datadog-agent/conf.d/zk.d
/etc/datadog-agent/conf.d/cacti.d
/etc/datadog-agent/conf.d/mapreduce.d
/etc/datadog-agent/conf.d/calico.d
/etc/datadog-agent/conf.d/marathon.d
/etc/datadog-agent/conf.d/cassandra.d
/etc/datadog-agent/conf.d/marklogic.d
/etc/datadog-agent/conf.d/cassandra_nodetool.d
/etc/datadog-agent/conf.d/ceph.d
/etc/datadog-agent/conf.d/memory.d
/etc/datadog-agent/conf.d/mesos_master.d
/etc/datadog-agent/conf.d/checkpoint_quantum_firewall.d
/etc/datadog-agent/conf.d/mesos_slave.d
/etc/datadog-agent/conf.d/mongo.d
/etc/datadog-agent/conf.d/cisco_aci.d
/etc/datadog-agent/conf.d/mysql.d
/etc/datadog-agent/conf.d/nagios.d
/etc/datadog-agent/conf.d/cisco_sdwan.d
/etc/datadog-agent/conf.d/cisco_secure_firewall.d
/etc/datadog-agent/conf.d/network.d
/etc/datadog-agent/conf.d/citrix_hypervisor.d
/etc/datadog-agent/conf.d/network_path.d
/etc/datadog-agent/conf.d/clickhouse.d
/etc/datadog-agent/conf.d/nfsstat.d
/etc/datadog-agent/conf.d/cloud_foundry_api.d
/etc/datadog-agent/conf.d/nginx.d
/etc/datadog-agent/conf.d/cloudera.d
/etc/datadog-agent/conf.d/nginx_ingress_controller.d
/etc/datadog-agent/conf.d/cockroachdb.d
/etc/datadog-agent/conf.d/ntp.d
/etc/datadog-agent/conf.d/confluent_platform.d
/etc/datadog-agent/conf.d/nvidia_triton.d
/etc/datadog-agent/conf.d/oom_kill.d
/etc/datadog-agent/conf.d/container.d
/etc/datadog-agent/conf.d/openldap.d
/etc/datadog-agent/conf.d/container_image.d
/etc/datadog-agent/conf.d/openmetrics.d
/etc/datadog-agent/conf.d/container_lifecycle.d
/etc/datadog-agent/conf.d/openstack.d
/etc/datadog-agent/conf.d/openstack_controller.d
/etc/datadog-agent/conf.d/containerd.d
/etc/datadog-agent/conf.d/oracle-dbm.d
/etc/datadog-agent/conf.d/oracle.d
/etc/datadog-agent/conf.d/orchestrator_ecs.d
/etc/datadog-agent/conf.d/cpu.d
/etc/datadog-agent/conf.d/orchestrator_pod.d
/etc/datadog-agent/conf.d/cri.d
/etc/datadog-agent/conf.d/ossec_security.d
/etc/datadog-agent/conf.d/crio.d
/etc/datadog-agent/conf.d/palo_alto_panorama.d
/etc/datadog-agent/conf.d/pan_firewall.d
/etc/datadog-agent/conf.d/dcgm.d
/etc/datadog-agent/conf.d/pgbouncer.d
/etc/datadog-agent/conf.d/directory.d
/etc/datadog-agent/conf.d/php_fpm.d
/etc/datadog-agent/conf.d/disk.d
/etc/datadog-agent/conf.d/ping_federate.d
/etc/datadog-agent/conf.d/dns_check.d
/etc/datadog-agent/conf.d/postfix.d
/etc/datadog-agent/conf.d/docker.d
/etc/datadog-agent/conf.d/postgres.d
/etc/datadog-agent/conf.d/druid.d
/etc/datadog-agent/conf.d/powerdns_recursor.d
/etc/datadog-agent/conf.d/ecs_fargate.d
/etc/datadog-agent/conf.d/eks_fargate.d
/etc/datadog-agent/conf.d/process.d
/etc/datadog-agent/conf.d/prometheus.d
/etc/datadog-agent/conf.d/envoy.d
/etc/datadog-agent/conf.d/proxysql.d
/etc/datadog-agent/conf.d/esxi.d
/etc/datadog-agent/conf.d/pulsar.d
/etc/datadog-agent/conf.d/ray.d
/etc/datadog-agent/conf.d/file_handle.d
/etc/datadog-agent/conf.d/flink.d
/etc/datadog-agent/conf.d/rethinkdb.d
/etc/datadog-agent/conf.d/fluentd.d
/etc/datadog-agent/conf.d/fluxcd.d
/etc/datadog-agent/conf.d/riakcs.d
/etc/datadog-agent/conf.d/fly_io.d
/etc/datadog-agent/conf.d/sap_hana.d
/etc/datadog-agent/conf.d/foundationdb.d
/etc/datadog-agent/conf.d/sbom.d
/etc/datadog-agent/conf.d/gearmand.d
/etc/datadog-agent/conf.d/scylla.d
/etc/datadog-agent/conf.d/gitlab.d
/etc/datadog-agent/conf.d/service_discovery.d
/etc/datadog-agent/conf.d/gitlab_runner.d
/etc/datadog-agent/conf.d/sidekiq.d
/etc/datadog-agent/conf.d/glusterfs.d
/etc/datadog-agent/conf.d/silk.d
/etc/datadog-agent/conf.d/go_expvar.d
/etc/datadog-agent/conf.d/singlestore.d
/etc/datadog-agent/conf.d/gunicorn.d
/etc/datadog-agent/conf.d/slurm.d
/etc/datadog-agent/conf.d/haproxy.d
/etc/datadog-agent/conf.d/snmp.d/default_profiles
/etc/datadog-agent/conf.d/snmp.d/profiles
/etc/datadog-agent/conf.d/snmp.d/traps_db
/etc/datadog-agent/conf.d/hazelcast.d
/etc/datadog-agent/conf.d/hdfs_datanode.d
/etc/datadog-agent/conf.d/hdfs_namenode.d
/etc/datadog-agent/conf.d/snowflake.d
/etc/datadog-agent/conf.d/hive.d
/etc/datadog-agent/conf.d/solr.d
/etc/datadog-agent/conf.d/hivemq.d
/etc/datadog-agent/conf.d/sonarqube.d
/etc/datadog-agent/conf.d/spark.d
/etc/datadog-agent/conf.d/http_check.d
/etc/datadog-agent/conf.d/hudi.d
/etc/datadog-agent/conf.d/sqlserver.d
/etc/datadog-agent/conf.d/ibm_ace.d
/etc/datadog-agent/conf.d/squid.d
/etc/datadog-agent/conf.d/ibm_db2.d
/etc/datadog-agent/conf.d/ssh_check.d
/etc/datadog-agent/conf.d/ibm_i.d
/etc/datadog-agent/conf.d/statsd.d
/etc/datadog-agent/conf.d/ibm_mq.d
/etc/datadog-agent/conf.d/strimzi.d
/etc/datadog-agent/conf.d/ibm_was.d
/etc/datadog-agent/conf.d/supervisord.d
/etc/datadog-agent/conf.d/ignite.d
/etc/datadog-agent/conf.d/suricata.d
/etc/datadog-agent/conf.d/impala.d
/etc/datadog-agent/conf.d/system_core.d
/etc/datadog-agent/conf.d/io.d
/etc/datadog-agent/conf.d/system_swap.d
/etc/datadog-agent/conf.d/systemd.d
/etc/datadog-agent/conf.d/jboss_wildfly.d
/etc/datadog-agent/conf.d/tcp_check.d
/etc/datadog-agent/conf.d/jetson.d
/etc/datadog-agent/conf.d/tcp_queue_length.d
/etc/datadog-agent/conf.d/jmx.d
/etc/datadog-agent/conf.d/teamcity.d
/etc/datadog-agent/conf.d/journald.d
/etc/datadog-agent/conf.d/tekton.d
/etc/datadog-agent/conf.d/kafka.d
/etc/datadog-agent/conf.d/telemetry.d
/etc/datadog-agent/conf.d/kafka_consumer.d
/etc/datadog-agent/conf.d/teleport.d
/etc/datadog-agent/conf.d/karpenter.d
/etc/datadog-agent/conf.d/temporal.d
/etc/datadog-agent/conf.d/kong.d
/etc/datadog-agent/conf.d/tenable.d
/etc/datadog-agent/conf.d/teradata.d
/etc/datadog-agent/conf.d/tibco_ems.d
/etc/datadog-agent/conf.d/tls.d
/etc/datadog-agent/conf.d/kube_metrics_server.d
/etc/datadog-agent/conf.d/activemq.d
/etc/datadog-agent/conf.d/kube_proxy.d
/etc/datadog-agent/conf.d/torchserve.d
/etc/datadog-agent/conf.d/activemq_xml.d
/etc/datadog-agent/conf.d/traefik_mesh.d
/etc/datadog-agent/conf.d/aerospike.d
/etc/datadog-agent/conf.d/kubeflow.d
/etc/datadog-agent/conf.d/traffic_server.d
/etc/datadog-agent/conf.d/airflow.d
/etc/datadog-agent/conf.d/kubelet.d
/etc/datadog-agent/conf.d/twemproxy.d
/etc/datadog-agent/conf.d/amazon_msk.d
/etc/datadog-agent/conf.d/kubernetes_apiserver.d
/etc/datadog-agent/conf.d/twistlock.d
/etc/datadog-agent/conf.d/ambari.d
/etc/datadog-agent/conf.d/kubernetes_cluster_autoscaler.d
/etc/datadog-agent/conf.d/uptime.d
/etc/datadog-agent/conf.d/varnish.d
/etc/datadog-agent/conf.d/appgate_sdp.d
/etc/datadog-agent/conf.d/kubevirt_api.d
/etc/datadog-agent/conf.d/vault.d
/etc/datadog-agent/conf.d/arangodb.d
/etc/datadog-agent/conf.d/kubevirt_controller.d
/etc/datadog-agent/conf.d/vertica.d
/etc/datadog-agent/conf.d/argo_rollouts.d
/etc/datadog-agent/conf.d/kubevirt_handler.d
/etc/datadog-agent/conf.d/vllm.d
/etc/datadog-agent/conf.d/argo_workflows.d
/etc/datadog-agent/conf.d/voltdb.d
I'm not sure which of these are also enabled by default, but it's worth knowing.
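The find output above also includes conf.d itself and nested directories such as snmp.d/profiles. A narrower listing, restricted to top-level integration directories, could look like the Python sketch below. The function name `dirs_without_auto_conf` is made up for illustration, and it assumes every integration lives in a direct child directory named `<name>.d`.

```python
from pathlib import Path


def dirs_without_auto_conf(conf_d: str) -> list[str]:
    """List top-level '<name>.d' directories that contain no auto_conf.yaml."""
    root = Path(conf_d)
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Only direct children named '*.d'; this skips conf.d itself and
        # nested directories such as snmp.d/profiles.
        if entry.is_dir()
        and entry.name.endswith(".d")
        and not (entry / "auto_conf.yaml").exists()
    )


conf_d = "/etc/datadog-agent/conf.d"
if Path(conf_d).is_dir():  # only runs where the Agent config directory exists
    print("\n".join(dirs_without_auto_conf(conf_d)))
```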
Hello @umaasik! Sorry for not getting to this sooner. Hopefully you're no longer dealing with this issue. In case you are: I don't believe auto_conf.yaml is the cause of this issue.
Integrations are not on by default (with some exceptions, such as system integrations). The purpose of an auto_conf.yaml is to turn an integration on automatically: the agent matches the file's ad_identifiers against the short image names of the containers running on the nodes.
That being said, if the karpenter check isn't configured, then it shouldn't be running. If it is running, then it's picking up a config from somewhere.
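As a rough illustration of that matching step, the sketch below derives a short image name from a container image reference and compares it against a list of ad_identifiers. It is a simplified approximation, not the agent's implementation; `short_image` and `matches` are hypothetical helpers, and the image names are made-up examples.

```python
def short_image(image: str) -> str:
    """Reduce 'registry:5000/org/redis:7.2@sha256:...' to 'redis' (simplified)."""
    name = image.split("@", 1)[0]   # drop a digest, if present
    name = name.rsplit("/", 1)[-1]  # drop registry and repository path
    return name.split(":", 1)[0]    # drop the tag


def matches(image: str, ad_identifiers: list[str]) -> bool:
    """True if the container's short image name is one of the identifiers."""
    return short_image(image) in ad_identifiers


print(matches("docker.io/library/redis:7.2", ["redis"]))              # True
print(matches("public.ecr.aws/karpenter/controller:1.0", ["redis"]))  # False
```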
What am I misunderstanding about the linked code, where it checks whether the file is an auto_conf.yaml file and only then ignores it?
@umaasik I think that part is correct. So if the check is shipped with an auto_conf.yaml, then you can ignore it using that param. But if the check doesn't ship with an auto_conf.yaml, then it's not on by default. I checked in a fresh 7.66.0 container, and karpenter indeed does not ship with an auto_conf.yaml.
If the check is running, then the agent is scraping the config from somewhere else and instantiating a check instance for it. So even if we disabled the auto_conf.yaml instances, it would still load the config that it's scraping from somewhere else.
Autodiscovery configs can come from different places:
- auto_conf.yaml, if it matches an image specified in ad_identifiers
- conf.yaml, if it matches an image specified in ad_identifiers
- pod annotations/docker labels
There might be a few more that I can't remember off the top of my head, but if you're still having this issue, a simple way to check is to run agent configcheck on the agent container/pod and see where it's loading the config from:
=== ibm_mq check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/ibm_mq.d/ibm_mq.yaml
Config for instance ID: ibm_mq:f8adfa69f11cd49c
The above, for instance, is loading it from the ibm_mq.yaml that exists in my test container.
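If the configcheck output is long, a small script can pull out just the check names and their sources. The Python sketch below assumes the plain-text layout shown in the excerpt above (a `=== <name> check ===` header followed by a `Configuration source:` line); real output may include colour codes or extra fields, and `config_sources` is a hypothetical helper name.

```python
import re


def config_sources(configcheck_output: str) -> dict[str, list[str]]:
    """Map each check name to the 'Configuration source' values reported for it."""
    sources: dict[str, list[str]] = {}
    current = None
    for line in configcheck_output.splitlines():
        header = re.match(r"^=== (\S+) check ===$", line.strip())
        if header:
            current = header.group(1)
            sources.setdefault(current, [])
        elif current and line.strip().startswith("Configuration source:"):
            # Keep everything after the first colon, e.g. 'file:/etc/...'.
            sources[current].append(line.split(":", 1)[1].strip())
    return sources


sample = """\
=== ibm_mq check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/ibm_mq.d/ibm_mq.yaml
Config for instance ID: ibm_mq:f8adfa69f11cd49c
"""
print(config_sources(sample))
```

Running the same function over the karpenter section of the output would show which provider and source the unexpected instance comes from.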