datadog-agent
[CONTP-283] Should require also API Group (in addition to resource name) for generic metadata collection
What does this PR do?
This PR includes the resource API group in the configuration parameter for generic metadata collection.
In other words, instead of having DD_CLUSTER_AGENT_KUBE_METADATA_COLLECTION_RESOURCES = [deployments statefulsets nodes], we will now have DD_CLUSTER_AGENT_KUBE_METADATA_COLLECTION_RESOURCES = [apps/deployments apps/statefulsets /nodes].
Motivation
Avoid collisions in cases where the same resource name exists under different API groups. An example of this is GKE:
On GKE, the nodes resource exists under two different API groups:
- metrics.k8s.io
- "" (the empty API group, corresponding to the default empty group in Kubernetes)
In this case, if the user asks to collect metadata of nodes, it is not possible to know whether we need to collect metadata of:
- nodes.metrics.k8s.io
- nodes
This results in a conflict.
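The ambiguity can be illustrated with a small Go sketch. The `discovered` map is a hand-written stand-in for API discovery results and `isAmbiguous` is a hypothetical helper; neither is taken from the agent's code.

```go
package main

import "fmt"

// discovered is a hand-written stand-in for API discovery results
// (an assumption for illustration): resource name -> API groups serving it.
var discovered = map[string][]string{
	"deployments": {"apps"},
	"nodes":       {"", "metrics.k8s.io"},
}

// isAmbiguous reports whether a bare resource name maps to more than one API group.
func isAmbiguous(resource string) bool {
	return len(discovered[resource]) > 1
}

func main() {
	for _, name := range []string{"deployments", "nodes"} {
		fmt.Printf("%s: groups=%q ambiguous=%t\n", name, discovered[name], isAmbiguous(name))
	}
}
```

A bare name such as nodes cannot be resolved to a single group here, which is why the group has to be part of the configured value.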
Additional Notes
- With this change, the user can also indicate the group version if they wish, using the format {group}/{version}/{resource}, for example apps/v1/deployments. When using this format, the discovery client is not used to fill in the version, and the indicated version is used as is.
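As a rough illustration of the two accepted formats, the following Go sketch splits a configured value into group, version, and resource. The `resourceID` type and `parseResource` function are hypothetical names for this example and do not reflect the actual implementation in the PR; an empty `Version` stands for "to be resolved through discovery".

```go
package main

import (
	"fmt"
	"strings"
)

// resourceID is a hypothetical type for this sketch, not the agent's real type.
type resourceID struct {
	Group    string // "" is the default (empty) API group
	Version  string // "" means the version is left to the discovery client
	Resource string
}

// parseResource accepts "{group}/{resource}" or "{group}/{version}/{resource}".
func parseResource(s string) (resourceID, error) {
	parts := strings.Split(s, "/")
	switch len(parts) {
	case 2:
		if parts[1] == "" {
			return resourceID{}, fmt.Errorf("missing resource name in %q", s)
		}
		return resourceID{Group: parts[0], Resource: parts[1]}, nil
	case 3:
		if parts[1] == "" || parts[2] == "" {
			return resourceID{}, fmt.Errorf("missing version or resource name in %q", s)
		}
		return resourceID{Group: parts[0], Version: parts[1], Resource: parts[2]}, nil
	default:
		return resourceID{}, fmt.Errorf("expected {group}/{resource} or {group}/{version}/{resource}, got %q", s)
	}
}

func main() {
	for _, s := range []string{"apps/deployments", "/nodes", "apps/v1/deployments", "nodes"} {
		r, err := parseResource(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> group=%q version=%q resource=%q\n", s, r.Group, r.Version, r.Resource)
	}
}
```

In this sketch a bare name such as nodes is rejected outright, mirroring the requirement that the group is always given, even when it is empty (/nodes).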
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
❗ For better validation, do this QA on GKE, because the issue was initially discovered there: GKE has the same resource name under different API groups (see the Motivation section for more information). ❗
Deploy the cluster agent with the following helm file:
datadog:
  apiKeyExistingSecret: datadog-secret
  appKeyExistingSecret: datadog-secret
  kubelet:
    tlsVerify: false
clusterAgent:
  enabled: true
  replicas: 1
  env:
    - name: DD_CLUSTER_AGENT_KUBE_METADATA_COLLECTION_ENABLED
      value: "true"
    - name: DD_CLUSTER_AGENT_KUBE_METADATA_COLLECTION_RESOURCES
      value: "apps/deployments apps/daemonsets /nodes"
Ensure that metadata is collected successfully for deployments, daemonsets, and nodes.
kubectl exec <cluster-agent-pod> -- agent workload-list -v
=== Entity kubernetes_metadata sources(merged):[kubeapiserver] id: deployments/kube-system/kube-dns-autoscaler ===
----------- Entity ID -----------
Kind: kubernetes_metadata ID: deployments/kube-system/kube-dns-autoscaler
----------- Entity Meta -----------
Name: kube-dns-autoscaler
Namespace: kube-system
Annotations: deployment.kubernetes.io/revision:1
Labels: addonmanager.kubernetes.io/mode:Reconcile k8s-app:kube-dns-autoscaler kubernetes.io/cluster-service:true
----------- Resource -----------
apps/v1, Resource=deployments
===
=== Entity kubernetes_metadata sources(merged):[kubeapiserver] id: nodes//gke-adelhajhassan-default-pool-14a7bd1d-jnf2 ===
----------- Entity ID -----------
Kind: kubernetes_metadata ID: nodes//gke-adelhajhassan-default-pool-14a7bd1d-jnf2
----------- Entity Meta -----------
Name: gke-adelhajhassan-default-pool-14a7bd1d-jnf2
Namespace:
Annotations: node.gke.io/last-applied-node-taints: volumes.kubernetes.io/controller-managed-attach-detach:true container.googleapis.com/instance_id:3216393220270216000 csi.volume.kubernetes.io/nodeid:{"pd.csi.storage.gke.io":"projects/datadog-sandbox/zones/us-central1-c/instances/gke-adelhajhassan-default-pool-14a7bd1d-jnf2"} node.alpha.kubernetes.io/ttl:0 node.gke.io/last-applied-node-labels:cloud.google.com/gke-boot-disk=pd-balanced,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-cpu-scaling-level=2,cloud.google.com/gke-logging-variant=DEFAULT,cloud.google.com/gke-max-pods-per-node=110,cloud.google.com/gke-nodepool=default-pool,cloud.google.com/gke-os-distribution=cos,cloud.google.com/gke-provisioning=standard,cloud.google.com/gke-stack-type=IPV4,cloud.google.com/machine-family=e2,cloud.google.com/private-node=false
Labels: beta.kubernetes.io/arch:amd64 cloud.google.com/gke-boot-disk:pd-balanced cloud.google.com/gke-cpu-scaling-level:2 kubernetes.io/arch:amd64 topology.gke.io/zone:us-central1-c cloud.google.com/gke-max-pods-per-node:110 cloud.google.com/gke-nodepool:default-pool cloud.google.com/gke-provisioning:standard failure-domain.beta.kubernetes.io/region:us-central1 topology.kubernetes.io/zone:us-central1-c cloud.google.com/gke-container-runtime:containerd cloud.google.com/gke-logging-variant:DEFAULT cloud.google.com/gke-os-distribution:cos failure-domain.beta.kubernetes.io/zone:us-central1-c kubernetes.io/os:linux topology.kubernetes.io/region:us-central1 node.kubernetes.io/instance-type:e2-medium beta.kubernetes.io/instance-type:e2-medium beta.kubernetes.io/os:linux cloud.google.com/gke-stack-type:IPV4 cloud.google.com/machine-family:e2 cloud.google.com/private-node:false kubernetes.io/hostname:gke-adelhajhassan-default-pool-14a7bd1d-jnf2
----------- Resource -----------
/v1, Resource=nodes
===
=== Entity kubernetes_metadata sources(merged):[kubeapiserver] id: daemonsets/gmp-system/collector ===
----------- Entity ID -----------
Kind: kubernetes_metadata ID: daemonsets/gmp-system/collector
----------- Entity Meta -----------
Name: collector
Namespace: gmp-system
Annotations: components.gke.io/layer:addon
Labels: addonmanager.kubernetes.io/mode:Reconcile
----------- Resource -----------
apps/v1, Resource=daemonsets
===