datadog-operator
Registry setting is not inherited by admission controller for library injection
Describe what happened: I have a Datadog Agent configured in my cluster with the registry set to ECR (configuration below). I am also using library injection to instrument my pods, following this page: https://docs.datadoghq.com/tracing/trace_collection/library_injection_local/?tab=kubernetes
The library injection's init container is using GCR instead of ECR like I would expect with how I am configured.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  creationTimestamp: "2023-07-31T21:32:47Z"
  finalizers:
  - finalizer.agent.datadoghq.com
  name: datadog
  namespace: datadog
spec:
  features:
    admissionController:
      enabled: true
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
    cspm:
      enabled: false
    cws:
      enabled: false
    dogstatsd:
      hostPortConfig:
        enabled: true
    externalMetricsServer:
      enabled: true
    liveProcessCollection:
      enabled: true
    logCollection:
      containerCollectAll: true
      enabled: true
    prometheusScrape:
      enableServiceEndpoints: false
      enabled: false
  global:
    clusterName: my-cluster
    credentials:
      apiSecret:
        keyName: api-key
        secretName: datadog-operator-apikey
      appSecret:
        keyName: app-key
        secretName: datadog-operator-appkey
    registry: public.ecr.aws/datadog
    site: datadoghq.com
  override:
    nodeAgent:
      tolerations:
      - operator: Exists
Describe what you expected: I would expect that by setting the Datadog agent to use ECR as a registry it would use ECR for everything. The cluster agent uses it, the node agents use it, but the injection initContainer still uses GCR.
An injected pod output (created after the DD Agent rollout):
initContainers:
- command:
  - sh
  - copy-lib.sh
  - /datadog-lib
  image: gcr.io/datadoghq/dd-lib-js-init:latest
  imagePullPolicy: Always
  name: datadog-lib-js-init
Steps to reproduce the issue:
Deploy the Datadog Agent onto an EKS cluster with the above config, then enable auto-instrumentation on a deployment. My example is a Next.js app with the following labels and annotations:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    admission.datadoghq.com/js-lib.version: latest
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  creationTimestamp: "2023-09-15T14:52:04Z"
  labels:
    admission.datadoghq.com/enabled: "true"
    tags.datadoghq.com/env: dev
    tags.datadoghq.com/service: test
    tags.datadoghq.com/version: 1.3.4
Additional environment details (Operating System, Cloud provider, etc):
This is running on EKS in AWS.
Thanks for opening the issue @code-eg. If you need an immediate workaround you could add the env var DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY to the Cluster Agent override. We will add a task for this request on our end.
I noticed this as well, thanks for the workaround @celenechang. In case anyone else stumbles on this, here's the workaround.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    clusterAgent:
      containers:
        cluster-agent:
          env:
          - name: DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY
            value: "public.ecr.aws/datadog"
This config was added to the admission controller feature in Operator 1.7.0 under features.admissionController.registry.
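For anyone landing here, a minimal sketch of that setting might look like the following. It assumes the field takes the same registry string used in the env var workaround earlier in this thread; the value public.ecr.aws/datadog is carried over from that workaround and has not been checked against the Operator 1.7.0 documentation.

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    admissionController:
      enabled: true
      # Assumed value: same registry string as in the env var workaround.
      # Per the comment above, this field needs Operator 1.7.0 or later.
      registry: public.ecr.aws/datadog
```

If this works as described, the env var override on the Cluster Agent should no longer be needed.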
Does the feature respect the global.registry too?