Registry setting is not inherited by admission controller for library injection

Open code-eg opened this issue 1 year ago • 2 comments

Describe what happened: I have a Datadog Agent configured in my cluster with the global registry set to ECR (configuration below). I am also using library injection to instrument my pods, following this page: https://docs.datadoghq.com/tracing/trace_collection/library_injection_local/?tab=kubernetes

The library injection init container pulls its image from GCR instead of ECR, which is not what I would expect given my configuration.

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  creationTimestamp: "2023-07-31T21:32:47Z"
  finalizers:
  - finalizer.agent.datadoghq.com
  name: datadog
  namespace: datadog
spec:
  features:
    admissionController:
      enabled: true
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
    cspm:
      enabled: false
    cws:
      enabled: false
    dogstatsd:
      hostPortConfig:
        enabled: true
    externalMetricsServer:
      enabled: true
    liveProcessCollection:
      enabled: true
    logCollection:
      containerCollectAll: true
      enabled: true
    prometheusScrape:
      enableServiceEndpoints: false
      enabled: false
  global:
    clusterName: my-cluster
    credentials:
      apiSecret:
        keyName: api-key
        secretName: datadog-operator-apikey
      appSecret:
        keyName: app-key
        secretName: datadog-operator-appkey
    registry: public.ecr.aws/datadog
    site: datadoghq.com
  override:
    nodeAgent:
      tolerations:
      - operator: Exists

Describe what you expected: I would expect that setting the Datadog Agent's registry to ECR would make it use ECR for every image. The Cluster Agent uses it, the node Agents use it, but the injected initContainer still uses GCR.

An injected pod output (created after the DD Agent rollout):

initContainers:
  - command:
    - sh
    - copy-lib.sh
    - /datadog-lib
    image: gcr.io/datadoghq/dd-lib-js-init:latest
    imagePullPolicy: Always
    name: datadog-lib-js-init

Steps to reproduce the issue:

Deploy the Datadog Agent onto an EKS cluster with the above config, then enable auto-instrumentation on a deployment. My example is a Next.js app with the following labels and annotations:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    admission.datadoghq.com/js-lib.version: latest
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  creationTimestamp: "2023-09-15T14:52:04Z"
  labels:
    admission.datadoghq.com/enabled: "true"
    tags.datadoghq.com/env: dev
    tags.datadoghq.com/service: test
    tags.datadoghq.com/version: 1.3.4

Additional environment details (Operating System, Cloud provider, etc):

This is running on EKS in AWS.

code-eg avatar Sep 15 '23 15:09 code-eg

Thanks for opening the issue, @code-eg. If you need an immediate workaround, you could add the env var DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY to the Cluster Agent override. We will add a task for this request on our end.

celenechang avatar Oct 03 '23 12:10 celenechang

I noticed this as well, thanks for the workaround @celenechang. In case anyone else stumbles on this, here's the workaround.

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    clusterAgent:
      containers:
        cluster-agent:
          env:
            - name: DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY
              value: "public.ecr.aws/datadog"

darren-recentive avatar Nov 05 '23 12:11 darren-recentive

This setting was added to the admission controller feature in Operator 1.7.0, under features.admissionController.registry.
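
For anyone landing here, a minimal sketch of how that field might be set. It assumes the field accepts a registry string in the same form as global.registry; the exact schema is not confirmed in this thread, so check the DatadogAgent CRD for your Operator version.

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    admissionController:
      enabled: true
      # Assumed value format: the same registry string used for global.registry.
      registry: public.ecr.aws/datadog
```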

levan-m avatar Jul 25 '24 12:07 levan-m

Does the feature respect global.registry too?

code-eg avatar Jul 25 '24 15:07 code-eg