datadog-operator More Talos support

What does this PR do?

Adds further changes needed to support Talos Linux.

Motivation

I saw that some work had already been done in https://github.com/DataDog/datadog-operator/pull/1765. I tried it out, but couldn't get it working without these changes.

Additional Notes

This PR is just to merge changes into the existing PR for this work.

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

Agent: vX.Y.Z
Cluster Agent: vX.Y.Z

Describe your test plan

Checked out the existing branch, built an image, and pushed it to GitHub Container Registry for testing:

VERSION=v1.14.0 IMG=ghcr.io/jonstacks/datadog-operator:v1.14.0-talos-patch make docker-build
docker image push ghcr.io/jonstacks/datadog-operator:v1.14.0-talos-patch

Deployed with Argo into my Talos test cluster:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: datadog-operator
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: datadog-operator
    server: https://kubernetes.default.svc
  source:
    chart: datadog-operator
    repoURL: https://helm.datadoghq.com
    targetRevision: 2.9.2
    helm:
      releaseName: datadog-operator
      valuesObject:
        introspection:
          enabled: true
        image:
          repository: ghcr.io/jonstacks/datadog-operator
          tag: v1.14.0-talos-patch
          pullPolicy: Always
          doNotCheckTag: true  # needed so that the introspection flag gets passed
        imagePullSecrets:
        - name: ghcr-io
        logLevel: "debug"
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    - Validate=false
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m0s
        factor: 2

Deployed the following DatadogAgent:

apiVersion: "datadoghq.com/v2alpha1"
kind: "DatadogAgent"
metadata:
  name: "datadog"
  namespace: "datadog-operator"
spec:
  global:
    clusterName: "home-cluster"
    site: "us5.datadoghq.com"

    credentials:
      apiSecret:
        secretName: "datadog-secret"
        keyName: "api-key"

    kubelet:
      tlsVerify: false
    
    tags:
    - "env:dev"

  features:
    clusterChecks:
      enabled: true
      useClusterChecksRunners: true
    kubeStateMetricsCore:
      enabled: true
    logCollection:
      enabled: true
      containerCollectAll: false
    orchestratorExplorer:
      enabled: true

Checklist

[ ] PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
[ ] PR has a milestone or the qa/skip-qa label

May 18 '25 07:05 jonstacks

Does this also avoid mounting /etc/passwd, like the other issue mentioned? I tried setting the provider value in the Helm chart and still got this error:

failed to mkdir "/etc/passwd": mkdir /etc/passwd: read-only file system

Jun 03 '25 23:06 rothgar

@rothgar, sorry for the delay. Yes, it does avoid mounting /etc/passwd. For now, you have to build a custom image from this branch, push it to a registry, and pass it as a value to the Helm chart, as I do in the Argo Application above.

Here are the generated volumeMounts on the datadog-agent-talos statefulset:

        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/datadog
          name: logdatadog
        - mountPath: /checks.d
          name: checksd
          readOnly: true
        - mountPath: /etc/datadog-agent/auth
          name: datadog-agent-auth
        - mountPath: /conf.d
          name: confd
          readOnly: true
        - mountPath: /etc/datadog-agent
          name: config
        - mountPath: /host/proc
          name: procdir
          readOnly: true
        - mountPath: /host/var/run
          name: runtimesocketdir
          readOnly: true
      restartPolicy: Always
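
To check the same thing on another cluster, something along these lines may help. It is only a minimal sketch: `no_passwd_mount` is a hypothetical helper name, not part of the operator or the chart, and the example invocation in the comment assumes the agent pods run in the datadog-operator namespace used above.

```shell
# Hypothetical helper (not part of datadog-operator): reads whitespace-separated
# mount paths on stdin and succeeds only if none of them is exactly /etc/passwd.
no_passwd_mount() {
  # Put each path on its own line, then look for a whole-line match.
  ! tr -s '[:space:]' '\n' | grep -qx '/etc/passwd'
}

# Example invocation (assumes kubectl access and the datadog-operator namespace):
#   kubectl -n datadog-operator get pods \
#     -o jsonpath='{..volumeMounts[*].mountPath}' | no_passwd_mount \
#     && echo "no /etc/passwd mount found"
```

The match is on the exact path, so mounts such as /host/proc or /etc/datadog-agent do not trigger a false positive.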

I run Talos in my homelab, but my Datadog free trial ran out :laughing: and that was enough friction to pause development on this for now. I'll see if they'll grant me an extension.

Aug 19 '25 06:08 jonstacks

This pull request has been automatically marked as stale because it has not had activity in the past 15 days.

It will be closed in 30 days if no further activity occurs. If this pull request is still relevant, adding a comment or pushing new commits will keep it open. Also, you can always reopen the pull request if you missed the window.

Thank you for your contributions!

Oct 12 '25 10:10 dd-octo-sts[bot]