
Agent does not work on windows nodes

Open: vlinevych opened this issue 1 year ago • 1 comment

Describe what happened: Datadog-operator fails to create an agent DaemonSet that can run on Windows nodes. There are no errors on the operator side. The pods fail to mount the ServiceAccount token; apparently the feature is not supported on Windows nodes.

Describe what you expected: Datadog-operator creates a DaemonSet that runs dd-agent on Windows nodes. It would also be useful to state whether datadog-operator supports Windows agents.

Steps to reproduce the issue: In a mixed Linux/Windows cluster, I took a minimal configuration from the examples and added a nodeSelector to deploy agents to the Windows nodes:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent-win
spec:
  global:
    credentials:
      apiSecret:
        secretName: datadog-operator
        keyName: api-key
      appSecret:
        secretName: datadog-operator
        keyName: app-key
  override:
    nodeAgent:
      nodeSelector:
        kubernetes.io/os: windows
      tolerations:
      - operator: Exists

The operator reconciled successfully and the pods were scheduled, but they are stuck in the Init phase. The pods have the following event:

  Warning  FailedMount   74s   kubelet  MountVolume.SetUp failed for volume "kube-api-access-fvkfz" : chown c:\var\lib\kubelet\pods\7358dc21-f11a-4bbb-aae9-1d26f43bdf46\volumes\kubernetes.io~projected\kube-api-access-fvkfz\..2023_09_08_16_06_52.1325043848\token: not supported by windows

I was able to bypass this by editing the DaemonSet and adding this to the pod template spec:

automountServiceAccountToken: false

but then another error occurs:

Warning  Failed             19s (x3 over 33s)  kubelet                  Error: failed to start container "init-volume": Error response from daemon: container init-volume encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)
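
For reference, the workaround mentioned above can be sketched as the following fragment of the generated DaemonSet. This is only an illustration of where the field sits, not a complete manifest: the DaemonSet name is an assumption (check the real one with `kubectl get daemonset`), and a manual edit to an operator-managed DaemonSet may be reverted on the next reconcile.

```yaml
# Fragment only: shows where automountServiceAccountToken was added.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent-win-agent   # assumed name, not taken from the cluster
spec:
  template:
    spec:
      # Added by hand; avoids the kube-api-access FailedMount event
      automountServiceAccountToken: false
      nodeSelector:
        kubernetes.io/os: windows
      tolerations:
      - operator: Exists
```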

Additional environment details (Operating System, Cloud provider, etc): AWS EKS v1.23.17-eks, Windows Server 2019 Datacenter.

vlinevych • Sep 08 '23 16:09

Hello! I'm running into this same problem when installing the operator with the latest Helm chart (1.5.2). Identical error. Is there anything we need to tweak to get the agents running on Windows machines, @celenechang?

dudo • Mar 22 '24 04:03