Support GKE Autopilot
As of v0.8.1, deploying on GKE Autopilot fails with the following errors from GKE Warden's admission webhook:
datadog-operator {"level":"ERROR","ts":"2022-12-13T15:31:07Z","logger":"controller-runtime.manager.controller.datadogagent","msg":"Reconciler error","reconciler group":"datadoghq.com","reconciler kind":"DatadogAgent","name":"datadog","namespace":"datadog","error":"admission webhook \"gkepolicy.common-webhooks.networking.gke.io\" denied the request: GKE Warden rejected the request because it violates one or more constraints.\nViolations details: {\"[denied by autogke-no-write-mode-hostpath]\":[\"hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\"]}\nRequested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'."}
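In the log line above, the violation details are a JSON object embedded inside the "error" string of a JSON log record. Here is a minimal sketch of pulling them out programmatically. It assumes the line has the shape shown above: an optional pod-name prefix such as "datadog-operator ", followed by a controller-runtime JSON record whose "error" field contains a "Violations details: {...}" line. The function name extract_violations is hypothetical.

```python
import json


def extract_violations(log_line: str) -> dict:
    """Map each GKE Warden constraint to its list of violation messages.

    Assumes a controller-runtime JSON record (optionally preceded by a
    pod-name prefix) whose "error" field has a "Violations details: {...}" line.
    """
    # Skip any leading pod-name prefix such as "datadog-operator ".
    record = json.loads(log_line[log_line.index("{"):])
    error = record["error"]
    prefix = "Violations details: "
    for line in error.splitlines():
        if line.startswith(prefix):
            # The details are themselves JSON embedded in the error string.
            return json.loads(line[len(prefix):])
    return {}


# Shortened sample rebuilt from the log above (a single violation only).
details = {"[denied by autogke-no-write-mode-hostpath]": [
    "hostPath volume procdir used in container agent uses path /proc "
    "which is not allowed in Autopilot. Allowed path prefixes for "
    "hostPath volumes are: [/var/log/]."
]}
error = ("admission webhook \"gkepolicy.common-webhooks.networking.gke.io\" "
         "denied the request: GKE Warden rejected the request because it "
         "violates one or more constraints."
         "\nViolations details: " + json.dumps(details) +
         "\nRequested by user: 'system:serviceaccount:datadog:datadog-operator'")
sample = "datadog-operator " + json.dumps(
    {"level": "ERROR", "msg": "Reconciler error", "error": error})

for rule, messages in extract_violations(sample).items():
    print(rule)
    for msg in messages:
        print("  " + msg)
```

Running it against the full log line yields the same per-container list as the cleaned-up version in this issue.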
Either the documentation should state clearly that GKE Autopilot limits the supported features (and list exactly which ones), or, preferably, a workaround should be developed.
Here's a cleaned-up list of the denied volume mounts:
admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: [denied by autogke-no-write-mode-hostpath]:
hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
Requested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'.
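Every violation above follows one rule, stated in the message itself: a hostPath volume is denied unless its path starts with /var/log/. Here is a minimal sketch that reproduces that check locally for the agent container's volumes. It only mirrors the prefix rule as worded in the error. It is not GKE Warden's implementation, it ignores everything except the path, and the actual Autopilot allowlist may vary between GKE versions. The function denied_host_paths and the "hypothetical-logs" volume are illustrative assumptions.

```python
import posixpath

# Taken from the webhook message: "Allowed path prefixes for hostPath volumes
# are: [/var/log/]". Assumption: the real Autopilot policy may differ by GKE
# version; this only mirrors the prefix rule as worded in the error.
ALLOWED_PREFIXES = ("/var/log/",)


def denied_host_paths(volumes, allowed=ALLOWED_PREFIXES):
    """Return names of hostPath volumes whose path falls outside `allowed`.

    `volumes` maps volume name -> hostPath path.
    """
    denied = []
    for name, path in volumes.items():
        # Normalize and append a trailing slash so "/var/log" matches
        # "/var/log/" but "/var/logs" does not.
        candidate = posixpath.normpath(path).rstrip("/") + "/"
        if not any(candidate.startswith(prefix) for prefix in allowed):
            denied.append(name)
    return denied


# Host paths of the agent container, as listed in the violations above,
# plus a hypothetical log volume for contrast.
agent_volumes = {
    "procdir": "/proc",
    "cgroups": "/sys/fs/cgroup",
    "runtimesocketdir": "/var/run",
    "hypothetical-logs": "/var/log/pods",
}
print(denied_host_paths(agent_volumes))  # ['procdir', 'cgroups', 'runtimesocketdir']
```

The sketch shows why the agent's default volumes are rejected wholesale. /proc, /sys/fs/cgroup and /var/run all fall outside the single allowed prefix, so any Autopilot-compatible deployment has to avoid or replace these mounts rather than just rename them.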
Thanks for opening this one up separately! I will add it to the card we have in our backlog. This feature is very high on our list, and I agree we should document this better. We will keep you posted when we add support to the Operator; it will likely land only in the 1.x version (which we are currently releasing).
Any news on this one? We are encountering the same problem and can't set up our cluster properly.
When we try to create a ClusterAgent, we see similar behavior: the Operator doesn't create the service account.
error looking up service account datadog/datadog-cluster-agent: serviceaccount "datadog-cluster-agent" not found
Same problem. When can we expect this to be resolved?
Having the same issue on Autopilot; sad to see this still isn't working after a year.
The Helm chart has supported Autopilot for years now. Can we please get that functionality ported here? :)
Observations on this issue:
- The service account datadog-cluster-agent is not created.
- None of the RBAC settings are created.
Is this a Helm template issue, or a permissions issue?
The documentation for GKE Autopilot covers only the Helm method, not the Operator (there is no Operator tab). Does that mean the Operator does not work on Autopilot?
https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=operator&site=us5#autopilot
I can see this in the blog post below: "If you’re using GKE Autopilot, the Helm chart is the best way to install Datadog, as the Operator is not currently supported."
https://www.datadoghq.com/blog/monitor-google-kubernetes-engine/#deploying-the-datadog-agent-to-your-gke-cluster
This means the Datadog Operator does not yet support GKE Autopilot clusters; we probably need to create a feature request for this.