
Support GKE Autopilot

Open • leosunmo opened this issue 3 years ago • 14 comments

Currently (v0.8.1), reconciling a DatadogAgent on GKE Autopilot triggers the following errors from GKE Warden's admission webhook:

datadog-operator {"level":"ERROR","ts":"2022-12-13T15:31:07Z","logger":"controller-runtime.manager.controller.datadogagent","msg":"Reconciler error","reconciler group":"datadoghq.com","reconciler kind":"DatadogAgent","name":"datadog","namespace":"datadog","error":"admission webhook \"gkepolicy.common-webhooks.networking.gke.io\" denied the request: GKE Warden rejected the request because it violates one or more constraints.\nViolations details: {\"[denied by autogke-no-write-mode-hostpath]\":[\"hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\"]}\nRequested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'."}
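A raw log line like this one can be turned into a readable list of violations mechanically. Below is a minimal Python sketch, assuming the operator writes one JSON object per log line (possibly preceded by a label such as `datadog-operator `) and that the Warden message keeps the `Violations details: {...}` followed by `\nRequested by user:` layout seen in this error. `extract_violations` and the shortened `sample` line are hypothetical and not part of the operator or GKE.

```python
import json

# Marker that introduces the JSON-encoded violations map inside the webhook error.
VIOLATIONS_MARKER = "Violations details: "
# Text that immediately follows the violations map in the same error string.
REQUESTER_MARKER = "\nRequested by user:"


def extract_violations(log_line: str) -> dict[str, list[str]]:
    """Return the {rule: [messages]} map embedded in one operator log line."""
    # Skip any prefix (e.g. the "datadog-operator " label) before the JSON object.
    record = json.loads(log_line[log_line.index("{"):])
    error = record["error"]  # json.loads has already unescaped the inner quotes
    start = error.index(VIOLATIONS_MARKER) + len(VIOLATIONS_MARKER)
    end = error.index(REQUESTER_MARKER, start)
    return json.loads(error[start:end])


# Shortened, hypothetical log line with the same shape as the one above.
sample = "datadog-operator " + json.dumps({
    "level": "ERROR",
    "msg": "Reconciler error",
    "error": (
        'admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: '
        "GKE Warden rejected the request because it violates one or more constraints.\n"
        'Violations details: {"[denied by autogke-no-write-mode-hostpath]":'
        '["hostPath volume procdir used in container agent uses path /proc '
        'which is not allowed in Autopilot."]}\n'
        "Requested by user: 'system:serviceaccount:datadog:datadog-operator'."
    ),
})

for rule, messages in extract_violations(sample).items():
    print(rule)
    for message in messages:
        print("  -", message)
```

The inner violations map is itself JSON, so decoding it a second time is more robust than splitting on `","` by hand.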

Either the documentation should state clearly that GKE Autopilot reduces the set of supported features (and exactly which ones), or, preferably, a workaround should be developed.

Here's a cleaned-up list of the denied volume mounts:

admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: [denied by autogke-no-write-mode-hostpath]:

hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

Requested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'.
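Every denial above follows the same rule: a hostPath must sit under an allowed prefix, and only `/var/log/` was allowed when this error was reported. A rough local approximation of that check is sketched below in Python. It is not GKE Warden's implementation: the allowlist is copied from the error message and may differ on other GKE versions, the rule name `autogke-no-write-mode-hostpath` suggests the real check may also depend on mount mode (ignored here), and `is_allowed` / `denied_mounts` are hypothetical helpers. Fed the three volumes and three containers named above, it reports the same nine mounts.

```python
import posixpath

# Allowed hostPath prefixes, copied from the Warden error above.
# Assumption: other GKE versions may allow a different set of prefixes.
ALLOWED_PREFIXES = ("/var/log/",)


def is_allowed(host_path: str, allowed=ALLOWED_PREFIXES) -> bool:
    """Approximate the prefix check: is the normalised path under an allowed prefix?"""
    # normpath collapses "..", so "/var/log/../run" is checked as "/var/run".
    normalized = posixpath.normpath(host_path).rstrip("/") + "/"
    return any(normalized.startswith(prefix) for prefix in allowed)


def denied_mounts(volumes: dict[str, str], containers: list[str]) -> list[tuple[str, str, str]]:
    """List (container, volume, path) for every hostPath mount the check would reject."""
    return [
        (container, name, path)
        for container in containers
        for name, path in volumes.items()
        if not is_allowed(path)
    ]


# hostPath volumes and containers named in the denial above.
VOLUMES = {"procdir": "/proc", "cgroups": "/sys/fs/cgroup", "runtimesocketdir": "/var/run"}
CONTAINERS = ["agent", "process-agent", "init-config"]

for container, name, path in denied_mounts(VOLUMES, CONTAINERS):
    print(f"hostPath volume {name} used in container {container} uses path {path}")
```

Because the agent needs `/proc`, `/sys/fs/cgroup` and the runtime socket under `/var/run`, none of which fall under `/var/log/`, this kind of prefix rule rejects the default DaemonSet outright rather than just one optional feature.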

leosunmo, Dec 13 '22

Thanks for opening this one up separately! I will add it to the card we have in our backlog. This feature is very high on our list, and I agree we should document this better. We will keep you posted when we add support to the Operator; it will likely land only in the 1.x version (which we are currently releasing).

CharlyF, Dec 13 '22

Any news on this one? We are encountering the same problem and can't set up our cluster properly.

tanqhnguyen, Jan 12 '23

When trying to create a ClusterAgent, we see similar behavior: the Operator doesn't create the service account, and fails with: error looking up service account datadog/datadog-cluster-agent: serviceaccount "datadog-cluster-agent" not found

adaosantos, Sep 06 '23

Same problem. When can we expect this to be resolved?

lyona, Nov 21 '23

Having the same issue on Autopilot; sad to see this still isn't working after a year.

mrkmcknz, Dec 03 '23

The Helm chart has supported Autopilot for years now; can we please get that functionality ported here? :)

tkoft, Mar 07 '24

Observations on this issue:

  • The service account datadog-cluster-agent is not created
  • None of the RBAC resources are created

Is this a Helm template issue or a permissions issue?

The documentation for GKE Autopilot covers only the Helm method, not the Operator (there is no Operator tab for it). Does that mean the Operator does not work on Autopilot?
https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=operator&site=us5#autopilot

rojomisin, Apr 13 '24

I found this in the blog post below: "If you’re using GKE Autopilot, the Helm chart is the best way to install Datadog, as the Operator is not currently supported." https://www.datadoghq.com/blog/monitor-google-kubernetes-engine/#deploying-the-datadog-agent-to-your-gke-cluster So the Datadog Operator does not yet support GKE Autopilot clusters; we probably need to create a feature request for this.

erabusi, Jul 19 '24