GKE Autopilot allowlist onboarding
Grafana Beyla, which has been donated to OTel as OBI (https://opentelemetry.io/docs/zero-code/obi/), is allowlisted by GKE Autopilot so that it can use the elevated permissions it needs in order to work there.
Maintainers of OBI (@open-telemetry/ebpf-instrumentation-maintainers) reached out to me and are interested in allowing OBI to work in a similar way.
Autopilot has a partner program, which allows partners to allowlist their workloads and manage the permissions those workloads need on Autopilot. Onboarding requires a GCP project (for testing changes to the allowlist) and a Google group to manage access.
There are a few questions before we initiate the process:
- Can @open-telemetry/ebpf-instrumentation-maintainers confirm that they are interested in owning this as a SIG?
- Scope: IMO we should onboard as "OpenTelemetry", rather than "OpenTelemetry-eBPF-Instrumentation", in case other workloads would like to integrate in the future.
- Are there other SIGs that should be involved, or be shared owners?
Disclaimer: I work for Google, and previously worked on GKE.
This is a completely optional integration, and is entirely up to the community to decide if they want to pursue it. If we are not interested, I can reach out to the Googlers that initially introduced support for Beyla to see if they are willing to add similar support for OBI.
cc @svrnm
Thanks @dashpole, having OBI added would be great.
> Scope: IMO we should onboard as "OpenTelemetry", rather than "OpenTelemetry-eBPF-Instrumentation", in case other workloads would like to integrate in the future.
What other workloads could be relevant here: collector? injector?
I had the eBPF profiler in mind, but those workloads may also want an integration.
> - Can @open-telemetry/ebpf-instrumentation-maintainers confirm that they are interested in owning this as a SIG?
I can confirm for myself there is interest.
My understanding is the rest of the group is also interested in owning this.
I'm a maintainer of OBI and I'm interested in maintaining the GKE integration. Thanks for helping with this @dashpole !
The eBPF profiler is also a great candidate for this, as is the OTel Collector, since there's a build with the profiler included.
As a maintainer of OBI I'm also interested. Thanks!!
Interested as well. It would be great to clear a path for future OSS projects to onboard onto GKE Autopilot outside the partner program (which requires vendor backing).
Looks like we have all members of @open-telemetry/ebpf-instrumentation-approvers confirming, so let's go ahead with OBI.
I'm also a donor of OBI and am very interested. I think it's also a good opportunity to add the ebpf-profiler collector distribution; maybe @open-telemetry/ebpf-profiler-maintainers can say whether that makes sense.
@open-telemetry/collector-maintainers @open-telemetry/ebpf-profiler-maintainers @open-telemetry/injector-maintainers please take a look as well if this is interesting for you
The Injector will probably not be impacted, but the Helm chart and operator SIGs would likely want to be roped in. @open-telemetry/operator-approvers and @open-telemetry/helm-approvers ping
From the Collector Helm charts / Collector Contrib point of view, I noticed that we have two things that require privileged access:
- logsCollection.storeCheckpoints, which requires runAsUser: 0, runAsGroup: 0. Ref https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_pod.tpl#L32-L33
- The hostmetrics process scraper (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver/internal/scraper/processscraper), which requires privileged access.
It would be great to onboard the OTel Collector Contrib / K8s distro, as these features are not available to GKE Autopilot users at the moment.
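For illustration, here is a rough sketch of Helm values that would run into both requirements. It assumes the opentelemetry-collector chart's `presets.logsCollection` preset and the hostmetrics receiver's `process` scraper; the collection interval and pipeline wiring are illustrative assumptions and have not been tested on Autopilot.

```yaml
# values.yaml fragment for the opentelemetry-collector Helm chart (sketch only)
mode: daemonset

presets:
  logsCollection:
    enabled: true
    # Persisting checkpoints makes the chart run the container as root
    # (runAsUser: 0, runAsGroup: 0), per the _pod.tpl reference above.
    storeCheckpoints: true

config:
  receivers:
    hostmetrics:
      collection_interval: 30s   # assumed value, adjust as needed
      scrapers:
        # The process scraper reads per-process data from the host,
        # which is the part that needs privileged access.
        process: {}
  service:
    pipelines:
      metrics:
        receivers: [hostmetrics]
```

Other required chart values (for example the image settings) are omitted here.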
I recently worked on a WorkloadAllowlist resource for the Dash0 operator, which manages (among other things) an OpenTelemetry collector DaemonSet. So my findings might be an interesting additional data point.
Here are the exemptions we needed for our specific set of OTel collector components and the collector configuration we use:
apiVersion: auto.gke.io/v1
kind: WorkloadAllowlist
metadata:
  name: ....
  annotations:
    autopilot.gke.io/no-connect: "true"
exemptions:
  # Allow host ports for node-local OTLP traffic between monitored workloads and the OpenTelemetry collector DaemonSet pod
  # on the same node. (Disclaimer: For some reason the host port usage was not rejected by GKE AP's warden webhook even without a matching exemption, despite Google's documentation saying that host ports require an exemption.)
  - autogke-no-host-port
  # Allow read-only host volume mount and mapping for "/" to enable the OpenTelemetry host metrics receiver, which
  # requires mounting the node's file system as a read-only volume.
  - autogke-no-write-mode-hostpath
  # Allow the custom node affinities. Required if users configure the `affinity` setting in the OpenTelemetry collector Helm chart.
  - autogke-node-affinity-selector-limitation
matchingCriteria:
  ...
Of course, the exact set of exemptions may vary quite a bit, depending on the OpenTelemetry collector configuration and which features of the OpenTelemetry collector Helm chart are used.
Additionally, there is one thing that I could not get to work on GKE Autopilot, and for which I have not yet found a matching exemption:
If you use the kubeletstats receiver with any of the following enabled:
- extra_metadata_labels (container.id, k8s.volume.type), or
- utilization metrics (i.e. k8s.pod.cpu_limit_utilization, k8s.pod.cpu_request_utilization, k8s.pod.memory_limit_utilization, k8s.pod.memory_request_utilization),
the kubeletstats receiver will try to talk to the /pods endpoint on the kubelet's secure port 10250, which requires the RBAC permission "nodes/proxy" (core API group, verb "get").
Apparently the "nodes/proxy" permission is not available to unprivileged workloads on GKE Autopilot. I also did not find any exemption for a privileged workload that would lift this restriction.
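For reference, a minimal sketch of the setup described above: a collector config fragment that enables the affected kubeletstats options, followed by the ClusterRole that would normally grant the matching permission. The role name, the `K8S_NODE_NAME` environment variable, and the collection interval are assumptions for illustration; on GKE Autopilot the rule on "nodes/proxy" is the part that reportedly cannot be granted.

```yaml
# Collector config fragment (sketch): options that make the receiver call /pods
receivers:
  kubeletstats:
    collection_interval: 20s                      # assumed value
    auth_type: serviceAccount
    endpoint: "https://${env:K8S_NODE_NAME}:10250"  # assumes the node name is injected as an env var
    extra_metadata_labels:
      - container.id
      - k8s.volume.type
    metrics:
      k8s.pod.cpu_limit_utilization:
        enabled: true
      k8s.pod.memory_limit_utilization:
        enabled: true
---
# RBAC (sketch): permission the receiver needs for the options above
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-kubeletstats   # hypothetical name
rules:
  - apiGroups: [""]                   # core API group
    resources: ["nodes/proxy"]
    verbs: ["get"]
```

The ClusterRole would still need a ClusterRoleBinding to the collector's service account, which is omitted here.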
@svrnm is there a community-owned GCP project we should use for testing the allowlist (feel free to send it to me on Slack if it isn't public information), or is there a process for creating a new community-owned GCP project for this?
We also need a community-owned Google group to manage access. Ideally we should have a dedicated group for this. Can you create one?
@dashpole apologies for not getting back earlier to you, this one slipped.
@open-telemetry/sig-project-infra-approvers (https://cloud-native.slack.com/archives/C07BPU981PV) can help with the community-owned GCP project and group creation.
We don't have a community-owned GCP project that I'm aware of: https://github.com/open-telemetry/community/blob/main/assets.md
@dashpole if you set one up, just document it in assets.md with yourself (and anyone else) as admins
I noticed we do have something Google Cloud related attached to the [email protected] Google account:
https://github.com/open-telemetry/opentelemetry.io/blob/c9b72bab431e821bcc47e7c1d17d23b95f95b39a/content/en/blog/2025/go-opentelemetry-io-expired-certificate.md?plain=1#L65
I don't have any context about this; maybe it has to do with domain ownership(?)