cgroups v2 container.id discovery

Open graphaelli opened this issue 2 years ago • 7 comments

Is your feature request related to a problem? Please describe.

cgroups v2 is seeing increasing adoption as various distributions have made it the default for containers. As noted in https://github.com/elastic/beats/issues/16958, Fedora 31 (late 2019) enables it by default, and Ubuntu 21.10 does as well.

The current spec covers only cgroups v1; this issue is a feature request for v2 support.

Describe the solution you'd like

When running applications on systems with cgroups v2 enabled, for example on Docker, container.id should be filled in for events produced by APM agents.

Additional context

The current metrics spec touches on collecting cgroups v2 metrics without specific guidance on how to identify the cgroup itself; that should be updated as well. The Java and Python agents may provide insight into the updates required, such as consulting /proc/self/mountinfo instead of /proc/self/cgroup when cgroups v2 is detected.
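As a rough illustration of the detection step mentioned above, here is a minimal Python sketch. It relies on the documented /proc/[pid]/cgroup format, where cgroups v2 entries have hierarchy ID 0 and an empty controller list ("0::<path>"). The function names `is_cgroup_v2` and `discovery_source` are hypothetical and are not taken from any agent or from the spec.

```python
def is_cgroup_v2(cgroup_text: str) -> bool:
    """Return True if /proc/self/cgroup content looks like a pure cgroups v2 setup."""
    lines = [line for line in cgroup_text.splitlines() if line.strip()]
    # cgroups v2 entries use hierarchy ID 0 and no controllers: "0::<path>".
    # A hybrid host also lists numbered v1 hierarchies, so require every line to match.
    return bool(lines) and all(line.startswith("0::") for line in lines)


def discovery_source(cgroup_text: str) -> str:
    """Pick which procfs file to consult for container.id discovery (assumed policy)."""
    if is_cgroup_v2(cgroup_text):
        return "/proc/self/mountinfo"
    return "/proc/self/cgroup"


if __name__ == "__main__":
    with open("/proc/self/cgroup", encoding="utf-8") as f:
        print(discovery_source(f.read()))
```

Treating hybrid hosts as v1 is an assumption of this sketch; an agent could just as well try both files in order.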

graphaelli avatar Oct 13 '21 19:10 graphaelli

https://stackoverflow.com/questions/68816329/how-to-get-docker-container-id-from-within-the-container-with-cgroup-v2 discusses using upperdir=(.+?) in an entry in /proc/self/mountinfo. That may be limited (my vague, perhaps obsolete, recollection from earlier Docker days was that OverlayFS wasn't always the file driver). It also provides an ID that is different than Docker's container ID.
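To make the upperdir approach concrete, here is a small Python sketch. It assumes the root filesystem is an overlay mount whose options include `upperdir=...`; the helper name `overlay_upperdir` is hypothetical. The regex stops at the next comma rather than using the lazy `(.+?)` from the linked answer. As noted above, the value found this way is not Docker's container ID.

```python
import re
from typing import Optional

# Capture the upperdir option value up to the next comma or whitespace.
UPPERDIR_RE = re.compile(r"upperdir=([^,\s]+)")


def overlay_upperdir(mountinfo_text: str) -> Optional[str]:
    """Return the upperdir path of the first overlay mount, or None if there is none."""
    for line in mountinfo_text.splitlines():
        # The filesystem type follows the " - " separator in mountinfo lines.
        if " - overlay " not in line:
            continue
        match = UPPERDIR_RE.search(line)
        if match:
            return match.group(1)
    return None
```

On a host using a different storage driver there is no overlay entry, so this returns None, which is the limitation described above.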

https://github.com/iovisor/bcc/issues/1119 discusses how there isn't a kernel concept of container ID, so this likely comes down to heuristics specific to each container runtime (docker, k8s, podman, systemd, etc.) ... or just being out of luck if nothing is exposed inside the container.

Gil, you mentioned perhaps having assist from a host-local APM server.

What breaks when a container.id is missing? Can hostname be a (poor) fallback?

trentm avatar Oct 14 '21 03:10 trentm

> assist from a host-local APM server.

Good point, I'm not sure how that would work but it is worth considering if, as you wrote, the id is not reliably discoverable from within the container.

> What breaks when a container.id is missing?

Workflows based on pivoting on that data are impacted. For example, when viewing application service logs, the logs are either not shown or scoped only to the host/node level, which may (likely!) be running various unrelated containers. That is sometimes useful, but usually you want to start at the container and zoom out to that level if needed. That's a really simple example, but I hope it demonstrates the type of issue that missing this information causes.

graphaelli avatar Oct 14 '21 19:10 graphaelli

Reminded me this is still a problem

image

graphaelli avatar Apr 01 '22 20:04 graphaelli

One workaround for those coming across this issue is to start containers with --cgroupns=host. I've confirmed container.id is picked up under cgroups v2 with Docker using that option. That's not available via docker compose yet; it is tracked in https://github.com/compose-spec/compose-spec/issues/148
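A short Python sketch of why that option helps: with a private cgroup namespace the process typically sees only "0::/" in /proc/self/cgroup, while with the host namespace the full cgroup path, including the container ID, is visible. The sample lines and the helper name `container_id_from_cgroup` are illustrative assumptions; the exact path layout depends on the runtime and cgroup driver.

```python
import re
from typing import Optional

# Docker container IDs are 64 lowercase hex characters.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first 64-hex ID found in a cgroup path, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        match = CONTAINER_ID_RE.search(parts[2])
        if match:
            return match.group(0)
    return None


SAMPLE_ID = "0123456789abcdef" * 4  # made-up ID for illustration
PRIVATE_NS = "0::/\n"  # what a container typically sees by default on cgroups v2
HOST_NS = f"0::/system.slice/docker-{SAMPLE_ID}.scope\n"  # assumed systemd-driver layout

print(container_id_from_cgroup(PRIVATE_NS))  # nothing to find in the path
print(container_id_from_cgroup(HOST_NS) == SAMPLE_ID)
```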

graphaelli avatar Jun 15 '22 15:06 graphaelli

This is reportedly an issue with at least three of the APM agents so far, with 2/3 waiting for a decision in this thread before taking any action.

> What breaks when a container.id is missing?

  • Infrastructure inventory is polluted by the containers that are incorrectly reported as hosts by the agent.
  • Association between actual hosts and traces is lost.

Nacoma avatar Mar 07 '23 23:03 Nacoma

The current state of the art (StackOverflow, Jenkins, OpenTelemetry JS) seems to be to read and parse /proc/self/mountinfo for the container ID -- as I saw back in Oct 2021.

https://github.com/opencontainers/runtime-spec/issues/1105 seems to be a/the issue to follow for there eventually/possibly being a standardized mechanism for this. Until then, we should update our spec to fall back to parsing /proc/self/mountinfo.
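For discussion, here is one possible shape of that fallback in Python. It is a heuristic sketch, not the spec and not any project's actual implementation. It assumes the runtime bind-mounts /etc/hostname from a per-container directory whose path contains the 64-hex container ID (the sample line used in testing follows a Docker-style layout). The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the container ID appears as 64 lowercase hex characters.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def container_id_from_mountinfo(mountinfo_text: str) -> Optional[str]:
    """Look for a container ID in the source path of the /etc/hostname mount."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # mountinfo fields: mount ID, parent ID, major:minor, root, mount point, ...
        if len(fields) < 5:
            continue
        root, mount_point = fields[3], fields[4]
        if mount_point != "/etc/hostname":
            continue
        match = CONTAINER_ID_RE.search(root)
        if match:
            return match.group(0)
    return None


def discover_container_id(path: str = "/proc/self/mountinfo") -> Optional[str]:
    """Read mountinfo if available; return None outside Linux or on read errors."""
    try:
        with open(path, encoding="utf-8") as f:
            return container_id_from_mountinfo(f.read())
    except OSError:
        return None
```

Restricting the match to the /etc/hostname entry is a design choice to avoid picking up unrelated 64-hex strings, such as overlay layer IDs, from other mount lines. Other runtimes may lay out these paths differently, so a None result should be expected and handled.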

trentm avatar Mar 27 '23 19:03 trentm

On the OpenTelemetry Java side, cgroups v2 container ID discovery is currently implemented by parsing /proc/self/mountinfo.

There is no mention of pod ID however.

SylvainJuge avatar May 23 '23 14:05 SylvainJuge