semantic-conventions Guidance needed: `process` vs `system` vs `container` vs `k8s` vs runtime metrics

Open lmolkova opened this issue 1 year ago • 11 comments

We have multiple layers of metrics:

- runtime-specific metrics (JVM, Go, etc.) that report CPU, memory, and so on from the runtime's perspective, with attributes specific to the runtime
- `process` metrics that report OS-level metrics per process, as observed by the OS itself
- `system` metrics that report OS metrics from the OS perspective
- `container` metrics that are reported by the container runtime about a container
- `k8s` metrics, which are coming: https://github.com/open-telemetry/semantic-conventions/issues/1032

Plus we have attributes in all of these namespaces that have something in common.

Problems:

- When adding new metrics, such as `system.linux.memory.available` (https://github.com/open-telemetry/semantic-conventions/pull/1078), it's not clear whether we'd expect to have OS-specific metrics in each of the namespaces (`container.linux.memory.*`, `system.linux.*`, `process.linux.memory.*`). See https://github.com/open-telemetry/semantic-conventions/pull/1078#discussion_r1638375208
- We end up defining similar attributes in each namespace.

We should come up with a framework to decide how/if to extend system/container metrics:
- Do we need separate container and k8s metrics? Could the container orchestrator be an attribute?
- Do we need separate process and system metrics? Isn't `system.cpu.time` a sum of all `process.cpu.time` on the same machine?
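
To make the last question concrete, here is a rough sketch of what "system as a sum of processes" would mean. Everything in it is an assumption for illustration: the numbers are made up, and the `pid`, `state`, and `seconds` keys are simplified stand-ins rather than the actual attribute names. It only shows the shape of the comparison, not whether the equality holds on a real machine.

```python
from collections import defaultdict


def sum_process_cpu_time(samples):
    """Aggregate per-process CPU time samples into per-state totals."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["state"]] += sample["seconds"]
    return dict(totals)


# Hypothetical process.cpu.time samples (made-up values, simplified keys).
process_cpu_time = [
    {"pid": 101, "state": "user", "seconds": 12.0},
    {"pid": 101, "state": "system", "seconds": 3.0},
    {"pid": 202, "state": "user", "seconds": 5.0},
    {"pid": 202, "state": "system", "seconds": 1.5},
]

# Hypothetical system.cpu.time values for the same machine (made-up values).
system_cpu_time = {"user": 17.0, "system": 4.5, "idle": 120.0}

derived = sum_process_cpu_time(process_cpu_time)

# States reported at the system level that no process sample accounts for.
uncovered = sorted(set(system_cpu_time) - set(derived))

print(derived)
print(uncovered)
```

In this made-up data the `user` and `system` totals line up, while `idle` appears only at the system level, which is the kind of case a framework would need to address.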

Jun 17 '24 17:06 lmolkova