Guidance needed: `process` vs `system` vs `container` vs `k8s` vs runtime metrics

Open lmolkova opened this issue 1 year ago • 11 comments

We have multiple layers of metrics:

  • runtime-specific metrics (JVM, Go, etc.) that report CPU, memory, etc. from the runtime's perspective, with attributes specific to the runtime
  • process metrics that report OS-level metrics per process, as observed by the OS itself
  • system metrics that report OS metrics from the OS perspective
  • container metrics that are reported by the container runtime about containers
  • k8s metrics, which are coming: https://github.com/open-telemetry/semantic-conventions/issues/1032

Plus we have attributes in all of these namespaces that have something in common:

  • https://github.com/open-telemetry/semantic-conventions/issues/840
  • https://github.com/open-telemetry/semantic-conventions/issues/129

Problems:

  • When adding new metrics, such as system.linux.memory.available (https://github.com/open-telemetry/semantic-conventions/pull/1078), it's not clear whether we'd expect to have OS-specific metrics in each of the namespaces (container.linux.memory.*, system.linux.*, process.linux.memory.*); see https://github.com/open-telemetry/semantic-conventions/pull/1078#discussion_r1638375208
  • We end up defining similar attributes in each namespace
  • We should come up with a framework to decide how/if to extend system/container metrics:
    • Do we need separate container and k8s metrics? Could the container orchestrator be an attribute?
    • Do we need separate process and system metrics? Isn't system.cpu.time a sum of all process.cpu.time on the same machine?
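
To make the first problem concrete, here is a minimal sketch of how one OS-specific metric would multiply if it were mirrored in every layer. The helper `os_specific_names` and the `LAYERS` list are hypothetical and exist only for illustration; they are not part of any OpenTelemetry library, and the generated names are candidates, not agreed conventions.

```python
# Hypothetical illustration only: not an OpenTelemetry API.
# Layers taken from the namespaces mentioned in the first problem above.
LAYERS = ["process", "system", "container"]


def os_specific_names(os_name, suffix, layers=LAYERS):
    """Return one candidate metric name per layer for an OS-specific metric."""
    return [f"{layer}.{os_name}.{suffix}" for layer in layers]


# One new Linux-only memory metric becomes one candidate name per layer.
names = os_specific_names("linux", "memory.available")
for name in names:
    print(name)
```

Every additional layer (for example k8s) or OS adds another row of near-duplicate names, which is why a framework for deciding where such metrics belong would help.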

lmolkova, Jun 17 '24 17:06