Are processes allowed to report their process.cpu.time?
What are you trying to achieve?
Clarification on whether language-based instrumentation is allowed to report process.cpu.time for its process.
The specification currently says:
OS process metrics are not related to the runtime environment of the program, and should take measurements from the operating system. For runtime environment metrics see semantic conventions for runtime environment metrics.
I think there are roughly three options:
- Allow instrumentations to report process.cpu.time for their own process (resource attributes would differentiate it from process.cpu.time reported by the OpenTelemetry Collector)
- Don't allow it, and make a local OpenTelemetry Collector a requirement for all deployments (that want this key metric of application health)
- Don't allow it, and make a copy of this metric under process.runtime.* or process.runtime.[jvm].*, e.g. process.runtime.cpu.time or process.runtime.[jvm].cpu.time
I don't think this is about "allowing" it; the question is whether this instrumentation is expected. Do we want to maintain it in every language? And so on.
/cc @yurishkuro
I think this may be a language-ecosystem thing.
The JVM has been emitting "process cpu time" for almost 2 decades, so Java users very much expect this to be emitted by any Java metric library.
Allow instrumentations to report process.cpu.time for their own process (resource attributes would differentiate it from process.cpu.time reported by the OpenTelemetry Collector)
Did you also mean process.cpu.utilization? It looks like this issue was opened based on the conversation in #2436.
I'm beginning to question if it is even possible for an instrumentation library to support process.cpu.utilization given its current definition:
Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs available to the process.
In the event of multiple meter providers, their reporting intervals may be different. So, calculating the difference in process.cpu.time since the last measurement requires the instrumentation to maintain some state per meter provider. I don't think the metric API spec offers the support necessary for this.
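To make the state problem concrete, here is a minimal sketch of the calculation in the definition above. `CpuUtilizationTracker` and its fields are hypothetical names for illustration, not part of any OpenTelemetry API. The point is that each reader with its own interval needs its own copy of the previous sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuUtilizationTracker:
    """Hypothetical helper: remembers the previous sample so a delta can be computed."""

    cpu_count: int                           # CPUs available to the process
    last_cpu_time: Optional[float] = None    # cumulative CPU seconds at previous sample
    last_wall_time: Optional[float] = None   # wall-clock seconds at previous sample

    def observe(self, cpu_time: float, wall_time: float) -> Optional[float]:
        """Return the utilization ratio since the previous call, or None if no delta exists yet."""
        previous_cpu, previous_wall = self.last_cpu_time, self.last_wall_time
        self.last_cpu_time, self.last_wall_time = cpu_time, wall_time
        if previous_cpu is None or previous_wall is None:
            return None  # first measurement: nothing to diff against
        elapsed = wall_time - previous_wall
        if elapsed <= 0:
            return None  # clock did not advance; avoid dividing by zero
        return (cpu_time - previous_cpu) / (elapsed * self.cpu_count)


# Two readers with different reporting intervals need independent state.
fast_reader = CpuUtilizationTracker(cpu_count=4)
slow_reader = CpuUtilizationTracker(cpu_count=4)
```

If both readers shared one tracker, whichever collected last would reset the baseline for the other, so the reported delta would cover the wrong interval.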
I saw that the .NET 9.0 SDK added a new member, Environment.CpuUsage, to get the current process's CPU usage via Interop.Sys.GetCpuUtilization. Will this help implement CPU usage telemetry for .NET?
@open-telemetry/semconv-system-approvers PTAL
@svrnm Thanks for adding this to the System Semconv Project board!
I believe this issue can be closed.
This was resolved some time last year. I can't find what issue it was in the semantic-conventions repo, as I don't remember what keywords were in the issue where the discussion happened. The TL;DR is that language runtimes in general should report their own versions of these metrics; what looks at first like a lot of duplicated definitions actually ends up being a heavy simplification for the definitions of the metrics. Different runtimes report their memory/cpu usage metrics in slightly different ways, and making one process.cpu.time metric amenable to all those different possibilities ends up making the definition complicated to understand. While it's true the Entity (formerly just Resource Attributes) would differentiate a process.cpu.time coming from OS instrumentation vs some language runtime instrumentation, the main challenge would be that metrics with the same process namespace would have slightly different semantics when reported under different resources, and it would be confusing at a glance for users to understand how to use the metric.
As a result, it was decided that process.* metrics are very specifically for metrics retrieved by instrumenting the operating system (i.e. reading procfs or using processthreadsapi.h). This ensures there is only one clear definition for the semantics, and the only considerations the metric designs need to make are within OS-specific concerns (which is complicated enough sometimes).
(To clarify, much of this discussion actually arose out of memory metrics rather than CPU metrics; CPU metrics across runtimes do tend to be similar [count of seconds since start time in user, system, or idle states], but memory metrics end up differing drastically because each runtime reports a different collection of states, which made them very challenging to unify.)
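For a sense of what "reading procfs" looks like on Linux, here is a rough sketch that derives cumulative CPU seconds from one line of /proc/[pid]/stat. The function name and the sample line are made up for illustration, and the default of 100 clock ticks per second is an assumption; this is not how any particular OpenTelemetry receiver is implemented.

```python
def cpu_seconds_from_stat(stat_line: str, ticks_per_second: int = 100) -> float:
    """Sum utime and stime from a /proc/[pid]/stat line and convert ticks to seconds.

    ticks_per_second defaults to 100 as an assumption; on a real Unix system
    it can be queried with os.sysconf("SC_CLK_TCK").
    """
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before tokenizing the rest.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 14 (utime) is index 11
    # and field 15 (stime) is index 12.
    utime_ticks = int(after_comm[11])
    stime_ticks = int(after_comm[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


# Made-up sample line: utime=250 ticks, stime=50 ticks.
sample = "1234 (my app) S 1 1234 1234 0 -1 4194304 500 0 0 0 250 50 0 0 20 0 1 0 100"
total_cpu_seconds = cpu_seconds_from_stat(sample)
```

Because the OS defines these fields once for every process, there is a single meaning for the value regardless of which language runtime the process happens to run.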
I am unsure how to answer the recent question about how the .NET SDK can leverage this. I can at least point out that the dotnet.process.cpu.time metric is defined in the CLR metrics in semantic-conventions, so whatever the SDK does can follow the existing conventions.
@open-telemetry/spec-sponsors please take a look and decide whether, based on @braydonk's answer, this can be closed or a follow-up issue is needed.