opentelemetry-go-contrib
Runtime instrumentation: GC "total time spent" metric
I noticed a few things that I'd like to change about the GC timing metrics emitted by instrumentation/runtime.
--
runtime.go.gc.pause_total_ns should not include a "_ns" unit suffix. The same applies to runtime.go.gc.pause_ns.
The main repository does not have a Unit declared for Nanoseconds. It's appropriate to report these in nanoseconds, since they are provided in nanoseconds, but I'd like to set the units being used so that downstream systems can normalize them into seconds.
--
There is a field MemStats.GCCPUFraction that is not being reported. This is a cumulative summary computed over the process lifetime, which the Metrics SIG has come to realize is rarely useful for direct monitoring. The comment reads:
// GCCPUFraction is expressed as a number between 0 and 1,
// where 0 means GC has consumed none of this program's CPU. A
// program's available CPU time is defined as the integral of
// GOMAXPROCS since the program started. That is, if
// GOMAXPROCS is 2 and a program has been running for 10
// seconds, its "available CPU" is 20 seconds. GCCPUFraction
// does not include CPU time used for write barrier activity.
Therefore, the total time spent performing GC can be computed as follows:
uptime := time.Since(processStartTime)
gomaxprocs := float64(runtime.GOMAXPROCS(0))
gcSeconds := memStats.GCCPUFraction * uptime.Seconds() * gomaxprocs
This gcSeconds is a number that we should be monitoring, as this can be compared directly with process.cpu.usage.
This works as long as GOMAXPROCS() does not change, and with more sophisticated logic it could approximately track changes in GOMAXPROCS() as well.
I can take on adding a nanoseconds unit and updating the runtime instruments.
But GOMAXPROCS() indeed can change dynamically at runtime. It wasn't apparent to me how I could approximate a value for gcSeconds if maxprocs were to change after a long period at a different value.
@jmacd is there a pointer to what the Metrics SIG has considered useful for direct monitoring? I am planning to use the runtime plugin in knative/pkg, but it seems to me that some metrics, such as NextGC, are not exposed. I am wondering whether we should have something like a full mode for users who want the raw dump as it comes from runtime.MemStats (beyond just what the specs describe, which seem unfinished at this point).
Is there a way I can help to finalize the go part?
@skonto I want to address your question above, and have filed https://github.com/open-telemetry/opentelemetry-go-contrib/issues/2624
@skonto See also https://github.com/golang/go/issues/47216
From SIG
- Try to add to semconv prior to adding
- Otherwise, just have this exported from a 3rd-party package.
Hello @jmacd Has this issue been solved?