Runtime instrumentation: GC "total time spent" metric

Open jmacd opened this issue 4 years ago • 6 comments

I noticed a few things that I'd like to change about the GC timing metrics emitted by instrumentation/runtime.

--

runtime.go.gc.pause_total_ns should not include a "_ns" unit suffix, and neither should runtime.go.gc.pause_ns.

The main repository does not have a Unit declared for Nanoseconds. It's appropriate to report these in nanoseconds, since they are provided in nanoseconds, but I'd like to set the units being used so that downstream systems can normalize them into seconds.
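To illustrate why a declared unit matters, here is a minimal sketch of how a downstream system might normalize a reported value into seconds once it knows the unit. The unit strings, the `unitsPerSecond` table, and `normalizeToSeconds` are hypothetical names for this example only; they are not part of the OpenTelemetry API.

```go
package main

import "fmt"

// unitsPerSecond maps a unit string to how many of that unit make one second.
// The unit strings here are assumptions chosen for illustration.
var unitsPerSecond = map[string]float64{
	"ns": 1e9,
	"us": 1e6,
	"ms": 1e3,
	"s":  1,
}

// normalizeToSeconds converts value, expressed in unit, into seconds.
// It reports false when the unit is not a known time unit, which is
// what happens when an instrument declares no unit at all.
func normalizeToSeconds(value float64, unit string) (float64, bool) {
	perSecond, ok := unitsPerSecond[unit]
	if !ok {
		return 0, false
	}
	return value / perSecond, true
}

func main() {
	// A hypothetical GC pause total reported in nanoseconds.
	if secs, ok := normalizeToSeconds(1_500_000_000, "ns"); ok {
		fmt.Printf("pause total: %.3f s\n", secs)
	}
	// Without a declared unit the value cannot be normalized.
	if _, ok := normalizeToSeconds(1_500_000_000, ""); !ok {
		fmt.Println("unknown unit: value left as reported")
	}
}
```

With only a "_ns" suffix in the metric name, the downstream system would have to parse names to recover this information.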

--

There is a field MemStats.GCCPUFraction that is not being reported. This is a cumulative summary computed over the process lifetime, which the Metrics SIG has come to realize is rarely useful for direct monitoring. The comment reads:

    // GCCPUFraction is expressed as a number between 0 and 1,
    // where 0 means GC has consumed none of this program's CPU. A
    // program's available CPU time is defined as the integral of
    // GOMAXPROCS since the program started. That is, if
    // GOMAXPROCS is 2 and a program has been running for 10
    // seconds, its "available CPU" is 20 seconds. GCCPUFraction
    // does not include CPU time used for write barrier activity.

Therefore, the total time spent performing GC can be computed as follows:

    uptime := time.Since(processStartTime)
    gomaxprocs := float64(runtime.GOMAXPROCS(0))
    gcSeconds := memStats.GCCPUFraction * uptime.Seconds() * gomaxprocs

This gcSeconds is a number that we should be monitoring, as this can be compared directly with process.cpu.usage.
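For reference, the calculation can be written as a small self-contained program. This is a sketch, not the instrumentation package's code: `gcCPUSeconds` is a hypothetical helper, and `processStartTime` is assumed to be captured at package initialization, which only approximates the true process start.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// processStartTime approximates the process start time.
// Assumption: package initialization happens close enough to process start.
var processStartTime = time.Now()

// gcCPUSeconds converts the cumulative GCCPUFraction into CPU-seconds,
// assuming GOMAXPROCS stayed at procs for the whole uptime.
func gcCPUSeconds(fraction float64, uptime time.Duration, procs int) float64 {
	return fraction * uptime.Seconds() * float64(procs)
}

func main() {
	var memStats runtime.MemStats
	runtime.ReadMemStats(&memStats)

	uptime := time.Since(processStartTime)
	procs := runtime.GOMAXPROCS(0) // an argument of 0 queries without changing the setting

	fmt.Printf("estimated GC CPU time: %.6f s\n",
		gcCPUSeconds(memStats.GCCPUFraction, uptime, procs))
}
```

Keeping the arithmetic in a pure function makes it easy to check against the example in the comment: a fraction of 0.5 over 10 seconds with GOMAXPROCS of 2 yields 10 CPU-seconds.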

This works as long as GOMAXPROCS() does not change, and with more sophisticated logic it could approximately track changes in GOMAXPROCS() as well.

jmacd avatar Aug 29 '20 02:08 jmacd

I can take on adding a nanoseconds unit and updating the runtime instruments.

But GOMAXPROCS() indeed can change dynamically at runtime. It wasn't apparent to me how I could approximate a value for gcSeconds if maxprocs were to change after a long period at a different value.

chrisleavoy avatar Sep 22 '20 20:09 chrisleavoy

@jmacd is there a pointer to what the Metrics SIG considers useful for direct monitoring? I am planning to use the runtime plugin in knative/pkg, but some metrics, such as NextGC, are not exposed. I am wondering whether we should have a full mode for users who want the raw dump as it comes from runtime.MemStats (beyond just what the specs describe, which seem unfinished at this point in time). Is there a way I can help finalize the Go part?

skonto avatar Jan 27 '21 09:01 skonto

@skonto I want to address your question above and have filed https://github.com/open-telemetry/opentelemetry-go-contrib/issues/2624

jmacd avatar Aug 05 '22 19:08 jmacd

@skonto See also https://github.com/golang/go/issues/47216

jmacd avatar Aug 05 '22 19:08 jmacd

From SIG

  • Try to add to semconv prior to adding
  • Otherwise, just have this exported from a 3rd-party package.

MrAlias avatar Jan 18 '24 18:01 MrAlias

Hello @jmacd, has this issue been solved?

AkhigbeEromo avatar Mar 19 '24 16:03 AkhigbeEromo