Measure GC overhead

Open henrikno opened this issue 4 years ago • 2 comments

Is your feature request related to a problem?

Sometimes our processes struggle with GC, but this is hard to spot or alert on when the process is not yet at the point of OOMing: it is just so busy doing GC that it is effectively not getting its work done. We'd like to be aware of instances that are in this state, and possibly alert on it.

Describe the solution you'd like

It'd be nice if the APM agent could record an approximation of how much time the JVM has spent on GC.

Elasticsearch has a similar solution in JvmGcMonitorService (https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/monitor/jvm/JvmGcMonitorService.java), which does it as a scheduled task. I think it's also possible using GarbageCollectionNotificationInfo (https://docs.oracle.com/en/java/javase/11/docs/api/jdk.management/com/sun/management/GarbageCollectionNotificationInfo.html).

What we want is something we can eventually alert on, e.g. if GC overhead is > 50%, your application isn't getting much real work done.
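To illustrate the scheduled-task idea, here is a minimal sketch (not the agent's or Elasticsearch's actual implementation) that polls the standard `GarbageCollectorMXBean` counters and reports GC time as a percentage of wall-clock time since the previous sample. The class name `GcOverheadSampler` and its methods are hypothetical. The value is only an approximation: for collectors with concurrent phases, the reported collection time may not correspond to time the application was actually paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of apm-agent-java.
public class GcOverheadSampler {
    private long lastGcMillis = totalGcMillis();
    private long lastWallNanos = System.nanoTime();

    /** Sums the cumulative collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a percentage of elapsed wall-clock time; 0 if no time has elapsed. */
    static double overheadPercent(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillisDelta / wallMillisDelta;
    }

    /** Returns the overhead since the previous call, then resets the baseline. */
    synchronized double sample() {
        long gcNow = totalGcMillis();
        long wallNow = System.nanoTime();
        double pct = overheadPercent(gcNow - lastGcMillis,
                (wallNow - lastWallNanos) / 1_000_000);
        lastGcMillis = gcNow;
        lastWallNanos = wallNow;
        return pct;
    }
}
```

Calling `sample()` from a `ScheduledExecutorService` at a fixed interval (say, every 30 seconds) would yield one overhead value per interval that could be reported as a metric.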

Describe alternatives you've considered

It's possible to collect it via custom metrics, e.g. with Micrometer, but we have some services where we don't want to add custom code.

Additional context

henrikno avatar Apr 27 '21 17:04 henrikno

The agent already exports the number of GCs and their (cumulative) duration via the jvm.gc.count and jvm.gc.time metrics (see https://www.elastic.co/guide/en/apm/agent/java/current/metrics.html#metrics-jvm). Shouldn't it be possible to use these for your alerting?
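As a rough sketch of how an overhead figure could be derived from those existing metrics: because jvm.gc.time is cumulative, the overhead for an interval is the difference between two consecutive samples divided by the time between them. The code below assumes both the sample timestamps and the jvm.gc.time values are in milliseconds, and that values for multiple collectors have already been summed. The class and method names are hypothetical.

```java
// Hypothetical consumer-side calculation, not part of apm-agent-java.
public class GcTimeAlert {

    /**
     * Percentage of the interval spent in GC, from two samples of a cumulative
     * GC-time counter. Returns NaN if the interval is empty or the counter went
     * backwards (for example after a JVM restart).
     */
    static double overheadPercent(long prevTimestampMs, long prevGcTimeMs,
                                  long currTimestampMs, long currGcTimeMs) {
        long wallDelta = currTimestampMs - prevTimestampMs;
        long gcDelta = currGcTimeMs - prevGcTimeMs;
        if (wallDelta <= 0 || gcDelta < 0) {
            return Double.NaN;
        }
        return 100.0 * gcDelta / wallDelta;
    }

    /** NaN never compares greater than the threshold, so unusable samples do not alert. */
    static boolean shouldAlert(double overheadPercent, double thresholdPercent) {
        return overheadPercent > thresholdPercent;
    }
}
```

For example, two samples taken 30 seconds apart whose cumulative GC time grew by 18 seconds give an overhead of 60%, which would trip a 50% threshold.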

tobiasstadler avatar May 10 '21 12:05 tobiasstadler

I agree you can do this from existing metrics, but I also agree this would be a nice separate metric to have. I think it's something that could be added as a derived metric in the dashboard and presented as standard.

jackshirazi avatar Apr 21 '22 10:04 jackshirazi