CGroup memory utilization metric in stack monitoring for integration server is not available on ESS

Open lahsivjar opened this issue 3 years ago • 1 comments

APM Server version (apm-server version): 8.3.*

Description of the problem including expected versus actual behavior: Stack monitoring should show memory utilization for the Integrations Server, but on ESS the metric is not available.

Steps to reproduce:

  1. On ESS, open stack monitoring
  2. Open Integrations server overview
  3. Observe memory panel in Integrations Server - Resource Usage

Other details

The panel seems to plot beats_stats.metrics.beat.cgroup.memory.mem.usage.bytes, but according to the Metricbeat documentation the correct field should be either beats_stats.metrics.beat.cgroup.mem.usage.bytes or beat.stats.cgroup.memory.mem.usage.bytes.
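To check which of these fields a given monitoring document actually carries, a lookup along the following lines could help. This is only a sketch: the helper names `get_path` and `first_present` are hypothetical, the document is assumed to be a nested dict as returned from Elasticsearch `_source`, and the candidate list simply repeats the three field paths mentioned above.

```python
from typing import Any, Optional, Sequence, Tuple

# Field paths taken from the issue text; which one is populated depends on
# how the monitoring data was collected.
CANDIDATE_FIELDS = [
    "beats_stats.metrics.beat.cgroup.memory.mem.usage.bytes",
    "beats_stats.metrics.beat.cgroup.mem.usage.bytes",
    "beat.stats.cgroup.memory.mem.usage.bytes",
]


def get_path(doc: Any, dotted: str) -> Optional[Any]:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def first_present(
    doc: dict, candidates: Sequence[str] = CANDIDATE_FIELDS
) -> Optional[Tuple[str, Any]]:
    """Return (path, value) for the first candidate field found in doc, else None."""
    for path in candidates:
        value = get_path(doc, path)
        if value is not None:
            return path, value
    return None


# Hypothetical sample document, for illustration only.
sample = {"beat": {"stats": {"cgroup": {"memory": {"mem": {"usage": {"bytes": 1024}}}}}}}
result = first_present(sample)
```

Running this against a few real documents from the monitoring indices would show whether the plotted field is present at all or whether only one of the alternatives is populated.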

lahsivjar avatar Jul 13 '22 04:07 lahsivjar

This will most certainly require a fix in the Kibana code, where the stack monitoring part lives.

simitt avatar Sep 06 '22 14:09 simitt

I was looking into this, and it seems the memory limit metric has the same issue. The panel seems to plot beats_stats.metrics.beat.cgroup.memory.mem.limit.bytes, but according to the Metricbeat documentation the correct field should be either beats_stats.metrics.beat.cgroup.mem.limit.bytes or beat.stats.cgroup.memory.mem.limit.bytes.

I've opened a PR to address both.

kruskall avatar Oct 24 '22 11:10 kruskall

With the help of @miltonhultgren and @fearful-symmetry, the root cause was identified: cgroups V2 metric limits are currently not being reported for the stats HTTP endpoint. See https://github.com/elastic/elastic-agent-system-metrics/issues/64
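To illustrate why a missing limit matters, here is a minimal sketch of a utilization calculation. It assumes the memory panel relates usage bytes to limit bytes, which this thread does not confirm, and the function name `memory_utilization` is hypothetical.

```python
from typing import Optional


def memory_utilization(
    usage_bytes: Optional[int], limit_bytes: Optional[int]
) -> Optional[float]:
    """Return usage / limit as a fraction, or None when it cannot be computed.

    A missing or zero limit (for example, when the limit is not reported)
    yields None instead of a misleading value or a division error.
    """
    if usage_bytes is None or not limit_bytes:
        return None
    return usage_bytes / limit_bytes


with_limit = memory_utilization(512, 1024)
without_limit = memory_utilization(512, None)
```

Under that assumption, a document that carries usage but no limit produces no utilization value at all, so fixing the field names alone would not be enough to fill the panel.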

simitt avatar Nov 17 '22 09:11 simitt

It could be that Kibana also doesn't handle this correctly. I took a brief look at @kruskall's PR, and it shows some places where we don't read from the new Metricbeat format, but I wanted the data fixed first so I could verify that!

miltonhultgren avatar Nov 17 '22 09:11 miltonhultgren

Moving this to the backlog until the underlying issues have been resolved.

simitt avatar Nov 22 '22 21:11 simitt