CGroup memory utilization metric in stack monitoring for integration server is not available on ESS
APM Server version (apm-server version): 8.3.*
Description of the problem including expected versus actual behavior: Stack monitoring should show memory utilization for integration server
Steps to reproduce:
- On ESS, open stack monitoring
- Open Integrations server overview
- Observe memory panel in Integrations Server - Resource Usage
Other details
The metric seems to plot beats_stats.metrics.beat.cgroup.memory.mem.usage.bytes, but according to the Metricbeat documentation the correct field should be either beats_stats.metrics.beat.cgroup.mem.usage.bytes or beat.stats.cgroup.memory.mem.usage.bytes.
This will almost certainly require a fix in the Kibana code, where the stack monitoring part lives.
I was looking into this, and it seems the memory limit metric has the same issue:
The metric seems to plot beats_stats.metrics.beat.cgroup.memory.mem.limit.bytes, but according to the Metricbeat documentation the correct field should be either beats_stats.metrics.beat.cgroup.mem.limit.bytes or beat.stats.cgroup.memory.mem.limit.bytes.
I've opened a PR to address both.
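For anyone checking their own monitoring documents, here is a minimal sketch of a fallback lookup over the field paths quoted above. It is not the code from the PR or from Kibana. The helper names `get_path` and `first_present` are hypothetical, and the candidate paths are copied from this issue, not independently verified.

```python
from typing import Any, Iterable, Optional

# Candidate paths as quoted in this issue (assumption: not verified here
# against the Metricbeat documentation).
USAGE_PATHS = [
    "beats_stats.metrics.beat.cgroup.mem.usage.bytes",
    "beat.stats.cgroup.memory.mem.usage.bytes",
]
LIMIT_PATHS = [
    "beats_stats.metrics.beat.cgroup.mem.limit.bytes",
    "beat.stats.cgroup.memory.mem.limit.bytes",
]


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def first_present(doc: dict, paths: Iterable[str]) -> Optional[Any]:
    """Return the value at the first candidate path that exists in the document."""
    for path in paths:
        value = get_path(doc, path)
        if value is not None:
            return value
    return None
```

Calling `first_present(doc, USAGE_PATHS)` on a document's `_source` returns the usage value from whichever layout is present, or `None` when neither field is reported, which is the case the empty panel corresponds to.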
With the help of @miltonhultgren and @fearful-symmetry, the root cause was identified: cgroups V2 metric limits are currently not reported for the stats HTTP endpoint. See https://github.com/elastic/elastic-agent-system-metrics/issues/64
It could be that Kibana also doesn't handle this correctly. I took a brief look at @kruskall's PR, and it shows some places where we don't read from the new Metricbeat format, but I wanted the data fixed first so I could verify that!
Moving this to the backlog until the underlying issues have been resolved.