beats Add metric for queue utilization percentage

Open cmacknz opened this issue 10 months ago • 3 comments

Add a metric for the queue utilization (or fill) percentage. The metric can be calculated as the current size of the queue divided by the maximum size of the queue.
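As a minimal sketch of the calculation described above (not the Beats implementation), the ratio can be expressed as a small Go helper. The function name `queueUtilizationPct` and the choice to report 0 when the maximum size is unknown or zero are assumptions made for illustration.

```go
package main

// queueUtilizationPct returns the queue fill level as a percentage:
// the current size of the queue divided by its maximum size, times 100.
// Hypothetical helper: the name and the zero-maximum behavior are
// assumptions, not part of libbeat.
func queueUtilizationPct(currentSize, maxSize uint64) float64 {
	if maxSize == 0 {
		// No configured maximum (or not yet known): avoid dividing by
		// zero and report an empty queue.
		return 0
	}
	return float64(currentSize) / float64(maxSize) * 100
}
```

For example, a queue holding 1600 events out of a maximum of 3200 would be reported as 50 percent full.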

This metric can then be reported to Fleet when running under the Elastic Agent and used to easily detect and flag agents that are experiencing output backpressure.

Acceptance Criteria:

- The metric exists when polling /stats and in the 30s metrics in the Beat logs.
- The metric is queryable from the applicable metrics-elastic_agent.* data streams in the agent package: https://github.com/elastic/integrations/tree/main/packages/elastic_agent
- The metric is available in the agent metrics dashboard in Fleet. The queue size is already available; the utilization could replace it, as it is much easier to interpret.

Apr 02 '24 23:04 cmacknz

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Apr 02 '24 23:04 elasticmachine

@cmacknz when you say queue do you mean the libbeat publisher queue, or some ES output queue? Both?

@faec for a publisher queue, looking at the code in libbeat/publisher/pipeline/monitoring.go, could we just get a queue percentage using events.active over queue.max_events?
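A rough sketch of that idea in Go, computing the percentage from a JSON stats snapshot. The nesting `libbeat.pipeline.events.active` and `libbeat.pipeline.queue.max_events` is an assumption about how the two counters appear in the output, and `pipelineStats` and `utilizationFromStats` are hypothetical names, so the field paths should be checked against a real /stats response before relying on this.

```go
package main

import "encoding/json"

// pipelineStats mirrors only the two counters mentioned above.
// The JSON nesting is an assumption, not a confirmed schema.
type pipelineStats struct {
	Libbeat struct {
		Pipeline struct {
			Events struct {
				Active uint64 `json:"active"`
			} `json:"events"`
			Queue struct {
				MaxEvents uint64 `json:"max_events"`
			} `json:"queue"`
		} `json:"pipeline"`
	} `json:"libbeat"`
}

// utilizationFromStats parses a stats snapshot and returns
// events.active / queue.max_events as a percentage.
// Hypothetical helper for illustration only.
func utilizationFromStats(raw []byte) (float64, error) {
	var s pipelineStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return 0, err
	}
	maxEvents := s.Libbeat.Pipeline.Queue.MaxEvents
	if maxEvents == 0 {
		// Maximum missing or zero: report 0 rather than divide by zero.
		return 0, nil
	}
	active := s.Libbeat.Pipeline.Events.Active
	return float64(active) / float64(maxEvents) * 100, nil
}
```

Under these assumptions, a snapshot with 800 active events and a max_events of 3200 yields 25 percent.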

Apr 24 '24 21:04 fearful-symmetry

> @cmacknz when you say queue do you mean the libbeat publisher queue, or some ES output queue? Both?

The publisher queue, as in the queue of events waiting to be sent by the output, which can be either in memory or on disk.

Apr 25 '24 15:04 cmacknz
