beats
Add metric for queue utilization percentage
Add a metric for the queue utilization (or fill) percentage. The metric can be calculated as the current size of the queue divided by the maximum size of the queue.
This metric can then be reported to Fleet when running under the Elastic Agent and used to easily detect and flag agents that are experiencing output backpressure.
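A minimal sketch of the calculation described above, assuming the current and maximum queue sizes are available as plain integers in the same unit. The function name `queueUtilization` and its parameters are hypothetical and are not taken from libbeat. Treating a non-positive maximum as 0% is also an assumption made here to avoid dividing by zero.

```go
package main

import "fmt"

// queueUtilization returns the queue fill level as a percentage in [0, 100].
// Hypothetical helper: currentSize and maxSize are assumed to use the same
// unit (for example, events for both, or bytes for both).
func queueUtilization(currentSize, maxSize int) float64 {
	// Assumption: a non-positive maximum means no meaningful limit is
	// known, so report 0 rather than dividing by zero.
	if maxSize <= 0 {
		return 0
	}
	pct := float64(currentSize) / float64(maxSize) * 100
	// Clamp in case the reported size is briefly outside the expected range.
	if pct < 0 {
		return 0
	}
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	// 1024 of 4096 slots in use.
	fmt.Printf("%.1f%%\n", queueUtilization(1024, 4096)) // prints "25.0%"
}
```

A single percentage like this is easier to alert on than a raw size, since the threshold does not depend on how each agent's queue is configured.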
Acceptance Criteria:
- The metric exists when polling /stats and in the 30s metrics in the beat logs
- The metric is queryable from the applicable metrics-elastic_agent.* data streams in the agent package https://github.com/elastic/integrations/tree/main/packages/elastic_agent
- The metric is available in the agent metrics dashboard in Fleet. The queue size is already available there; the utilization could replace it, as it is much easier to interpret.
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
@cmacknz when you say queue do you mean the libbeat publisher queue, or some ES output queue? Both?
@faec for a publisher queue, looking at the code in libbeat/publisher/pipeline/monitoring.go, could we just get a queue percentage using events.active over queue.max_events?
> @cmacknz when you say queue do you mean the libbeat publisher queue, or some ES output queue? Both?
The publisher queue, as in the queue of events waiting to be sent by the output, which can be either in memory or on disk.