check_logstash

Bug: Calculation of inflight events does not consider dropped documents

Open saiiman opened this issue 1 year ago • 3 comments

The calculation of inflight events is incorrect if documents are intentionally discarded in the Logstash pipelines via the "drop" processor.

The method "calculateInflightEvents" (https://github.com/NETWAYS/check_logstash/blob/4bb9291fee062dee17440f615beed9c1c948c2b1/cmd/pipeline.go#L122) determines the current inflight events based on the global IN / OUT values. If documents are dropped in the pipeline, the values gradually diverge.
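To illustrate the divergence, here is a minimal Go sketch. It assumes the estimate is simply the global IN counter minus the global OUT counter; `inflightEstimate` and `simulate` are hypothetical names for this example and are not taken from the plugin's code.

```go
package main

import "fmt"

// inflightEstimate is a hypothetical stand-in for the check's calculation.
// Assumption: the estimate is the global IN counter minus the global OUT counter.
func inflightEstimate(in, out int64) int64 {
	return in - out
}

// simulate models a pipeline that has fully drained: every event was either
// emitted or dropped, so nothing is really in flight afterwards.
// It returns the cumulative IN and OUT counters after the given number of batches.
func simulate(batches, batchSize, droppedPerBatch int64) (in, out int64) {
	for i := int64(0); i < batches; i++ {
		in += batchSize
		out += batchSize - droppedPerBatch // dropped events never reach OUT
	}
	return in, out
}

func main() {
	for _, n := range []int64{1, 10, 100} {
		in, out := simulate(n, 125, 5)
		fmt.Printf("batches=%d in=%d out=%d estimate=%d\n",
			n, in, out, inflightEstimate(in, out))
	}
}
```

With 5 of every 125 events dropped, the estimate equals the cumulative number of dropped events and keeps growing, even though the simulated pipeline is idle.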

Steps to reproduce

  1. Create a pipeline with a “drop” processor.
  2. Analyze the pipeline statistics via the “localhost:9600/_node/stats/pipeline” Logstash API.
  3. Compare the values with the output of the Icinga check.
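For step 2, a rough Go sketch of reading the counters could look like the following. It assumes a response shaped like the one newer Logstash versions return, with a top-level `pipelines` object keyed by pipeline name; the payload in `sampleResponse` is a trimmed, made-up example, and `inOutGap` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventCounters mirrors the "in" and "out" fields of a pipeline's "events" object.
type eventCounters struct {
	In  int64 `json:"in"`
	Out int64 `json:"out"`
}

// nodeStats models only the part of the response needed here.
// Assumption: pipelines are listed under a top-level "pipelines" key.
type nodeStats struct {
	Pipelines map[string]struct {
		Events eventCounters `json:"events"`
	} `json:"pipelines"`
}

// sampleResponse is an illustrative payload with invented values,
// not captured from a real Logstash instance.
const sampleResponse = `{"pipelines":{"main":{"events":{"in":1000,"out":900}}}}`

// inOutGap (hypothetical helper) returns IN minus OUT for every pipeline.
func inOutGap(raw []byte) (map[string]int64, error) {
	var stats nodeStats
	if err := json.Unmarshal(raw, &stats); err != nil {
		return nil, err
	}
	gaps := make(map[string]int64, len(stats.Pipelines))
	for name, p := range stats.Pipelines {
		gaps[name] = p.Events.In - p.Events.Out
	}
	return gaps, nil
}

func main() {
	gaps, err := inOutGap([]byte(sampleResponse))
	if err != nil {
		panic(err)
	}
	for name, gap := range gaps {
		fmt.Printf("pipeline %q: in - out = %d\n", name, gap)
	}
}
```

If the pipeline is idle and the gap is still non-zero, the difference is more likely to reflect dropped events than events in flight.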

saiiman, Nov 08 '24 11:11

Hi, from what I can tell, the "inflight events" were a custom metric that the original check plugin used, based on this calculation:

https://github.com/NETWAYS/check_logstash/blob/4bb9291fee062dee17440f615beed9c1c948c2b1/legacy/check_logstash#L363

When writing the Golang code, I decided to reuse this logic to stay compatible with the old code. Generally, I would say it was always meant to be an approximation, though I'm not sure (I didn't write the original plugin); @widhalmt might have more information.

Where possible, I'd suggest using the Pipeline Flow Metrics on newer Logstash versions.

martialblog, Nov 27 '24 09:11

@martialblog you're right: It always was an approximation. Full disclosure: I just didn't think of the possibility of dropping events when writing the initial version.

I'd really like to see an improvement, such as a more accurate representation. I'm just not able to write Golang code yet. And, honestly, I don't think I'm the right person to implement this.

While I can see a potential benefit of seeing the difference of incoming and outgoing events, it's definitely wrong to keep them as inflight_events.

widhalmt, Nov 27 '24 09:11

We currently use the Node Stats API to get the in/out values: https://www.elastic.co/guide/en/logstash/current/node-stats-api.html#pipeline-stats

I'm not sure we can get the information we require from this API. There was already a discussion about this: https://github.com/NETWAYS/check_logstash/issues/44

If someone has an idea where to get the required info from the API, it should be possible to update the calculation.

martialblog, Nov 27 '24 10:11