Bug: Calculation of inflight events does not consider dropped documents
The calculation of inflight events is incorrect if documents are intentionally discarded in the Logstash pipelines via the "drop" processor.
The method "calculateInflightEvents" (https://github.com/NETWAYS/check_logstash/blob/4bb9291fee062dee17440f615beed9c1c948c2b1/cmd/pipeline.go#L122) determines the current inflight events based on the global IN / OUT values. If documents are dropped in the pipeline, the values gradually diverge.
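To illustrate the drift, here is a minimal sketch. It assumes the inflight value is simply the cumulative IN counter minus the cumulative OUT counter, as described above. The helper name `inflightApprox` and the numbers are made up for illustration and are not taken from the plugin.

```go
package main

import "fmt"

// inflightApprox is a hypothetical stand-in for the approximation
// discussed in this issue: cumulative "in" minus cumulative "out".
func inflightApprox(in, out int64) int64 {
	return in - out
}

func main() {
	// Illustrative numbers only: 1000 events were received, 100 of them
	// were dropped inside the pipeline, the remaining 900 were emitted.
	// The pipeline is idle, so nothing is actually in flight.
	in := int64(1000)
	dropped := int64(100)
	out := in - dropped

	// The approximation still reports the dropped events as inflight,
	// and this offset grows with every further dropped event.
	fmt.Println(inflightApprox(in, out))
}
```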
Steps to reproduce
- create a pipeline with a “drop” processor
- analyze the pipeline statistics via the “localhost:9600/_node/stats/pipeline” Logstash API
- compare the values with the output of the Icinga check
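The comparison in the last two steps could be sketched roughly like this. The sketch assumes the API response nests the counters under `pipelines.<name>.events.in` and `pipelines.<name>.events.out`. The sample JSON is invented, and `inOutDiff` is a hypothetical helper, not part of check_logstash.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeStats models only the fields this sketch needs; the layout is an
// assumption about the Node Stats API response.
type nodeStats struct {
	Pipelines map[string]struct {
		Events struct {
			In  int64 `json:"in"`
			Out int64 `json:"out"`
		} `json:"events"`
	} `json:"pipelines"`
}

// inOutDiff (hypothetical helper) returns in - out for every pipeline
// found in the given response body.
func inOutDiff(body []byte) (map[string]int64, error) {
	var s nodeStats
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	diff := make(map[string]int64, len(s.Pipelines))
	for name, p := range s.Pipelines {
		diff[name] = p.Events.In - p.Events.Out
	}
	return diff, nil
}

func main() {
	// Invented sample: an idle pipeline that dropped 100 of 1000 events.
	sample := []byte(`{"pipelines":{"main":{"events":{"in":1000,"out":900}}}}`)
	diff, err := inOutDiff(sample)
	if err != nil {
		panic(err)
	}
	// Compare this number with the inflight value printed by the check.
	fmt.Println(diff["main"])
}
```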
Hi, from what I can tell, the "inflight events" were a custom metric that the original check plugin used, based on this calculation:
https://github.com/NETWAYS/check_logstash/blob/4bb9291fee062dee17440f615beed9c1c948c2b1/legacy/check_logstash#L363
When writing the Golang code, I decided to reuse this logic to stay compatible with the old code. As far as I know, it was always meant to be an approximation, but I'm not sure (I didn't write the original plugin). @widhalmt might have more information.
Where possible, I'd recommend using the Pipeline Flow Metrics on newer Logstash versions.
@martialblog you're right: It always was an approximation. Full disclosure: I just didn't think of the possibility of dropping events when writing the initial version.
I'd really like to see an improvement, such as a more accurate representation. I'm just not able to write Golang code yet. And, honestly, I think I'm not the right person to implement this.
While I can see a potential benefit in seeing the difference between incoming and outgoing events, it's definitely wrong to keep reporting it as inflight_events.
We currently use the Node Stats API to get the in/out values: https://www.elastic.co/guide/en/logstash/current/node-stats-api.html#pipeline-stats
Not sure if we can get the information we require from this API. There was already a discussion about this: https://github.com/NETWAYS/check_logstash/issues/44
If someone has an idea where to get the required info from the API, it should be possible to update the calculation.