logstash icon indicating copy to clipboard operation
logstash copied to clipboard

POC: "Recent Memory" ring-buffer queue type

Open Jarpy opened this issue 3 years ago • 8 comments

What's This?

This proof-of-concept demonstrates a new queue type that is implemented as a ring-buffer that does not block when full.

Use Case

An operator uses Logstash as an event router to ship events to both a long-term storage engine (eg. Elasticsearch), and to a real-time, stream-based alerting system. The naïve implementation uses Logstash, with a single pipeline, to send copies of all events to both the storage engine and the alerting engine.

image

This design is not very resilient if one of the event destinations becomes unavailable. For example, if the storage engine is unable to accept events, the in-memory pipeline queue rapidly fills to capacity and stalls. This stops the event stream to the alerting engine, even though it's quite prepared to accept events itself. Realising this, the operator implements the Output Isolator Pattern.

image

This places large, independent cues in front of each output so that one can stall without affecting the other. By necessity, these queues are defined as persisted cues, because memory queues can't provide the needed capacity. Unfortunately, this has a significant performance cost. Each event now incurs the cost of being processed by a persisted queue twice. The cost is acceptable for events that are destined for the storage engine, because the operator never wants to lose an event in that context. For the real-time alerting engine, the cost is much harder to justify. Buffering events for long periods of time has little value in this context. Data destined for the alerting engine have rapidly diminishing value over time. Here, the operator has been forced to make an unpleasant decision. They don't actually want a persisted queue, they just need a queue that won't block, and thus also block the upstream input queue.

Using the "recent_memory" queue, they achieive the goals of the Output Isolator Pattern, without the cost of unneeded PQs.

image

The recent_memory queue will never block, always accepting new events (at the expense of old ones).

Expressed in terms of the CAP theorem, recent_memory provides A,P. The standard memory queue provides C,P, and the persisted queue provides C,A. So, the addition of this new queue type allows the operator to choose any of the three vertices on the "CAP triangle", depending on their design goals.

Jarpy avatar Dec 17 '21 05:12 Jarpy

cc @Crazybus, @drewr

Jarpy avatar Dec 17 '21 05:12 Jarpy

Sorry about the close/reopen. Mistakenly clicked the wrong button!

roaksoax avatar Jan 10 '22 18:01 roaksoax

@jarpy conceptually, this is AWESOME. Our team talked about the work briefly in a sync today and there is some serious enthusiasm for adding this vertex of the CAP triangle, and addressing a very real use-case.

We'll need to add docs and tie in metrics so that users can reason about the expected data loss under back-pressure, and will be working to get this remaining effort into the roadmap.

yaauie avatar Jan 10 '22 19:01 yaauie

That is amazing news. Thanks so much for considering this and for championing the effort.

Shall we close this for real, then? The implementation presented here is obviously rudimentary, I just wanted to see if it could be done.

Jarpy avatar Jan 10 '22 22:01 Jarpy

Let's keep this one open, and close it only when we have the work fully specified and on the roadmap.

yaauie avatar Jan 11 '22 00:01 yaauie

This PR would implement the enhancement request from https://github.com/elastic/logstash/issues/11601

jsvd avatar Mar 17 '22 15:03 jsvd

I am currently facing a similar issue as described above, having one stable "persistent" backend and one unreliable "streaming" backend and that ring-buffer mechanism would solve all my worries! Thanks a lot for starting this effort, i would really love to see this implementation available upstream :+1:

makefu avatar Mar 21 '22 23:03 makefu

Hey, is there any chance to get this PR (or at least the functionalty) to get upstream? Right now i am thinking about working around this with an rsyslog receiving all logs from logstash and in turn forwarding these logs to a queued remote output. Cheers!

makefu avatar May 31 '22 11:05 makefu

What is missing to complete this and merge?

sambercovici avatar Aug 14 '23 10:08 sambercovici