
Add in-memory `Original Source` to Message class

Open ryan-carroll-graylog opened this issue 3 months ago • 0 comments

What?

We add `#setOriginalSource()` and `#getOriginalSource()` methods to the Message class that set and return the original message as bytes or a string. In addition, we add `has_original_source` and `get_original_source` (with an optional default return value) pipeline functions to check for and retrieve the original source value.

In input codecs, we can attach the original source to each message object. It would not go into the message fields map, but into either a separate field or the existing metadata map. That way the original source is not indexed into OpenSearch.
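A minimal sketch of the idea, not the actual Graylog implementation: `SketchMessage` and `OriginalSourceFunctions` are hypothetical stand-ins for the real Message class and pipeline functions, the "separate field" variant is chosen over the metadata map, and UTF-8 decoding is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Small demo entry point (hypothetical); declared first so the file can be
// launched directly with the single-file source launcher.
final class OriginalSourceDemo {
    public static void main(String[] args) {
        SketchMessage message = new SketchMessage();
        message.addField("message", "user login failed");
        message.setOriginalSource(
                "<34>Oct 11 22:14:15 host app: user login failed".getBytes(StandardCharsets.UTF_8));

        System.out.println(OriginalSourceFunctions.hasOriginalSource(message));
        System.out.println(OriginalSourceFunctions.getOriginalSource(message, "n/a"));
        // The raw payload never shows up among the fields that would be indexed.
        System.out.println(message.getFields().keySet());
    }
}

// Hypothetical stand-in for the Message class; only the parts relevant here.
final class SketchMessage {
    // Fields that would be indexed into OpenSearch.
    private final Map<String, Object> fields = new HashMap<>();
    // Raw payload, held in memory only and deliberately kept out of `fields`.
    private byte[] originalSource;

    void addField(String key, Object value) {
        fields.put(key, value);
    }

    Map<String, Object> getFields() {
        return Collections.unmodifiableMap(fields);
    }

    void setOriginalSource(byte[] source) {
        // Defensive copy so later changes to the caller's buffer do not leak in.
        this.originalSource = (source == null) ? null : source.clone();
    }

    Optional<byte[]> getOriginalSource() {
        return Optional.ofNullable(originalSource).map(bytes -> bytes.clone());
    }
}

// Hypothetical Java equivalents of the proposed pipeline functions.
final class OriginalSourceFunctions {
    static boolean hasOriginalSource(SketchMessage message) {
        return message.getOriginalSource().isPresent();
    }

    // Returns the original source as a string, or `defaultValue` if none was stored.
    static String getOriginalSource(SketchMessage message, String defaultValue) {
        return message.getOriginalSource()
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElse(defaultValue);
    }
}
```

Keeping the payload in its own member rather than in the fields map means any code that serializes `getFields()` for indexing needs no change to exclude it.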

The main drawback is that the memory consumption of the in-flight Message object increases. We can benchmark that to measure the impact. We can also add an option to inputs to disable storing the original source data in the in-memory message object (storage would be enabled by default).
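The per-input switch could look roughly like the sketch below. All names (`SketchCodec`, `InFlightMessage`, `storeOriginalSource`, `retainedSourceBytes`) are hypothetical and do not correspond to existing Graylog classes or configuration keys; the point is only that a disabled input never retains the raw payload.

```java
import java.nio.charset.StandardCharsets;

// Small demo entry point (hypothetical); declared first so the file can be
// launched directly with the single-file source launcher.
final class CodecOptionDemo {
    public static void main(String[] args) {
        byte[] payload = "  raw log line from the wire  ".getBytes(StandardCharsets.UTF_8);

        InFlightMessage kept = new SketchCodec().decode(payload);
        InFlightMessage dropped = new SketchCodec(false).decode(payload);

        // Extra bytes retained per in-flight message, with and without the option.
        System.out.println(kept.retainedSourceBytes());
        System.out.println(dropped.retainedSourceBytes());
    }
}

// Hypothetical in-flight message: parsed text plus the optional raw payload.
final class InFlightMessage {
    private final String parsedText;
    private byte[] originalSource; // stays null when storage is disabled

    InFlightMessage(String parsedText) {
        this.parsedText = parsedText;
    }

    String getParsedText() {
        return parsedText;
    }

    void setOriginalSource(byte[] source) {
        this.originalSource = (source == null) ? null : source.clone();
    }

    // Number of additional payload bytes this message keeps in memory.
    int retainedSourceBytes() {
        return originalSource == null ? 0 : originalSource.length;
    }
}

// Hypothetical codec with an input-level flag for retaining the raw payload.
final class SketchCodec {
    private final boolean storeOriginalSource;

    // Storage is enabled by default, as suggested in the proposal.
    SketchCodec() {
        this(true);
    }

    SketchCodec(boolean storeOriginalSource) {
        this.storeOriginalSource = storeOriginalSource;
    }

    InFlightMessage decode(byte[] payload) {
        // Stand-in for real parsing: decode as UTF-8 and trim whitespace.
        InFlightMessage message =
                new InFlightMessage(new String(payload, StandardCharsets.UTF_8).trim());
        if (storeOriginalSource) {
            message.setOriginalSource(payload);
        }
        return message;
    }
}
```

A benchmark could compare heap usage of the two configurations under the same message load to quantify the overhead.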

Why?

This is proposed as a potential solution to https://github.com/Graylog2/graylog2-server/issues/18416, but it would also be useful in the many cases where Illuminate processing would benefit from access to the original message source without storing all of that data on the indexed message.

ryan-carroll-graylog, May 01 '24 16:05