otel-arrow

[otap-dataflow] syslog-cef receiver should expect syslog headers with CEF

Open • clhain opened this issue 8 months ago • 8 comments

The syslog-CEF receiver currently doesn't seem to parse syslog messages with CEF message bodies. Sending an encoded CEF format message directly without the syslog header fields does parse correctly, but as I read it, this seems like incorrect behavior. Specifically:

(from the most official-looking spec doc I could find)

> Header Information
>
> CEF uses Syslog as a transport mechanism. It uses the following format that contains a Syslog prefix, a header, and an extension:
>
> `Jan 18 11:07:53 host CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity|[Extension]`
>
> Using CEF Without Syslog
>
> Syslog applies a syslog prefix to each message, no matter which device it arrives from, that contains the date and hostname in the following example:
>
> `Jan 18 11:07:53 host CEF:Version|…`
>
> Even if an event producer is unable to write Syslog messages, it is possible to write the events to a file by performing the following steps:
>
> 1. Discard the syslog prefix (`Jan 18 11:07:53 host`).
> 2. Begin the message with the following format:
>
>    `CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity|[Extension]`
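To make the two layouts in the spec excerpt concrete, here is a minimal Python sketch that accepts a CEF line with or without the `Jan 18 11:07:53 host` prefix and separates the seven pipe-delimited header fields from the extension. The function and field names (`parse_cef_line`, `device_vendor`, and so on) are hypothetical, the prefix regex only covers the BSD-style timestamp-plus-hostname form quoted above (not RFC 5424), and treating `\|` and `\\` as the only header escapes is an assumption based on my reading of the CEF spec.

```python
import re

# Hypothetical: BSD-style prefix from the spec excerpt, "Mmm dd hh:mm:ss host ",
# accepted only when it sits directly in front of "CEF:".
_SYSLOG_PREFIX = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?=CEF:)"
)

# Illustrative names for the seven pipe-terminated CEF header fields.
CEF_HEADER_FIELDS = (
    "version", "device_vendor", "device_product", "device_version",
    "device_event_class_id", "name", "severity",
)


def parse_cef_line(line: str) -> dict:
    """Parse a CEF line, with or without the 'Jan 18 11:07:53 host' prefix."""
    record = {}
    prefix = _SYSLOG_PREFIX.match(line)
    if prefix:
        record["syslog_timestamp"] = prefix.group("timestamp")
        record["syslog_host"] = prefix.group("host")
        line = line[prefix.end():]
    if not line.startswith("CEF:"):
        raise ValueError("message does not contain a CEF header")

    body = line[len("CEF:"):]
    fields, current, i = [], [], 0
    # Collect header fields up to the 7th unescaped '|', unescaping '\|' and '\\'.
    while i < len(body) and len(fields) < len(CEF_HEADER_FIELDS):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "|\\":
            current.append(body[i + 1])
            i += 2
            continue
        if ch == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if len(fields) < len(CEF_HEADER_FIELDS):
        raise ValueError("CEF header needs 7 pipe-terminated fields")

    record.update(zip(CEF_HEADER_FIELDS, fields))
    record["extension"] = body[i:]  # everything after the 7th '|', left unparsed
    return record
```

With this sketch, the prefixed and bare forms of the same event yield identical header fields; only the prefixed form also carries `syslog_timestamp` and `syslog_host`.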

I'm assuming the long-term vision for this is something closer to what the Go collector does, where we can specify the expected syslog format and then apply various parsers to the message body (CEF being one of many). For now, this just prevents us from doing a CEF-based syslog performance comparison with identical messages here and against the Go collector.

clhain • Sep 05 '25 17:09

@clhain Thanks for creating this issue!

> Sending an encoded CEF format message directly without the syslog header fields does parse correctly, but as I read it, this seems like incorrect behavior.

I don't think this is incorrect. I think we are expected to parse CEF messages that don't have a Syslog header. However, not being able to parse CEF messages with a Syslog header looks incorrect.

> I'm assuming the long-term vision for this is something closer to what the Go collector does, where we can specify the expected syslog format and then apply various parsers to the message body (CEF being one of many).

I'm envisioning a receiver that parses all three formats (RFC 3164, RFC 5424, and CEF) dynamically, instead of having the user restrict the accepted protocol. I think that improves the usability of this receiver. For example, if the collector is used as a gateway with multiple sources of incoming data, we would most likely want the same receiver to support multiple formats.
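A receiver that accepts all three formats dynamically has to classify each message before parsing it. Below is a minimal Python sketch of one heuristic based only on the leading bytes; `Format` and `detect_format` are hypothetical names, and this is not the receiver's actual logic. Bare CEF starts with `CEF:`, RFC 5424 has a `<PRI>` immediately followed by a version number and a space, and anything else with a `<PRI>` or a BSD-style timestamp is treated as RFC 3164.

```python
import re
from enum import Enum


class Format(Enum):
    RFC5424 = "rfc5424"
    RFC3164 = "rfc3164"
    CEF = "cef"
    UNKNOWN = "unknown"


_PRI = re.compile(r"^<(\d{1,3})>")                # "<PRI>" at the very start
_RFC5424_VERSION = re.compile(r"^[1-9]\d{0,2} ")  # VERSION SP right after PRI
_BSD_TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def detect_format(msg: str) -> Format:
    """Heuristically classify a message by its leading bytes (hypothetical helper)."""
    if msg.startswith("CEF:"):
        return Format.CEF
    pri = _PRI.match(msg)
    if pri:
        if int(pri.group(1)) > 191:  # facility 23 * 8 + severity 7 is the largest valid PRI
            return Format.UNKNOWN
        rest = msg[pri.end():]
        return Format.RFC5424 if _RFC5424_VERSION.match(rest) else Format.RFC3164
    if _BSD_TIMESTAMP.match(msg):
        return Format.RFC3164  # PRI omitted, as in the spec's "Jan 18 11:07:53 host" example
    return Format.UNKNOWN
```

A message classified as RFC 3164 or RFC 5424 can still carry CEF in its body, so after the syslog header is stripped the same `CEF:` prefix check applies to the remaining text. The check is only a heuristic: an RFC 3164 message with no timestamp whose content starts with digits and a space after the PRI would be misread as RFC 5424.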

We could consider adding an option to restrict the protocol accepted later on, if we conclude that it significantly improves performance for such scenarios.

> For now, this just prevents us from doing a CEF-based syslog performance comparison with identical messages here and against the Go collector.

It shouldn't prevent you from doing a CEF-based performance test, since you can use a CEF message without the Syslog header. As for comparing with the Go-based receiver, I don't think it supports CEF parsing, so you couldn't compare the two directly anyway. Check this issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/37442

utpilla • Sep 08 '25 22:09

Awesome, thanks Utkarsh! Yeah, it's kind of an interesting topic... I completely agree with the usefulness of dynamic parsing as long as it's not a performance hit. At the same time, if a message doesn't have syslog headers, is it really a syslog receiver parsing CEF, or is it more of a socket receiver capable of parsing CEF, syslog, syslog+CEF, and eventually other formats (LEEF, JSON, logfmt, etc.), all with and without syslog headers? I'm not sure what the right answer is, or how best to make the various parsers useful for other receiver types that might send us text-like log bodies.
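One way to read this layering is as an envelope parser that only extracts the MSG part of a syslog message and hands it to a separate body parser (CEF, LEEF, JSON, etc.), so neither needs to know about the other. A minimal Python sketch for RFC 5424, assuming a well-formed line; `rfc5424_msg` is a hypothetical helper name. It skips STRUCTURED-DATA, including escaped `]` inside quoted parameter values, and does not validate the individual header fields.

```python
def rfc5424_msg(line: str) -> str:
    """Return the MSG part of an RFC 5424 line (hypothetical; assumes a well-formed header)."""
    # "<PRI>VERSION", TIMESTAMP, HOSTNAME, APP-NAME, PROCID, MSGID, then the remainder.
    parts = line.split(" ", 6)
    if len(parts) < 7 or not parts[0].startswith("<"):
        raise ValueError("not an RFC 5424 message")
    rest = parts[6]  # STRUCTURED-DATA [SP MSG]
    i = 0
    if rest.startswith("-"):
        i = 1  # NILVALUE: no structured data
    else:
        # Skip each "[...]" SD-ELEMENT; inside quoted values, '\' escapes the next char.
        while i < len(rest) and rest[i] == "[":
            in_quotes = False
            i += 1
            while i < len(rest):
                ch = rest[i]
                if in_quotes and ch == "\\":
                    i += 2
                    continue
                if ch == '"':
                    in_quotes = not in_quotes
                elif ch == "]" and not in_quotes:
                    i += 1
                    break
                i += 1
    msg = rest[i:]
    return msg[1:] if msg.startswith(" ") else msg
```

The returned string is exactly what a body parser would receive; the envelope parser never inspects whether it is CEF or anything else.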

Anyway, for now, as long as we support syslog+CEF, that will help me. I have a demo running with CEF parsing in the Go collector (using generic transform processors since, as you said, it's not natively supported in the syslog receiver), and leaving the headers out makes it look like I'm sending different rates of traffic. Like I said, not a huge deal to explain to people.

Anyway, this is really good work, and I hope you find this image as awesome as I do =)

[Image]

clhain • Sep 08 '25 23:09

Oops, here's a version that actually includes the send rates:

[Image]

clhain • Sep 08 '25 23:09

> Anyway, for now, as long as we support syslog+CEF, that will help me. I have a demo running with CEF parsing in the Go collector (using generic transform processors since, as you said, it's not natively supported in the syslog receiver), and leaving the headers out makes it look like I'm sending different rates of traffic. Like I said, not a huge deal to explain to people.

Until we fix the parsing, could we send the traffic without headers to the Go collector as well, to keep the test consistent?

> Anyway, this is really good work, and I hope you find this image as awesome as I do =)

💯

Thank you for testing this!

utpilla • Sep 09 '25 03:09

> Until we fix the parsing, could we send the traffic without headers to the Go collector as well, to keep the test consistent?

Sadly not, as far as I know. The Go collector is extremely rigid about compliance with one of the two RFC formats (RFC 3164 or RFC 5424).

clhain • Sep 09 '25 13:09

> Sadly not, as far as I know. The Go collector is extremely rigid about compliance with one of the two RFC formats (RFC 3164 or RFC 5424).

There is a configuration option in the Go-based receiver that I believe should allow you to send a CEF message without the Syslog headers.

Could you check whether setting `allow_skip_pri_header` to `true` lets you do that?

utpilla • Sep 09 '25 16:09

Oh awesome, I will give that a try - thanks!

clhain • Sep 09 '25 17:09

As discussed in the linked issue, CEF parsing should be built as an OTTL function for the transform processor. Why? Simple: CEF, and also LEEF, can both appear in places other than a syslog stream. Additionally, they are not part of Syslog itself, but rather are tacked on as ways to encode additional metadata into the MSG portion of a Syslog message.

Due to this, there is an issue for creating the ParseCEF OTTL function. This is where CEF parsing work should be done: decoupled from any single receiver, so that it is usable anywhere in the pipeline and receiver-agnostic.
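As an illustration of what a receiver-agnostic building block looks like, here is a Python sketch that splits a CEF extension into key/value pairs from a plain string, so it can run on any log body regardless of which receiver produced it. This is not the proposed `ParseCEF` OTTL function; `parse_cef_extension` is a hypothetical name, and the allowed key characters and the lenient handling of unknown escapes are assumptions.

```python
import re

# A key starts at the beginning or after whitespace and ends at '='.
# The allowed key characters are an assumption; vendor custom keys may differ.
_KEY = re.compile(r"(?:^|(?<=\s))([A-Za-z0-9_.]+)=")

# CEF extension escapes: '\=' -> '=', '\\' -> '\', '\n' / '\r' -> newline / carriage return.
_ESCAPES = {"n": "\n", "r": "\r"}


def _unescape(value: str) -> str:
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append(_ESCAPES.get(nxt, nxt))  # unknown escapes keep the escaped char
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def parse_cef_extension(extension: str) -> dict[str, str]:
    """Split 'k1=v1 k2=v2 ...' into a dict; values may contain unescaped spaces."""
    keys = list(_KEY.finditer(extension))
    result = {}
    for idx, key in enumerate(keys):
        # A value runs until the next key (or the end), minus trailing whitespace.
        end = keys[idx + 1].start() if idx + 1 < len(keys) else len(extension)
        result[key.group(1)] = _unescape(extension[key.end():end].rstrip())
    return result
```

Because it takes and returns plain values, the same function could back a syslog receiver, a file receiver, or a transform step without any of them knowing about CEF.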

Dylan-M • Sep 17 '25 17:09