out_splunk: remove raw endpoint

Open pmeier opened this issue 1 year ago • 5 comments

Fixes #8927. This does not remove the ability to send raw events, i.e. using Splunk_Send_Raw On, but rather sends them to the correct endpoint.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

pmeier avatar Jun 25 '24 11:06 pmeier

Is /services/collector/event able to receive raw events?

edsiper avatar Jun 25 '24 19:06 edsiper

@edsiper Could you define what exactly you mean by "raw events"? The term has a different meaning in fluent-bit than in splunk as explained in https://github.com/fluent/fluent-bit/issues/8927#issue-2339984112.

pmeier avatar Jun 25 '24 20:06 pmeier

I will double-check this. I cannot remember all the details of the raw endpoint or why I implemented it that way at the time (I am asking another maintainer to take a look at this too). Thank you.

edsiper avatar Jul 01 '24 07:07 edsiper

According to the official Splunk docs, Fluent Bit needs to supply a channel, either as a URL parameter or as an x-splunk-request-channel header, when sending events to the raw endpoint.

Channel This endpoint requires a data channel GUID to differentiate data from different clients. Generate a GUID and provide it in a POST request as a custom HTTP header or as a parameter.

If a channel is not provided in the POST request, an error response is sent. Only valid GUIDs can be used. An error message is returned if GUID validation fails.

ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/RESTREF/RESTinput#services.2Fcollector.2Fraw
ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/AboutHECIDXAck#About_channels_and_sending_data

It seems that the raw endpoint can handle JSON logs, because the examples in the docs include a case that sends a JSON payload.


However, Splunk's documentation may be confusing on this point, because without indexer acknowledgement there is no need to use channels.

Sending events to HEC with indexer acknowledgment active is similar to sending them with the setting off. There is one crucial difference: when you have indexer acknowledgment turned on, you must specify a channel when you send events.

ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/AboutHECIDXAck#About_channels_and_sending_data

JSON request with timestamp

curl https://localhost:8088/services/collector/raw?channel=934793C0-FC91-467E-965A-7EAACEFBC4AB -H 'Authorization: Splunk 934793C0-FC91-467E-965A-7EAACEFBC4AB' -d '{"message":"Hello World", "date":"Wed Aug 10 12:27:53 PDT 2016"}'
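For reference, the same request can be assembled in Python. This is only a minimal sketch using the standard library: the host, token, and channel GUID are the placeholder values from the curl example, the helper name `build_raw_request` is made up for illustration, and the header name follows the x-splunk-request-channel header mentioned above. The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values copied from the curl example; replace with real ones.
HEC_BASE = "https://localhost:8088"
TOKEN = "934793C0-FC91-467E-965A-7EAACEFBC4AB"
CHANNEL = "934793C0-FC91-467E-965A-7EAACEFBC4AB"


def build_raw_request(payload: dict, channel_in_header: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to the HEC raw endpoint.

    Hypothetical helper: the channel goes either into a header or into
    the query string, the two options described in the Splunk docs.
    """
    url = f"{HEC_BASE}/services/collector/raw"
    headers = {"Authorization": f"Splunk {TOKEN}"}
    if channel_in_header:
        headers["X-Splunk-Request-Channel"] = CHANNEL
    else:
        url += "?" + urllib.parse.urlencode({"channel": CHANNEL})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_raw_request({"message": "Hello World", "date": "Wed Aug 10 12:27:53 PDT 2016"})
print(req.get_method(), req.full_url)
```

Passing `channel_in_header=False` reproduces the curl form, with the channel as a URL parameter. Whether a channel is required at all depends on indexer acknowledgement, as discussed above.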

If we only send structured data, we can remove the raw endpoint from out_splunk. However, I observed that, without indexer acknowledgement, the raw endpoint can handle raw JSON events.

Plus, if we remove the raw endpoint so that it no longer needs to be specified, we need to remove the splunk_send_raw config map, which is defined here: https://github.com/fluent/fluent-bit/blob/master/plugins/out_splunk/splunk.c#L919-L925

cosmo0920 avatar Jul 01 '24 09:07 cosmo0920

If we only send structured data, we can remove the raw endpoint from out_splunk. However, I observed that, without indexer acknowledgement, the raw endpoint can handle raw JSON events.

Yeah, but if the event endpoint does what we want and we never send raw strings, there is no point in ever trying to send anything to the raw endpoint. Hence, this PR.

Plus, if we remove the raw endpoint so that it no longer needs to be specified, we need to remove the splunk_send_raw config map, which is defined here: master/plugins/out_splunk/splunk.c#L919-L925

That is not what we want. The "raw mode" in fluent-bit means that the record is sent as is to splunk without any processing (except for #8926). If activated, the user is responsible for bringing the record into the format required by splunk, for example by using a Lua filter before it. This behavior is necessary for cases where the configuration options that the out_splunk plugin provides are not sufficient. I'm facing such a use case and thus cannot use Splunk_Send_Raw Off.

When Splunk_Send_Raw Off is configured (default), the whole record is nested under the event key and one can configure other options to be inserted into the JSON data that is being sent to splunk. This is useful for a simple use case.
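The difference between the two modes can be sketched roughly as follows. This is a simplified Python model of the behavior described above, not the plugin's actual C code: the function name `build_hec_body`, its parameters, and the `time` key are assumptions made for illustration; only the nesting under `event` comes from the description.

```python
import json
import time
from typing import Any, Dict, Optional


def build_hec_body(record: Dict[str, Any], send_raw: bool,
                   extra_fields: Optional[Dict[str, Any]] = None,
                   timestamp: Optional[float] = None) -> str:
    """Return the JSON body for one record (illustrative model only)."""
    if send_raw:
        # Raw mode: the record is serialized as is. The user must already
        # have shaped it into what Splunk expects, e.g. with a filter.
        return json.dumps(record)
    # Default mode: nest the whole record under "event" and place any
    # configured fields next to it. The "time" key is an assumption.
    body: Dict[str, Any] = {
        "time": time.time() if timestamp is None else timestamp,
        "event": record,
    }
    body.update(extra_fields or {})
    return json.dumps(body)


record = {"log": "hello", "level": "info"}
# Default mode wraps the record for the user.
print(build_hec_body(record, send_raw=False, timestamp=1719830000.0))
# Raw mode: the caller has pre-shaped the record, here with a custom sourcetype.
print(build_hec_body({"event": record, "sourcetype": "app"}, send_raw=True))
```

In the second call the record already has the shape of an HEC event, which is why sending it to the event endpoint is the natural fit.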

pmeier avatar Jul 01 '24 10:07 pmeier

Ah, I got it. So using the raw endpoint is currently inefficient and inappropriate in fluent-bit. This motivation is what I wanted to know. I really appreciate the explanation.

I now see that this change is reasonable. However, the behavior change should be properly described in fluent-bit's documentation.

Here is out_splunk's documentation: https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/splunk.md#sending-raw-events

I also understand what you mean in this PR. Splunk_Send_Raw is used for sending events that you have already modified into the required shape. In some cases, as described in the documentation, those are intended to behave like Splunk's metrics.

cosmo0920 avatar Jul 01 '24 11:07 cosmo0920

However, the behavior change should be properly described in fluent-bit's documentation.

The documentation currently doesn't say anything about the endpoint the data is being sent to. I think this is fine given that this is more of an implementation detail of fluent-bit.

As for documenting the change: Splunk_Send_Raw On has not worked since fluent-bit==1.8, when the raw endpoint was introduced. Or maybe it did initially, since one might have been able to send JSON data (https://github.com/fluent/fluent-bit/pull/9007#issuecomment-2199665102), but it certainly doesn't work on fluent-bit==3.0.7. Meaning, I would treat this as a bug fix rather than a feature change.
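Put as a rough Python sketch, the routing change amounts to the following. This is a model of what the thread describes, not code from the plugin: the function names are hypothetical, and the "before" behavior is inferred from the discussion of the raw endpoint above.

```python
RAW_ENDPOINT = "/services/collector/raw"
EVENT_ENDPOINT = "/services/collector/event"


def endpoint_before(send_raw: bool) -> str:
    """Routing as the thread describes it prior to this PR (inferred)."""
    return RAW_ENDPOINT if send_raw else EVENT_ENDPOINT


def endpoint_after(send_raw: bool) -> str:
    """Routing with this PR: the flag no longer affects the endpoint."""
    return EVENT_ENDPOINT


for flag in (False, True):
    print(flag, endpoint_before(flag), endpoint_after(flag))
```

Splunk_Send_Raw still controls how the body is shaped; only the destination stops depending on it.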

pmeier avatar Jul 01 '24 13:07 pmeier

thanks everybody

edsiper avatar Jul 02 '24 09:07 edsiper