Closes #6451.

Follows CEF version 26.

Open questions

I'm unsure whether two extensions to this parser should be included now, later, or not at all:

[x] Translate CEF Key Names to Full Name. Example: "act" to "deviceAction". (EDIT: Not at all.)
[x] Construct key-value from key label fields. (EDIT: A separate issue.) Example:

{
"c6a1": "value1",
"c6a1Label": "key1"
}

to

{
"key1": "value1"
}
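
A minimal sketch of that second extension, written in Python for illustration rather than taken from the parser in this PR. `fold_labels` is a hypothetical name, and the input is assumed to be the already-parsed extension as a flat mapping. Label collisions are not handled here.

```python
def fold_labels(fields):
    """Replace each `X` / `XLabel` pair with a single `label: value` entry."""
    suffix = "Label"
    out = {}
    for key, value in fields.items():
        # Skip label fields whose base key exists; they are consumed below.
        if key.endswith(suffix) and key[: -len(suffix)] in fields:
            continue
        # Use the label as the new key when one is present, else keep the key.
        out[fields.get(key + suffix, key)] = value
    return out


print(fold_labels({"c6a1": "value1", "c6a1Label": "key1"}))
```

Fields without a matching label pass through unchanged.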

Sep 12 '22 23:09 ktff

Soak Test Results

Baseline: 28113af2bf357c71957bf377225dc9df2415cf8f Comparison: 4886c19003a72e7af86fe30dbca89a4f59918dc6 Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed by a pull request and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that baseline is faster, positive values that comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
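
The filtering rule described above can be sketched roughly as follows. This is an illustration in Python, not the soak rig's actual code: the function names are hypothetical, the thresholds are the ones quoted in the explanation, and the "newly erratic" comparison against a declared baseline is not modeled.

```python
ERRATIC_COV = 0.3       # coefficient-of-variation threshold from the text
MIN_CONFIDENCE = 90.0   # percent
MIN_DELTA_PCT = 8.87    # percent change in mean throughput


def coefficient_of_variation(mean, stdev):
    """CoV is the standard deviation relative to the mean."""
    return stdev / mean


def is_erratic(cov):
    return cov > ERRATIC_COV


def is_interesting(delta_mean_pct, confidence_pct):
    """A change is reported only if it is both confident and large enough."""
    return confidence_pct >= MIN_CONFIDENCE and abs(delta_mean_pct) > MIN_DELTA_PCT


# A 3.71% change at 100% confidence is significant but too small to report.
print(is_interesting(3.71, 100.0))
# A comparison CoV of 0.472311 is above the erratic threshold.
print(is_erratic(0.472311))
```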

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
datadog_agent_remap_blackhole_acks	2.13MiB	3.71	100.00%	57.38MiB	4.41MiB	91.72KiB	0.076762	59.5MiB	3.46MiB	72.52KiB	0.0581851	False	False
datadog_agent_remap_blackhole	1.25MiB	2.22	100.00%	56.25MiB	3.76MiB	78.25KiB	0.0667581	57.5MiB	3.94MiB	82.06KiB	0.0684304	False	False
http_text_to_http_json	699.3KiB	1.82	100.00%	37.53MiB	900.93KiB	18.39KiB	0.0234363	38.22MiB	852.95KiB	17.41KiB	0.0217917	False	False
splunk_hec_route_s3	260.84KiB	1.38	99.99%	18.46MiB	2.37MiB	49.27KiB	0.128177	18.72MiB	2.17MiB	45.32KiB	0.115719	False	False
http_pipelines_blackhole_acks	14.44KiB	1.14	100.00%	1.23MiB	112.01KiB	2.28KiB	0.0886288	1.25MiB	80.21KiB	1.63KiB	0.0627486	False	False
syslog_humio_logs	123.45KiB	0.74	100.00%	16.3MiB	179.6KiB	3.67KiB	0.0107565	16.42MiB	155.79KiB	3.19KiB	0.00926216	False	False
datadog_agent_remap_datadog_logs_acks	444.2KiB	0.74	99.99%	58.97MiB	3.24MiB	67.66KiB	0.0548727	59.41MiB	4.44MiB	92.42KiB	0.074717	False	False
http_to_http_acks	119.39KiB	0.67	37.96%	17.37MiB	8.06MiB	168.48KiB	0.463991	17.49MiB	8.26MiB	172.41KiB	0.472311	True	True
socket_to_socket_blackhole	154.01KiB	0.66	100.00%	22.67MiB	409.17KiB	8.35KiB	0.0176232	22.82MiB	178.5KiB	3.64KiB	0.00763762	False	False
syslog_regex_logs2metric_ddmetrics	51.0KiB	0.41	99.65%	12.28MiB	651.84KiB	13.28KiB	0.0518225	12.33MiB	556.77KiB	11.35KiB	0.0440856	False	False
syslog_log2metric_humio_metrics	51.53KiB	0.4	99.98%	12.5MiB	281.42KiB	5.74KiB	0.0219787	12.55MiB	606.7KiB	12.35KiB	0.0471939	False	False
syslog_splunk_hec_logs	16.58KiB	0.1	63.62%	16.34MiB	739.47KiB	15.05KiB	0.044195	16.35MiB	506.32KiB	10.34KiB	0.030231	False	False
splunk_hec_to_splunk_hec_logs_noack	9.65KiB	0.04	61.28%	23.83MiB	435.79KiB	8.9KiB	0.0178572	23.84MiB	330.24KiB	6.74KiB	0.0135269	False	False
splunk_hec_indexer_ack_blackhole	4.48KiB	0.02	13.32%	23.74MiB	930.31KiB	18.92KiB	0.0382638	23.74MiB	924.64KiB	18.8KiB	0.0380238	False	False
enterprise_http_to_http	717.05B	0	7.61%	23.84MiB	253.34KiB	5.17KiB	0.0103733	23.85MiB	253.94KiB	5.2KiB	0.0103977	False	False
splunk_hec_to_splunk_hec_logs_acks	-2.46KiB	-0.01	7.65%	23.75MiB	889.03KiB	18.08KiB	0.0365475	23.75MiB	892.55KiB	18.15KiB	0.0366962	False	False
file_to_blackhole	-17.27KiB	-0.02	11.71%	95.33MiB	3.91MiB	80.97KiB	0.0409638	95.31MiB	4.08MiB	84.82KiB	0.0427583	False	False
http_to_http_json	-30.03KiB	-0.12	98.76%	23.85MiB	333.03KiB	6.8KiB	0.0136332	23.82MiB	484.08KiB	9.89KiB	0.0198411	False	False
datadog_agent_remap_datadog_logs	-107.25KiB	-0.17	71.68%	60.38MiB	1.92MiB	40.16KiB	0.0317461	60.28MiB	4.39MiB	91.5KiB	0.0728925	False	False
http_pipelines_no_grok_blackhole	-22.35KiB	-0.21	68.81%	10.62MiB	139.1KiB	2.84KiB	0.0127897	10.6MiB	1.05MiB	21.92KiB	0.0993108	False	False
fluent_elasticsearch	-170.99KiB	-0.21	100.00%	79.47MiB	53.0KiB	1.07KiB	0.000651135	79.31MiB	1.56MiB	32.08KiB	0.019653	False	False
http_to_http_noack	-77.3KiB	-0.32	99.98%	23.85MiB	254.72KiB	5.21KiB	0.0104294	23.77MiB	982.23KiB	20.01KiB	0.0403448	False	False
http_pipelines_blackhole	-11.65KiB	-0.66	100.00%	1.73MiB	10.96KiB	229.29B	0.0061985	1.72MiB	122.81KiB	2.5KiB	0.0699126	False	False
syslog_loki	-114.94KiB	-0.76	100.00%	14.71MiB	379.75KiB	7.77KiB	0.0252129	14.59MiB	738.83KiB	15.02KiB	0.0494313	False	False
syslog_log2metric_splunk_hec_metrics	-252.8KiB	-1.47	100.00%	16.82MiB	890.69KiB	18.16KiB	0.0517027	16.57MiB	1.04MiB	21.68KiB	0.0628038	False	False

Sep 13 '22 00:09 github-actions[bot]

Translate CEF Key Names to Full Name. Example: "act" to "deviceAction"

I would not do that. "act", in this example, is the short name as defined in the CEF docs, which is OK to work with. Changing all short names to full names (as per the docs) has little value IMHO. Chances are that you will rename the keys anyway.

Construct key-value from key label fields.

This could be interesting and help working with CEF logs. I would make it optional though as one might not need it.

Sep 13 '22 07:09 sim0nx

I'm looking at the CEF v26 specification from here: https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/pdfdoc/cef-implementation-standard/cef-implementation-standard.pdf

I'm not very familiar with this format, but the documentation seems to imply the most common format is with a syslog prefix, which the current implementation doesn't support. It would be good to support that, or explain why it's not needed.

For example, I expected the following example to parse correctly:

Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232

Sep 13 '22 13:09 fuchsnj

@fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?

Sep 13 '22 15:09 sim0nx

@fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?

It's just a syslog prefix as part of the CEF header. It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef). I think it would be acceptable to skip everything before the CEF:Version header.
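
As a rough sketch of the "skip everything before the header" idea (Python for illustration only; `strip_to_cef_header` is a hypothetical helper, not the function in this PR):

```python
def strip_to_cef_header(line: str) -> str:
    """Drop whatever precedes the first "CEF:" marker in the line."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    # Assumption: the prefix itself never contains the text "CEF:".
    return line[start:]


example = (
    "Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|"
    "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232"
)
print(strip_to_cef_header(example))
```

A line that already starts with the header is returned unchanged, so the same call works with or without a prefix.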

Sep 13 '22 16:09 fuchsnj

It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef).

That is an issue. Then we could try to parse the syslog prefix and, if that fails, discard everything up to the CEF header.
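
Something along these lines, as a Python sketch only. The regular expression is an assumption about what a "syslog prefix" looks like (optional `<PRI>`, a BSD-style timestamp, a host name), and `split_prefix` is a hypothetical name.

```python
import re

# Assumed prefix shape: optional <PRI>, "Mmm dd hh:mm:ss", host, then the CEF header.
PREFIX = re.compile(
    r"^(?:<(?P<pri>\d{1,3})>)?"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?=CEF:)"
)


def split_prefix(line):
    """Return (prefix_fields_or_None, cef_part)."""
    match = PREFIX.match(line)
    if match:
        # The prefix parsed cleanly: keep its fields and the remainder.
        return match.groupdict(), line[match.end():]
    # Fallback: discard everything up to the CEF header.
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    return None, line[start:]


print(split_prefix("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|x|10|"))
print(split_prefix("unparseable junk CEF:1|Security|threatmanager|1.0|100|x|10|"))
```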

Sep 13 '22 16:09 ktff

@fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?

It's just a syslog prefix as part of the CEF header. It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef). I think it would be acceptable to skip everything before the CEF:Version header.

I am working with CEF log files... what I am seeing is two variants:

no syslog parts, basically only CEF over
a normal syslog line with the CEF part as its message

parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef

according to https://vector.dev/docs/reference/vrl/functions/#parse_syslog, you would get a message, which is the original line minus the syslog part... so you would pass the message to parse_cef, at least that's what I had in mind ... CMIIAW

I think it would be acceptable to skip everything before the CEF:Version header.

I would opt for that ... should I care for the "syslog prefix" as well, I could parse it somehow else (e.g. grok etc) and use the CEF part to pass to parse_cef ... or just have parse_cef ignore everything before CEF:Version as you say.

my 2c

Sep 13 '22 18:09 sim0nx

First of all, thank you very much for implementing/working on this function. 🎉 A number of our pipelines need to parse CEF data, and so far we are just using VRL for this, which creates boilerplate. We had in our notes to actually request this feature, so this PR and initiative are highly appreciated.

Similar to @sim0nx , we also have to deal with different formats for "CEF-over-Syslog" data. Examples:

May 31 10:01:02 hostname CEF:0|Zscaler|NSSWeblog|5.0|Allowed|...
<13>Jun  4 10:29:58 hostname CEF:0|Palo Alto Networks|PAN-OS|...

In our case, they differ in the inclusion of the syslog PRIO header.

For both cases, in VRL, we first apply parse_syslog() to obtain the message part and then the following parse-cef transform that we made ourselves for the purpose:

host = string!(.host)
message = string!(.message)
fields = split(message, "|", limit: 8)
if length(fields) < 8 {
  log("invalid CEF message from <" + host + ">: " + message, level: "warn")
  abort
}
. = {
  "host": host,
  "version": to_int!(fields[0]),
  "device_vendor": fields[1],
  "device_product": fields[2],
  "device_version": fields[3],
  "device_event_class_id": fields[4],
  "name": fields[5],
  "severity": fields[6],
  "extension": fields[7],
}

The above just parses the CEF format itself, strictly. To parse the rest, we made a second-stage parse-XYZ transform for the extension field itself. As you can imagine, there are huge regexes, boilerplate and duplicated code across a lot of our pipelines.

For example:

host = string!(.host)
extension = string!(.extension)
., err = parse_regex(extension, r'^act=(?P<eventAction>[\w\d\s./_-]+) ')  # can't share the full regex :(
if err != null {
  log("failed to parse zscaler CEF extension from <" + host + ">: " + extension, level: "warn")
  abort
}
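
For comparison, a generic split of the extension into key-value pairs can be sketched without a per-vendor regex. This is Python for illustration, not our production code and not the implementation in this PR; `parse_extension` is a hypothetical name, and CEF escape sequences such as `\=` are left as-is rather than unescaped.

```python
import re

# A value runs lazily until the next " key=" or the end of the string,
# so values may contain spaces.
PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)")


def parse_extension(extension):
    """Split a CEF extension string into a dict of string values."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(extension)}


print(parse_extension("src=10.0.0.1 dst=2.1.2.2 spt=1232"))
print(parse_extension("act=Allowed msg=worm successfully stopped"))
```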

The example you show in Construct key-value from key label fields looks VERY valuable and useful to us :)

Hope the above sheds some light on real-world use-cases for your proposed parse_cef() function.

Sep 14 '22 09:09 hhromic

Thanks @sim0nx @hhromic.

Then we can add the following two modifications:

When parsing, discard everything up to CEF:Version. -- As @sim0nx said, with this, parsing will just work regardless of whether the CEF is embedded in syslog or not. And if that prefix is useful, it can be parsed out with the syslog parser. The parser would just work as @fuchsnj expected.
Construct key-value from key label fields. -- This seems to be useful and is in the domain of this parser. Also, I wouldn't make it optional, since this mechanism is defined in the specification as a way to have custom keys in CEF, to avoid its limitations. And since the output is no longer CEF, the limitation is no longer present, so there is no reason for that anymore. At least until a real-world case is presented.

Sep 14 '22 23:09 ktff

@ktff sounds good to me!

Construct key-value from key label fields. -- This seems to be useful and is in the domain of this parser. Also, I wouldn't make it optional, since this mechanism is defined in the specification as a way to have custom keys in CEF, to avoid its limitations. And since the output is no longer CEF, the limitation is no longer present, so there is no reason for that anymore. At least until a real-world case is presented.

I agree with that too. We already feel very excited to get this type of transformation "for free" in this function:

{"c6a1":"value1","c6a1Label":"key1"} -> {"key1":"value1"}

While it should be rare, what would happen if multiple X + XLabel pairs share the same key? Would it construct an array in that case? For example:

{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key1"} -> {"key1":["value1","value2"]}

I think other functions in VRL already behave like this, so it would be nice for consistency.

Sep 15 '22 14:09 hhromic

@hhromic that seems really rare for CEF and a somewhat hacky way to transmit an array, so I'm not sure it should be supported. Then again, silently dropping the field isn't ideal either. So instead of that, on collision we can not perform transformation for that pair.

Sep 15 '22 22:09 ktff

@ktff yes, definitely CEF (on paper) should not have duplicated XLabel values for different X. And I definitely don't think CEF would ever intend to transmit arrays "natively" in that way either. I say "on paper" because our team has seen a vast amount of CEF data from different devices/software over the years, and I'm not sure you can imagine the ugly formatting horrors that appear in the wild. That's why I was asking how the proposed parse_cef() function would handle such a rare but not impossible situation.

So instead of that, on collision we can not perform transformation for that pair.

I think that would cause more headaches than it solves in the long run. If data ever happens to come with duplicated labels, I think it is easier to handle in an array (after conversion) than left "as-is", especially because it would be harder to detect "as-is".

Just in case it was not clear, the "to-array" logic would only use an array output if-and-only-if there are duplicate keys found during parsing. Otherwise the field value remains a simple string. Eg:

{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key1"} -> {"key1":["value1","value2"]}
{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key2"} -> {"key1":"value1","key2":"value2"}
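
In code, the "to-array" logic I have in mind looks roughly like this. It is a Python sketch for illustration only; `fold_labels_multi` is a hypothetical name and not something in this PR.

```python
def fold_labels_multi(fields):
    """Fold `X` / `XLabel` pairs; collect values into a list only on duplicates."""
    suffix = "Label"
    out = {}
    for key, value in fields.items():
        # Label fields with an existing base key are consumed by that base key.
        if key.endswith(suffix) and key[: -len(suffix)] in fields:
            continue
        name = fields.get(key + suffix, key)
        if name not in out:
            out[name] = value                   # common case: plain string
        elif isinstance(out[name], list):
            out[name].append(value)             # third or later duplicate
        else:
            out[name] = [out[name], value]      # first duplicate: promote to list
    return out


print(fold_labels_multi(
    {"c6a1": "value1", "c6a1Label": "key1", "c5a1": "value2", "c5a1Label": "key1"}
))
```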

In this way, in the vast majority of cases there will never be arrays in the values, except if a device decides to duplicate labels.

I can't remember right now which other function in VRL behaves like this, but tomorrow I will check the docs and report back.

Sep 15 '22 23:09 hhromic

@ktff here are at least two other functions in VRL that behave as described above:

$ parse_key_value!("key1=value1")
{ "key1": "value1" }

$ parse_key_value!("key1=value1 key1=value2")
{ "key1": ["value1", "value2"] }

$ parse_query_string("?key1=value1")
{ "key1": "value1" }

$ parse_query_string("?key1=value1&key1=value2")
{ "key1": ["value1", "value2"] }

Sep 16 '22 15:09 hhromic

I think this would be a reasonable behavior to continue following - unless the spec is 100% clear that duplicate keys will never happen. We've also stuck pretty close to "implemented per spec, regardless of in-the-wild behavior doesn't always match that"

Sep 16 '22 16:09 spencergilbert

While it should be rare, what would happen if multiple X + XLabel pairs share the same key? Would it construct an array in that case?

One downside to keep in mind, if duplicates are placed in arrays that means the VRL type definitions will always be string | array which means functions using the output of parse_cef that expect only strings as input will be fallible, or require coercing to a string first. So if there isn't a good reason to allow duplicates, it will be easier to work with if only 1 value is kept for each key.
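
To illustrate the shape of that cost, here is a sketch with Python type hints standing in for VRL type definitions (which work differently); all names are hypothetical.

```python
from typing import List, Union

# If duplicates become arrays, every field value has this union type.
CefValue = Union[str, List[str]]


def as_list(value: CefValue) -> List[str]:
    """Normalize a possibly multi-valued field so callers can treat it uniformly."""
    return value if isinstance(value, list) else [value]


def first(value: CefValue) -> str:
    """Coerce to a single string by keeping only the first value."""
    return as_list(value)[0]


print(first("value1"))
print(first(["value1", "value2"]))
```

Every consumer that expects a plain string has to go through a step like `first`, even though the list case almost never occurs.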

Sep 16 '22 16:09 fuchsnj

We've also stuck pretty close to "implemented per spec, regardless of in-the-wild behavior doesn't always match that"

I also like to follow specs and standards as much as possible, but it is a shame that vendors sometimes don't :(

We like Vector a lot because it easily allows us to deal with bad data. The syslog parser in Vector is a good example of a very resilient parser that does not drop bad data after trying its best to parse malformed syslog.

One downside to keep in mind, if duplicates are placed in arrays that means the VRL type definitions will always be string | array which means functions using the output of parse_cef that expect only strings as input will be fallible, or require coercing to a string first. So if there isn't a good reason to allow duplicates, it will be easier to work with if only 1 value is kept for each key.

That is a very interesting point indeed. I just checked, and parse_key_value() and parse_query_string() already suffer from the same caveat.

I guess in the end it would be okay for parse_cef() to only process single-valued keys, as long as it doesn't discard incoming data because of this. Perhaps, the way of handling duplicate keys could be configured, pretty much like how it was discussed in this (still pending) PR: https://github.com/vectordotdev/vector/pull/11580#issuecomment-1055018531

Sep 16 '22 17:09 hhromic

We've also stuck pretty close to "implemented per spec, regardless of in-the-wild behavior doesn't always match that"

I also like to follow specs and standards as much as possible, but it is a shame that vendors sometimes don't :(

We like Vector a lot because it easily allows us to deal with bad data. The syslog parser in Vector is a good example of a very resilient parser that does not drop bad data after trying its best to parse malformed syslog.

Ah, I guess this wasn't fully thought out when I typed this. That's generally been the case on the inputs/outputs for Vector, to avoid bad/malformed data coming into your pipeline. Definitely want to keep the VRL handling flexible and robust.

Example being the syslog source rejects invalid messages when it fails to decode, and the case of "pseudo-syslog" - you can use the socket source and handle the decoding in VRL.

Sep 16 '22 17:09 spencergilbert

@hhromic parse_key_value and parse_query_string are different beasts: they don't have a fixed key set, so they need to be as generic as possible. Whereas, as @fuchsnj mentioned, when parsing valid CEF we can always return a string value since there are no duplicates, which is nice. But I do agree that

So instead of that, on collision we can not perform transformation for that pair.

isn't a good solution.

So, after all, let's make this translation a separate feature. It seems like it will require an option so that it's opt-in. Once active, it can change the type definition of the parser so that it returns string or array. Or at least that seems like a way to do it.

Sep 16 '22 23:09 ktff

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: 326838a0fdea56253010e431862b47455eb17d5c Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
http_pipelines_blackhole	46.63KiB	2.86	100.00%	1.59MiB	88.55KiB	1.81KiB	0.054326	1.64MiB	138.62KiB	2.82KiB	0.0826826	False	False
syslog_loki	209.87KiB	1.43	100.00%	14.33MiB	244.61KiB	5.01KiB	0.0166715	14.53MiB	747.16KiB	15.19KiB	0.0502059	False	False
datadog_agent_remap_blackhole_acks	263.01KiB	0.42	99.10%	60.95MiB	4.17MiB	86.78KiB	0.0683613	61.2MiB	2.44MiB	51.08KiB	0.0398841	False	False
datadog_agent_remap_blackhole	68.12KiB	0.11	42.96%	58.06MiB	4.55MiB	94.8KiB	0.0784111	58.12MiB	3.53MiB	73.63KiB	0.0606835	False	False
splunk_hec_to_splunk_hec_logs_noack	8.45KiB	0.03	55.30%	23.83MiB	428.47KiB	8.74KiB	0.0175542	23.84MiB	335.31KiB	6.84KiB	0.013733	False	False
enterprise_http_to_http	-1.97KiB	-0.01	21.31%	23.85MiB	251.01KiB	5.12KiB	0.0102769	23.85MiB	253.94KiB	5.2KiB	0.0103978	False	False
splunk_hec_to_splunk_hec_logs_acks	-16.88KiB	-0.07	57.65%	23.79MiB	702.39KiB	14.31KiB	0.0288281	23.77MiB	760.8KiB	15.49KiB	0.031247	False	False
splunk_hec_indexer_ack_blackhole	-19.7KiB	-0.08	57.35%	23.77MiB	808.03KiB	16.45KiB	0.0331957	23.75MiB	910.89KiB	18.53KiB	0.037452	False	False
file_to_blackhole	-73.76KiB	-0.08	47.33%	95.34MiB	3.48MiB	72.22KiB	0.0365323	95.27MiB	4.4MiB	91.43KiB	0.0461631	False	False
splunk_hec_route_s3	-14.96KiB	-0.08	16.99%	18.13MiB	2.39MiB	49.72KiB	0.131638	18.12MiB	2.34MiB	48.83KiB	0.128866	False	False
http_to_http_json	-24.89KiB	-0.1	95.81%	23.84MiB	356.93KiB	7.29KiB	0.0146172	23.82MiB	480.54KiB	9.82KiB	0.0196993	False	False
http_to_http_noack	-61.07KiB	-0.25	99.65%	23.84MiB	407.17KiB	8.32KiB	0.0166772	23.78MiB	940.55KiB	19.17KiB	0.0386208	False	False
syslog_log2metric_humio_metrics	-37.47KiB	-0.3	99.94%	12.21MiB	199.01KiB	4.06KiB	0.0159097	12.18MiB	496.26KiB	10.1KiB	0.0397918	False	False
fluent_elasticsearch	-383.53KiB	-0.47	100.00%	79.47MiB	55.64KiB	1.12KiB	0.000683536	79.1MiB	4.1MiB	84.24KiB	0.0518146	False	False
http_text_to_http_json	-212.58KiB	-0.54	100.00%	38.34MiB	874.56KiB	17.85KiB	0.0222714	38.13MiB	869.93KiB	17.76KiB	0.0222742	False	False
datadog_agent_remap_datadog_logs_acks	-572.38KiB	-0.92	100.00%	60.92MiB	3.21MiB	67.09KiB	0.0526934	60.36MiB	4.28MiB	89.14KiB	0.0709264	False	False
datadog_agent_remap_datadog_logs	-704.88KiB	-1.11	100.00%	62.28MiB	639.3KiB	13.1KiB	0.010023	61.59MiB	4.26MiB	88.72KiB	0.0691786	False	False
syslog_regex_logs2metric_ddmetrics	-172.63KiB	-1.34	100.00%	12.54MiB	508.1KiB	10.36KiB	0.0395612	12.37MiB	440.41KiB	8.98KiB	0.034758	False	False
syslog_splunk_hec_logs	-229.67KiB	-1.38	100.00%	16.22MiB	675.26KiB	13.76KiB	0.0406376	16.0MiB	755.93KiB	15.39KiB	0.0461301	False	False
http_pipelines_blackhole_acks	-17.85KiB	-1.43	100.00%	1.22MiB	110.64KiB	2.25KiB	0.0887391	1.2MiB	84.53KiB	1.72KiB	0.0687789	False	False
http_to_http_acks	-292.92KiB	-1.64	77.33%	17.48MiB	8.19MiB	171.23KiB	0.46865	17.19MiB	8.21MiB	171.4KiB	0.477627	True	True
syslog_humio_logs	-431.23KiB	-2.53	100.00%	16.62MiB	224.26KiB	4.58KiB	0.0131741	16.2MiB	232.39KiB	4.76KiB	0.0140064	False	False
syslog_log2metric_splunk_hec_metrics	-454.07KiB	-2.57	100.00%	17.24MiB	825.2KiB	16.81KiB	0.0467251	16.8MiB	740.78KiB	15.1KiB	0.0430522	False	False
http_pipelines_no_grok_blackhole	-301.92KiB	-2.77	100.00%	10.65MiB	336.27KiB	6.86KiB	0.0308161	10.36MiB	1.1MiB	22.97KiB	0.106508	False	False
socket_to_socket_blackhole	-677.6KiB	-2.9	100.00%	22.79MiB	624.03KiB	12.74KiB	0.0267291	22.13MiB	534.12KiB	10.9KiB	0.0235622	False	False

Sep 17 '22 01:09 github-actions[bot]

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: 7f0d88e8511dd0c98b28f71e531f2b42ef1ad275 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
syslog_loki	127.0KiB	0.9	100.00%	13.77MiB	411.87KiB	8.42KiB	0.0292098	13.89MiB	714.19KiB	14.52KiB	0.0501988	False	False
datadog_agent_remap_blackhole_acks	405.99KiB	0.66	99.93%	59.83MiB	4.78MiB	99.5KiB	0.0798753	60.23MiB	3.22MiB	67.26KiB	0.0534066	False	False
datadog_agent_remap_blackhole	333.39KiB	0.53	99.94%	60.86MiB	3.93MiB	81.81KiB	0.0645111	61.19MiB	2.46MiB	51.39KiB	0.0402224	False	False
http_pipelines_blackhole_acks	4.17KiB	0.34	86.66%	1.19MiB	115.36KiB	2.35KiB	0.0944111	1.2MiB	73.0KiB	1.49KiB	0.0595414	False	False
http_pipelines_blackhole	2.72KiB	0.16	71.61%	1.69MiB	43.22KiB	903.84B	0.0249883	1.69MiB	116.89KiB	2.38KiB	0.0674695	False	False
splunk_hec_to_splunk_hec_logs_noack	15.73KiB	0.06	81.76%	23.82MiB	470.56KiB	9.61KiB	0.0192869	23.84MiB	335.41KiB	6.85KiB	0.0137387	False	False
splunk_hec_to_splunk_hec_logs_acks	15.34KiB	0.06	46.56%	23.75MiB	881.87KiB	17.93KiB	0.0362608	23.76MiB	834.05KiB	16.97KiB	0.0342727	False	False
splunk_hec_indexer_ack_blackhole	7.86KiB	0.03	23.11%	23.74MiB	950.54KiB	19.33KiB	0.0390981	23.74MiB	910.13KiB	18.51KiB	0.0374239	False	False
enterprise_http_to_http	-1.19KiB	-0	13.13%	23.85MiB	248.57KiB	5.07KiB	0.0101775	23.85MiB	249.36KiB	5.1KiB	0.0102102	False	False
file_to_blackhole	-62.21KiB	-0.06	66.63%	95.38MiB	2.05MiB	42.42KiB	0.0214527	95.32MiB	2.33MiB	48.37KiB	0.0243906	False	False
http_to_http_json	-36.33KiB	-0.15	99.45%	23.84MiB	345.93KiB	7.06KiB	0.0141648	23.81MiB	538.78KiB	11.0KiB	0.0220942	False	False
fluent_elasticsearch	-219.59KiB	-0.27	100.00%	79.47MiB	53.43KiB	1.08KiB	0.000656466	79.26MiB	2.55MiB	52.45KiB	0.0321784	False	False
http_to_http_noack	-96.23KiB	-0.39	99.98%	23.83MiB	519.19KiB	10.61KiB	0.0212732	23.73MiB	1.15MiB	23.96KiB	0.0484144	False	False
syslog_log2metric_humio_metrics	-61.43KiB	-0.47	100.00%	12.71MiB	221.81KiB	4.53KiB	0.0170334	12.65MiB	543.91KiB	11.07KiB	0.0419655	False	False
datadog_agent_remap_datadog_logs	-506.62KiB	-0.81	100.00%	61.17MiB	279.83KiB	5.73KiB	0.00446658	60.67MiB	3.95MiB	82.36KiB	0.0651559	False	False
syslog_regex_logs2metric_ddmetrics	-131.53KiB	-1.03	100.00%	12.52MiB	635.63KiB	12.95KiB	0.0495824	12.39MiB	509.03KiB	10.38KiB	0.0401185	False	False
syslog_splunk_hec_logs	-211.83KiB	-1.28	100.00%	16.12MiB	881.87KiB	17.95KiB	0.0534003	15.92MiB	865.95KiB	17.63KiB	0.0531176	False	False
http_text_to_http_json	-615.72KiB	-1.57	100.00%	38.42MiB	816.69KiB	16.67KiB	0.0207568	37.81MiB	1.16MiB	24.15KiB	0.0305482	False	False
splunk_hec_route_s3	-298.64KiB	-1.62	100.00%	18.04MiB	2.33MiB	48.53KiB	0.129137	17.75MiB	2.3MiB	48.03KiB	0.129347	False	False
http_pipelines_no_grok_blackhole	-283.41KiB	-2.53	100.00%	10.95MiB	63.77KiB	1.3KiB	0.00568599	10.67MiB	1.03MiB	21.5KiB	0.0967261	False	False
http_to_http_acks	-456.67KiB	-2.53	94.34%	17.65MiB	8.11MiB	169.6KiB	0.459505	17.2MiB	8.1MiB	169.09KiB	0.470596	True	True
syslog_log2metric_splunk_hec_metrics	-626.03KiB	-3.39	100.00%	18.02MiB	546.28KiB	11.13KiB	0.0295932	17.41MiB	762.43KiB	15.52KiB	0.0427533	False	False
syslog_humio_logs	-613.46KiB	-3.57	100.00%	16.79MiB	133.03KiB	2.72KiB	0.00773777	16.19MiB	542.76KiB	11.12KiB	0.0327375	False	False
datadog_agent_remap_datadog_logs_acks	-2.23MiB	-3.63	100.00%	61.48MiB	3.02MiB	63.21KiB	0.0491566	59.25MiB	4.64MiB	96.65KiB	0.0783425	False	False
socket_to_socket_blackhole	-889.8KiB	-3.65	100.00%	23.8MiB	194.37KiB	3.97KiB	0.00797235	22.93MiB	107.48KiB	2.19KiB	0.00457538	False	False

Sep 17 '22 01:09 github-actions[bot]

I'm looking at the CEF v26 specification from here: https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/pdfdoc/cef-implementation-standard/cef-implementation-standard.pdf

I'm not very familiar with this format, but the documentation seems to imply the most common format is with a syslog prefix, which the current implementation doesn't support. It would be good to support that, or explain why it's not needed.

For example, I expected the following example to parse correctly:

Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232

@fuchsnj this now works as expected.

Sep 17 '22 08:09 ktff

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: dfe24fd5af7722d726b8bd9670cb54ea8b8204d5 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
http_pipelines_blackhole_acks	10.65KiB	0.86	99.99%	1.21MiB	116.89KiB	2.38KiB	0.0943029	1.22MiB	70.35KiB	1.43KiB	0.0562726	False	False
syslog_loki	127.0KiB	0.86	100.00%	14.37MiB	265.59KiB	5.44KiB	0.0180406	14.5MiB	748.05KiB	15.21KiB	0.0503778	False	False
http_text_to_http_json	48.6KiB	0.12	94.31%	38.14MiB	905.12KiB	18.48KiB	0.0231674	38.19MiB	862.46KiB	17.6KiB	0.0220482	False	False
splunk_hec_to_splunk_hec_logs_noack	5.23KiB	0.02	38.59%	23.83MiB	380.09KiB	7.77KiB	0.0155712	23.84MiB	336.19KiB	6.86KiB	0.0137699	False	False
enterprise_http_to_http	650.73B	0	6.93%	23.85MiB	251.21KiB	5.13KiB	0.0102859	23.85MiB	254.6KiB	5.21KiB	0.0104243	False	False
http_pipelines_blackhole	-255.45B	-0.01	7.84%	1.66MiB	49.62KiB	1.01KiB	0.0291759	1.66MiB	113.99KiB	2.32KiB	0.0670314	False	False
splunk_hec_to_splunk_hec_logs_acks	-7.48KiB	-0.03	25.34%	23.77MiB	788.62KiB	16.05KiB	0.0323959	23.76MiB	818.49KiB	16.66KiB	0.0336332	False	False
splunk_hec_indexer_ack_blackhole	-12.26KiB	-0.05	35.70%	23.75MiB	889.01KiB	18.09KiB	0.0365419	23.74MiB	948.56KiB	19.29KiB	0.0390094	False	False
file_to_blackhole	-48.17KiB	-0.05	39.37%	95.35MiB	3.04MiB	63.01KiB	0.0318738	95.3MiB	3.32MiB	69.02KiB	0.0348198	False	False
http_to_http_noack	-25.99KiB	-0.11	83.62%	23.83MiB	521.42KiB	10.65KiB	0.0213647	23.8MiB	751.72KiB	15.33KiB	0.0308342	False	False
http_to_http_json	-42.0KiB	-0.17	99.86%	23.85MiB	327.46KiB	6.69KiB	0.0134073	23.81MiB	555.67KiB	11.34KiB	0.0227901	False	False
fluent_elasticsearch	-157.2KiB	-0.19	100.00%	79.47MiB	52.52KiB	1.06KiB	0.000645199	79.32MiB	1.56MiB	32.06KiB	0.0196179	False	False
http_to_http_acks	-59.0KiB	-0.33	19.65%	17.4MiB	8.03MiB	167.93KiB	0.461386	17.34MiB	8.03MiB	167.35KiB	0.462666	True	True
syslog_log2metric_humio_metrics	-44.7KiB	-0.35	100.00%	12.34MiB	241.99KiB	4.94KiB	0.0191518	12.29MiB	473.09KiB	9.63KiB	0.0375747	False	False
datadog_agent_remap_blackhole_acks	-303.19KiB	-0.48	99.78%	61.56MiB	3.99MiB	83.12KiB	0.0647964	61.27MiB	2.55MiB	53.43KiB	0.0416761	False	False
splunk_hec_route_s3	-104.64KiB	-0.56	86.98%	18.11MiB	2.38MiB	49.49KiB	0.131177	18.0MiB	2.31MiB	48.27KiB	0.128191	False	False
datadog_agent_remap_datadog_logs_acks	-451.39KiB	-0.71	100.00%	62.24MiB	2.8MiB	58.63KiB	0.045018	61.8MiB	4.39MiB	91.36KiB	0.0710098	False	False
datadog_agent_remap_blackhole	-441.06KiB	-0.8	93.62%	53.91MiB	7.92MiB	165.07KiB	0.146824	53.48MiB	8.21MiB	171.36KiB	0.153427	False	False
datadog_agent_remap_datadog_logs	-518.57KiB	-0.81	100.00%	62.27MiB	303.97KiB	6.22KiB	0.00476598	61.76MiB	3.8MiB	79.25KiB	0.0615644	False	False
syslog_splunk_hec_logs	-196.61KiB	-1.17	100.00%	16.37MiB	809.69KiB	16.46KiB	0.0482945	16.18MiB	678.41KiB	13.8KiB	0.0409445	False	False
syslog_regex_logs2metric_ddmetrics	-208.46KiB	-1.62	100.00%	12.58MiB	599.76KiB	12.22KiB	0.046555	12.37MiB	444.36KiB	9.06KiB	0.0350598	False	False
syslog_humio_logs	-305.38KiB	-1.81	100.00%	16.45MiB	494.52KiB	10.1KiB	0.0293471	16.15MiB	468.01KiB	9.59KiB	0.0282863	False	False
syslog_log2metric_splunk_hec_metrics	-449.62KiB	-2.43	100.00%	18.08MiB	488.35KiB	9.96KiB	0.026373	17.64MiB	679.97KiB	13.85KiB	0.0376353	False	False
http_pipelines_no_grok_blackhole	-287.41KiB	-2.58	100.00%	10.89MiB	53.36KiB	1.09KiB	0.00478409	10.61MiB	990.72KiB	20.16KiB	0.0911755	False	False
socket_to_socket_blackhole	-632.2KiB	-2.61	100.00%	23.65MiB	382.39KiB	7.81KiB	0.0157892	23.03MiB	163.43KiB	3.34KiB	0.00692898	False	False

Sep 17 '22 09:09 github-actions[bot]

One example is the syslog source, which rejects invalid messages when it fails to decode them; for the case of "pseudo-syslog", you can use the socket source and handle the decoding in VRL.

Yes, indeed we parse with VRL + socket source instead of the syslog source directly, because we need to properly log errors during parsing, i.e. log the offending packet and source peer. This is something that the sources in general are not very good at at the moment (see #7750) :(

So let's make this translation a separate feature after all. It seems like it will require an option so that it's opt-in. Once active, it can change the type definition of the parser so that it returns string and array. Or at least that seems like a way to do it.

I fully agree that it's better to move this part to another PR so that this one can make progress. That part indeed looks like it needs more discussion. Apologies for the noise!

Sep 17 '22 12:09 hhromic

Soak Test Results

Baseline: 512da4076a67a229996f96e51e711dc2af37dcf2 Comparison: 8117579af027460f10ea0bbb5956163dd1aa3a2a Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
http_to_http_acks	287.83KiB	1.66	77.76%	16.96MiB	7.98MiB	166.96KiB	0.470654	17.24MiB	7.98MiB	166.57KiB	0.462744	True	True
http_pipelines_blackhole_acks	15.37KiB	1.24	100.00%	1.21MiB	103.73KiB	2.11KiB	0.0839684	1.22MiB	91.41KiB	1.86KiB	0.0730894	False	False
syslog_loki	104.39KiB	0.72	100.00%	14.16MiB	406.01KiB	8.32KiB	0.0279889	14.27MiB	722.48KiB	14.69KiB	0.0494497	False	False
splunk_hec_to_splunk_hec_logs_acks	22.03KiB	0.09	64.20%	23.75MiB	871.27KiB	17.72KiB	0.0358235	23.77MiB	792.59KiB	16.13KiB	0.0325592	False	False
splunk_hec_indexer_ack_blackhole	22.42KiB	0.09	64.97%	23.75MiB	876.66KiB	17.83KiB	0.0360423	23.77MiB	789.74KiB	16.07KiB	0.0324387	False	False
splunk_hec_to_splunk_hec_logs_noack	23.0KiB	0.09	93.56%	23.82MiB	503.11KiB	10.27KiB	0.0206241	23.84MiB	343.25KiB	7.01KiB	0.0140575	False	False
enterprise_http_to_http	55.43B	0	0.59%	23.85MiB	253.86KiB	5.18KiB	0.010394	23.85MiB	254.69KiB	5.21KiB	0.0104279	False	False
file_to_blackhole	-65.46KiB	-0.07	55.98%	95.35MiB	2.72MiB	56.41KiB	0.0285348	95.29MiB	3.04MiB	63.31KiB	0.0319455	False	False
http_to_http_json	-33.03KiB	-0.14	99.16%	23.85MiB	335.84KiB	6.86KiB	0.0137507	23.81MiB	512.63KiB	10.47KiB	0.0210177	False	False
fluent_elasticsearch	-159.74KiB	-0.2	100.00%	79.47MiB	54.02KiB	1.09KiB	0.000663709	79.32MiB	1.41MiB	28.9KiB	0.0177107	False	False
http_to_http_noack	-76.91KiB	-0.32	99.85%	23.83MiB	508.86KiB	10.4KiB	0.0208505	23.75MiB	1.05MiB	21.9KiB	0.0442039	False	False
syslog_splunk_hec_logs	-155.74KiB	-0.96	100.00%	15.89MiB	812.2KiB	16.52KiB	0.0499162	15.73MiB	620.69KiB	12.67KiB	0.0385151	False	False
datadog_agent_remap_blackhole	-631.07KiB	-1.06	100.00%	58.27MiB	3.79MiB	78.9KiB	0.0649641	57.66MiB	3.22MiB	67.08KiB	0.055759	False	False
syslog_humio_logs	-190.72KiB	-1.18	100.00%	15.83MiB	666.56KiB	13.61KiB	0.0411163	15.64MiB	588.87KiB	12.05KiB	0.0367565	False	False
syslog_regex_logs2metric_ddmetrics	-163.06KiB	-1.29	100.00%	12.37MiB	614.64KiB	12.53KiB	0.0485166	12.21MiB	584.3KiB	11.91KiB	0.0467231	False	False
http_pipelines_blackhole	-28.02KiB	-1.65	100.00%	1.66MiB	55.85KiB	1.14KiB	0.0327915	1.64MiB	123.86KiB	2.52KiB	0.0739303	False	False
syslog_log2metric_splunk_hec_metrics	-324.59KiB	-1.75	100.00%	18.09MiB	614.55KiB	12.52KiB	0.0331675	17.77MiB	812.45KiB	16.53KiB	0.0446304	False	False
syslog_log2metric_humio_metrics	-247.28KiB	-1.88	100.00%	12.82MiB	197.12KiB	4.03KiB	0.0150086	12.58MiB	508.37KiB	10.35KiB	0.039451	False	False
splunk_hec_route_s3	-407.38KiB	-2.11	100.00%	18.84MiB	2.29MiB	47.8KiB	0.121755	18.44MiB	2.21MiB	46.29KiB	0.119914	False	False
datadog_agent_remap_datadog_logs	-1.32MiB	-2.13	100.00%	61.98MiB	456.92KiB	9.35KiB	0.00719798	60.66MiB	4.06MiB	84.54KiB	0.0669117	False	False
http_pipelines_no_grok_blackhole	-257.15KiB	-2.34	100.00%	10.73MiB	332.93KiB	6.8KiB	0.0302982	10.48MiB	1.08MiB	22.58KiB	0.103532	False	False
datadog_agent_remap_datadog_logs_acks	-1.5MiB	-2.43	100.00%	61.57MiB	3.08MiB	64.37KiB	0.0500053	60.07MiB	4.23MiB	88.11KiB	0.0704481	False	False
datadog_agent_remap_blackhole_acks	-1.46MiB	-2.47	100.00%	58.94MiB	4.24MiB	88.34KiB	0.0719742	57.48MiB	2.91MiB	60.9KiB	0.0506441	False	False
http_text_to_http_json	-1.85MiB	-4.67	100.00%	39.57MiB	1.11MiB	23.17KiB	0.0280107	37.73MiB	1005.7KiB	20.54KiB	0.0260275	False	False
socket_to_socket_blackhole	-1.49MiB	-6.06	100.00%	24.6MiB	289.29KiB	5.91KiB	0.0114829	23.11MiB	124.6KiB	2.54KiB	0.0052645	False	False

Sep 20 '22 18:09 github-actions[bot]

Soak Test Results

Baseline: 9cf1ea9b08ed745e3872c1cc81757f6078c82419 Comparison: acb41109c3d5e6839de9a32542e1f75fc58433bf Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
http_pipelines_blackhole_acks	20.56KiB	1.69	100.00%	1.19MiB	137.87KiB	2.8KiB	0.11345	1.21MiB	99.87KiB	2.04KiB	0.0808164	False	False
http_pipelines_blackhole	16.22KiB	0.97	100.00%	1.62MiB	106.62KiB	2.18KiB	0.0640698	1.64MiB	144.67KiB	2.95KiB	0.0860995	False	False
socket_to_socket_blackhole	31.99KiB	0.14	72.53%	22.6MiB	995.23KiB	20.32KiB	0.0429963	22.63MiB	1.01MiB	21.09KiB	0.044557	False	False
splunk_hec_to_splunk_hec_logs_acks	19.3KiB	0.08	57.46%	23.75MiB	880.16KiB	17.9KiB	0.0361888	23.77MiB	801.26KiB	16.31KiB	0.0329186	False	False
splunk_hec_to_splunk_hec_logs_noack	9.13KiB	0.04	59.95%	23.83MiB	416.36KiB	8.51KiB	0.017058	23.84MiB	330.17KiB	6.74KiB	0.0135217	False	False
enterprise_http_to_http	-1.39KiB	-0.01	15.28%	23.85MiB	247.79KiB	5.06KiB	0.0101454	23.85MiB	250.84KiB	5.13KiB	0.0102708	False	False
splunk_hec_indexer_ack_blackhole	-1.66KiB	-0.01	5.14%	23.75MiB	883.93KiB	17.98KiB	0.0363378	23.75MiB	906.22KiB	18.43KiB	0.0372565	False	False
file_to_blackhole	-54.77KiB	-0.06	43.16%	95.34MiB	3.03MiB	62.74KiB	0.031738	95.29MiB	3.49MiB	72.67KiB	0.0366625	False	False
http_to_http_json	-26.36KiB	-0.11	97.46%	23.85MiB	333.92KiB	6.82KiB	0.0136714	23.82MiB	470.3KiB	9.62KiB	0.0192759	False	False
fluent_elasticsearch	-182.93KiB	-0.22	100.00%	79.47MiB	53.72KiB	1.09KiB	0.000660026	79.29MiB	1.58MiB	32.55KiB	0.0199407	False	False
datadog_agent_remap_blackhole_acks	-168.45KiB	-0.28	89.49%	58.75MiB	4.13MiB	85.94KiB	0.0702393	58.59MiB	2.8MiB	58.44KiB	0.0477071	False	False
http_to_http_acks	-64.77KiB	-0.36	21.36%	17.34MiB	8.14MiB	170.18KiB	0.469447	17.27MiB	8.04MiB	167.86KiB	0.465609	True	True
http_to_http_noack	-122.45KiB	-0.5	100.00%	23.84MiB	408.45KiB	8.35KiB	0.01673	23.72MiB	1.23MiB	25.68KiB	0.0519597	False	False
syslog_regex_logs2metric_ddmetrics	-79.45KiB	-0.62	100.00%	12.45MiB	617.85KiB	12.58KiB	0.048442	12.38MiB	549.4KiB	11.2KiB	0.0433459	False	False
splunk_hec_route_s3	-145.12KiB	-0.78	97.02%	18.21MiB	2.28MiB	47.44KiB	0.125066	18.07MiB	2.25MiB	46.99KiB	0.124455	False	False
syslog_loki	-116.12KiB	-0.8	100.00%	14.14MiB	627.25KiB	12.83KiB	0.043312	14.03MiB	822.42KiB	16.72KiB	0.0572474	False	False
syslog_splunk_hec_logs	-136.85KiB	-0.83	100.00%	16.17MiB	744.94KiB	15.16KiB	0.0449838	16.03MiB	520.99KiB	10.64KiB	0.0317228	False	False
datadog_agent_remap_datadog_logs_acks	-823.21KiB	-1.29	100.00%	62.42MiB	3.14MiB	65.65KiB	0.0503053	61.61MiB	4.37MiB	90.97KiB	0.0709139	False	False
syslog_humio_logs	-256.51KiB	-1.51	100.00%	16.6MiB	259.69KiB	5.3KiB	0.0152771	16.35MiB	259.09KiB	5.3KiB	0.0154754	False	False
http_pipelines_no_grok_blackhole	-174.14KiB	-1.56	100.00%	10.89MiB	257.58KiB	5.26KiB	0.0231027	10.72MiB	968.39KiB	19.71KiB	0.088236	False	False
datadog_agent_remap_datadog_logs	-1003.85KiB	-1.61	100.00%	60.76MiB	1.75MiB	36.67KiB	0.0287631	59.78MiB	4.31MiB	89.72KiB	0.0720615	False	False
syslog_log2metric_splunk_hec_metrics	-299.09KiB	-1.67	100.00%	17.48MiB	835.59KiB	17.02KiB	0.0466667	17.19MiB	941.02KiB	19.14KiB	0.053448	False	False
datadog_agent_remap_blackhole	-1.24MiB	-2.11	100.00%	59.05MiB	4.7MiB	97.93KiB	0.0795947	57.8MiB	3.58MiB	74.71KiB	0.0618756	False	False
http_text_to_http_json	-1.09MiB	-2.76	100.00%	39.57MiB	744.78KiB	15.2KiB	0.0183758	38.48MiB	832.95KiB	17.01KiB	0.0211342	False	False
syslog_log2metric_humio_metrics	-498.83KiB	-3.85	100.00%	12.65MiB	289.72KiB	5.91KiB	0.0223601	12.16MiB	781.76KiB	15.9KiB	0.0627512	False	False

Sep 21 '22 00:09 github-actions[bot]

Soak Test Results

Baseline: 9cf1ea9b08ed745e3872c1cc81757f6078c82419 Comparison: 8c3de2a6b624ec0fe7713567fe341a0f1158b24f Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
socket_to_socket_blackhole	747.72KiB	3.43	100.00%	21.3MiB	2.01MiB	41.96KiB	0.0942174	22.03MiB	1.72MiB	35.92KiB	0.0779846	False	False
http_pipelines_blackhole_acks	11.79KiB	0.94	100.00%	1.22MiB	104.27KiB	2.12KiB	0.0833092	1.23MiB	93.67KiB	1.91KiB	0.0741388	False	False
http_to_http_acks	54.79KiB	0.31	18.26%	17.33MiB	7.98MiB	166.85KiB	0.460437	17.39MiB	8.09MiB	168.66KiB	0.465062	True	True
syslog_loki	39.51KiB	0.27	98.12%	14.46MiB	383.41KiB	7.85KiB	0.0258889	14.5MiB	731.57KiB	14.87KiB	0.049266	False	False
splunk_hec_to_splunk_hec_logs_noack	32.38KiB	0.13	98.30%	23.81MiB	574.25KiB	11.71KiB	0.0235504	23.84MiB	334.42KiB	6.83KiB	0.0136966	False	False
syslog_log2metric_humio_metrics	2.27KiB	0.02	16.58%	12.52MiB	218.43KiB	4.46KiB	0.0170287	12.53MiB	486.38KiB	9.91KiB	0.0379108	False	False
splunk_hec_indexer_ack_blackhole	-1.21KiB	-0	3.88%	23.75MiB	868.24KiB	17.66KiB	0.0356912	23.75MiB	863.39KiB	17.56KiB	0.0354933	False	False
enterprise_http_to_http	422.28B	0	4.47%	23.84MiB	254.73KiB	5.2KiB	0.0104301	23.85MiB	254.67KiB	5.21KiB	0.0104274	False	False
splunk_hec_to_splunk_hec_logs_acks	-3.72KiB	-0.02	12.06%	23.75MiB	841.97KiB	17.13KiB	0.0346083	23.75MiB	861.71KiB	17.53KiB	0.035425	False	False
file_to_blackhole	-58.69KiB	-0.06	48.23%	95.36MiB	2.83MiB	58.7KiB	0.0296878	95.3MiB	3.33MiB	69.18KiB	0.0348893	False	False
datadog_agent_remap_blackhole	-82.01KiB	-0.13	55.54%	61.22MiB	4.18MiB	87.13KiB	0.0683079	61.14MiB	3.0MiB	62.57KiB	0.0490487	False	False
http_to_http_json	-38.04KiB	-0.16	99.65%	23.84MiB	345.85KiB	7.06KiB	0.0141617	23.81MiB	535.13KiB	10.92KiB	0.0219462	False	False
fluent_elasticsearch	-206.94KiB	-0.25	100.00%	79.47MiB	52.35KiB	1.06KiB	0.000643189	79.27MiB	1.77MiB	36.49KiB	0.0223655	False	False
http_to_http_noack	-61.85KiB	-0.25	99.35%	23.83MiB	515.0KiB	10.53KiB	0.0211007	23.77MiB	987.13KiB	20.11KiB	0.0405479	False	False
http_pipelines_blackhole	-4.88KiB	-0.29	88.04%	1.64MiB	71.61KiB	1.46KiB	0.042531	1.64MiB	135.92KiB	2.77KiB	0.0809564	False	False
syslog_regex_logs2metric_ddmetrics	-71.58KiB	-0.55	100.00%	12.6MiB	631.77KiB	12.87KiB	0.0489578	12.53MiB	495.86KiB	10.11KiB	0.0386401	False	False
datadog_agent_remap_blackhole_acks	-361.51KiB	-0.57	99.92%	61.99MiB	4.14MiB	86.28KiB	0.0668152	61.63MiB	3.09MiB	64.57KiB	0.0501094	False	False
datadog_agent_remap_datadog_logs	-687.27KiB	-1.1	100.00%	61.26MiB	811.04KiB	16.59KiB	0.0129256	60.59MiB	4.08MiB	84.95KiB	0.067311	False	False
datadog_agent_remap_datadog_logs_acks	-803.9KiB	-1.26	100.00%	62.24MiB	3.43MiB	71.62KiB	0.0550889	61.45MiB	4.36MiB	90.76KiB	0.0709335	False	False
syslog_splunk_hec_logs	-210.39KiB	-1.27	100.00%	16.22MiB	952.33KiB	19.37KiB	0.0573089	16.02MiB	803.0KiB	16.38KiB	0.0489421	False	False
syslog_humio_logs	-245.07KiB	-1.42	100.00%	16.8MiB	112.3KiB	2.29KiB	0.00652474	16.57MiB	106.35KiB	2.18KiB	0.00626818	False	False
http_pipelines_no_grok_blackhole	-180.72KiB	-1.62	100.00%	10.89MiB	89.12KiB	1.82KiB	0.00798668	10.72MiB	1.06MiB	22.16KiB	0.0993002	False	False
splunk_hec_route_s3	-328.95KiB	-1.69	100.00%	18.98MiB	2.23MiB	46.41KiB	0.117373	18.66MiB	2.24MiB	46.78KiB	0.119893	False	False
syslog_log2metric_splunk_hec_metrics	-316.27KiB	-1.76	100.00%	17.52MiB	635.89KiB	12.97KiB	0.035443	17.21MiB	899.86KiB	18.3KiB	0.0510564	False	False
http_text_to_http_json	-1.13MiB	-2.87	100.00%	39.36MiB	798.55KiB	16.3KiB	0.0198089	38.23MiB	867.51KiB	17.72KiB	0.0221545	False	False

Sep 21 '22 00:09 github-actions[bot]

Soak Test Results

Baseline: 197ed5b27452aee5b51ba4db2443ca3ac1814634 Comparison: ef19c4faf1699cb058e301c57718fcf744b34422 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
socket_to_socket_blackhole	491.96KiB	2.12	100.00%	22.7MiB	118.06KiB	2.41KiB	0.00507806	23.18MiB	105.45KiB	2.15KiB	0.0044418	False	False
http_pipelines_blackhole_acks	16.74KiB	1.38	100.00%	1.19MiB	119.31KiB	2.43KiB	0.098001	1.2MiB	72.6KiB	1.48KiB	0.0588225	False	False
syslog_log2metric_splunk_hec_metrics	224.39KiB	1.32	100.00%	16.54MiB	1.06MiB	22.15KiB	0.0641589	16.76MiB	1.05MiB	21.9KiB	0.0626851	False	False
syslog_splunk_hec_logs	54.01KiB	0.33	98.12%	15.77MiB	853.49KiB	17.36KiB	0.0528559	15.82MiB	738.48KiB	15.06KiB	0.0455808	False	False
splunk_hec_to_splunk_hec_logs_noack	47.03KiB	0.19	99.80%	23.79MiB	666.62KiB	13.58KiB	0.027354	23.84MiB	335.37KiB	6.85KiB	0.013735	False	False
splunk_hec_indexer_ack_blackhole	15.58KiB	0.06	46.60%	23.74MiB	889.03KiB	18.08KiB	0.0365617	23.76MiB	852.14KiB	17.34KiB	0.0350223	False	False
enterprise_http_to_http	-760.44B	-0	8.00%	23.85MiB	256.17KiB	5.23KiB	0.0104887	23.85MiB	255.94KiB	5.23KiB	0.0104796	False	False
splunk_hec_to_splunk_hec_logs_acks	0B	-0	0.00%	23.74MiB	892.73KiB	18.15KiB	0.0367117	23.74MiB	900.69KiB	18.31KiB	0.0370391	False	False
syslog_humio_logs	-3.56KiB	-0.02	53.07%	16.49MiB	194.81KiB	3.98KiB	0.0115376	16.48MiB	141.54KiB	2.9KiB	0.00838435	False	False
file_to_blackhole	-55.68KiB	-0.06	46.03%	95.35MiB	2.87MiB	59.42KiB	0.0300581	95.29MiB	3.3MiB	68.64KiB	0.0346178	False	False
http_pipelines_blackhole	-1.21KiB	-0.07	36.60%	1.68MiB	41.7KiB	872.79B	0.0242936	1.67MiB	117.12KiB	2.39KiB	0.068274	False	False
http_to_http_json	-34.31KiB	-0.14	99.38%	23.85MiB	327.0KiB	6.68KiB	0.0133884	23.81MiB	518.66KiB	10.59KiB	0.0212654	False	False
syslog_regex_logs2metric_ddmetrics	-28.72KiB	-0.23	97.22%	12.01MiB	453.97KiB	9.25KiB	0.0369163	11.98MiB	451.63KiB	9.2KiB	0.0368117	False	False
http_to_http_noack	-62.05KiB	-0.25	99.40%	23.83MiB	514.2KiB	10.5KiB	0.021068	23.77MiB	981.09KiB	19.99KiB	0.0403001	False	False
http_to_http_acks	-46.25KiB	-0.26	15.65%	17.3MiB	8.0MiB	167.38KiB	0.462494	17.26MiB	7.85MiB	163.89KiB	0.454519	True	True
fluent_elasticsearch	-215.31KiB	-0.26	100.00%	79.47MiB	54.25KiB	1.1KiB	0.000666549	79.26MiB	2.25MiB	46.2KiB	0.0283342	False	False
datadog_agent_remap_blackhole_acks	-189.65KiB	-0.31	91.55%	59.91MiB	4.42MiB	92.01KiB	0.0737631	59.73MiB	2.88MiB	60.13KiB	0.04813	False	False
syslog_loki	-144.53KiB	-0.99	100.00%	14.3MiB	485.42KiB	9.95KiB	0.0331493	14.16MiB	863.07KiB	17.54KiB	0.0595269	False	False
datadog_agent_remap_datadog_logs_acks	-742.73KiB	-1.18	100.00%	61.31MiB	3.09MiB	64.52KiB	0.050335	60.59MiB	4.29MiB	89.24KiB	0.0707459	False	False
splunk_hec_route_s3	-301.18KiB	-1.57	100.00%	18.73MiB	2.27MiB	47.25KiB	0.121192	18.43MiB	2.22MiB	46.53KiB	0.12064	False	False
http_pipelines_no_grok_blackhole	-276.94KiB	-2.57	100.00%	10.54MiB	596.16KiB	12.17KiB	0.0552289	10.27MiB	1.17MiB	24.39KiB	0.114079	False	False
http_text_to_http_json	-1.07MiB	-2.9	100.00%	36.87MiB	2.51MiB	52.37KiB	0.0679495	35.8MiB	2.72MiB	56.72KiB	0.07583	False	False
datadog_agent_remap_datadog_logs	-1.99MiB	-3.4	100.00%	58.48MiB	4.6MiB	96.54KiB	0.0786976	56.49MiB	6.22MiB	129.5KiB	0.110058	False	False
syslog_log2metric_humio_metrics	-441.36KiB	-3.6	100.00%	11.98MiB	903.2KiB	18.44KiB	0.0736128	11.55MiB	958.02KiB	19.51KiB	0.0809946	False	False
datadog_agent_remap_blackhole	-2.67MiB	-4.43	100.00%	60.2MiB	4.23MiB	88.26KiB	0.0703133	57.54MiB	3.89MiB	81.18KiB	0.0676018	False	False

Sep 21 '22 17:09 github-actions[bot]

@ktff today I had to help fix some parsing errors in our regex-based CEF processing pipeline. I couldn't help but think that this PR will improve our lives dramatically. This feed in particular is ~40K EPS of CEF data (PaloAlto devices), and I will definitely test your parse_cef() implementation with it.

I noticed two particularities in the CEF data today that I wanted to bring up for you to consider (if you haven't already). Unfortunately, I currently don't have the means to build/test your PR myself.

The first case is CEF extension fields with empty values, for example app= msg= act=. The second case is more funky: CEF extension fields with quoted values, such as msg="Some message.".

For the first case, I wonder whether your implementation will return empty-valued keys or discard them entirely. And for the second case, will parse_cef() strip the quotes from the value? That would be really nice.

Sep 21 '22 18:09 hhromic

Example with empty extension values (fails to parse)

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst= spt=")
function call error for "parse_cef" at (0:123): Could not parse whole line successfully

Example with quoted extension value (quotes are kept)

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=\"2.1.2.2\" spt=\"1232\"")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "dst": "\"2.1.2.2\"", "name": "worm successfully stopped", "severity": "10", "spt": "\"1232\"", "src": "10.0.0.1" }

Another edge case I noticed: empty extensions at the end seem to work fine, but empty extensions that are not at the end (example 1 above) fail.

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 spt=")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "name": "worm successfully stopped", "severity": "10", "spt": "", "src": "10.0.0.1" }

However, if you have an "empty" extension with multiple spaces, it parses again (capturing the spaces).

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=  spt=")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "name": "worm successfully stopped", "severity": "10", "spt": "", "src": " " }

I think supporting case 1 above (empty values) is reasonable, even though it is not mentioned in the spec. It should also close the edge cases shown above.
I'm a bit on the fence about quoted values, since they are not mentioned in the spec and a user could conceivably prefer to keep the quotes, but I'm open to discussion, or to potentially adding this as an option (potentially defaulted on).
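
To make the two behaviours under discussion concrete, here is a minimal Python sketch of an extension parser that accepts empty values anywhere and strips surrounding quotes only when asked. It is not the implementation in this PR: the function name, the `strip_quotes` flag, and the regex-based approach are all assumptions made for illustration, and it only handles the `\=` and `\\` escapes.

```python
import re

# A key starts at the beginning of the input or right after whitespace,
# consists of word characters, and is immediately followed by '='.
# An escaped '\=' inside a value never matches, because the character
# before '=' is then a backslash rather than a word character.
_KEY = re.compile(r"(?:^|(?<=\s))(\w+)=")


def parse_cef_extensions(ext: str, strip_quotes: bool = False) -> dict[str, str]:
    """Split a CEF extension string into key/value pairs (hypothetical helper)."""
    matches = list(_KEY.finditer(ext))
    fields: dict[str, str] = {}
    for i, m in enumerate(matches):
        has_next = i + 1 < len(matches)
        end = matches[i + 1].start() if has_next else len(ext)
        value = ext[m.end():end]
        # Drop the single space that separates this value from the next key.
        if has_next and value.endswith(" "):
            value = value[:-1]
        # Undo the escapes handled by this sketch: '\=' and '\\'.
        value = re.sub(r"\\([=\\])", r"\1", value)
        # Opt-in: remove one pair of surrounding double quotes.
        if strip_quotes and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[m.group(1)] = value
    return fields
```

With this sketch, `src=10.0.0.1 dst= spt=` yields empty strings for `dst` and `spt` instead of failing, and `dst="2.1.2.2"` keeps its quotes unless `strip_quotes=True` is passed. Making quote stripping a flag leaves room for either default.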

Sep 21 '22 18:09 fuchsnj

feat(vrl): Add `parse_cef` function