
feat(vrl): Add `parse_cef` function

Open ktff opened this issue 3 years ago • 32 comments

Closes #6451.

Follows CEF version 26.

Open questions

I'm unsure whether two extensions to this parser should be included now, later, or not at all:

  • [x] Translate CEF key names to full names. Example: "act" to "deviceAction". (EDIT: Not at all.)
  • [x] Construct key-value pairs from key label fields. (EDIT: A separate issue.) Example:
{
  "c6a1": "value1",
  "c6a1Label": "key1"
}

to

{
  "key1": "value1"
}
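
For illustration only, here is a minimal Python sketch of what the second extension could do. This is not the PR's Rust code; the name `translate_labels` and the rule that a label key is any key ending in `Label` whose prefix is also present are assumptions.

```python
def translate_labels(fields: dict) -> dict:
    """Replace each `X` / `XLabel` pair with a single `label -> value` entry.

    Assumption: a label key is any key ending in "Label" whose prefix is
    also present as a key. All other fields are copied through unchanged.
    """
    translated = {}
    consumed = set()
    for key, label in fields.items():
        if key.endswith("Label") and key[: -len("Label")] in fields:
            base = key[: -len("Label")]
            translated[label] = fields[base]
            consumed.update((key, base))
    # Copy through every field that was not part of a pair.
    for key, value in fields.items():
        if key not in consumed:
            translated.setdefault(key, value)
    return translated


if __name__ == "__main__":
    print(translate_labels({"c6a1": "value1", "c6a1Label": "key1"}))
```

What should happen when two pairs carry the same label is left open in this sketch.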

ktff avatar Sep 12 '22 23:09 ktff

Deploy Preview for vector-project canceled.

Name Link
Latest commit 4b4e2086a2d74969faed12b0e70cd11361963eee
Latest deploy log https://app.netlify.com/sites/vector-project/deploys/63348648f2b2c10009800c9b

netlify[bot] avatar Sep 12 '22 23:09 netlify[bot]

Soak Test Results

Baseline: 28113af2bf357c71957bf377225dc9df2415cf8f Comparison: 4886c19003a72e7af86fe30dbca89a4f59918dc6 Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether a pull request changes vector performance and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
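
To make the thresholds above concrete, here is a small Python sketch of the two checks as this explanation describes them. The function names are made up for illustration, the "newly erratic" bookkeeping is not modelled, and the sample figures are the rounded values reported for http_to_http_acks and datadog_agent_remap_blackhole_acks in this run.

```python
def coefficient_of_variation(mean: float, stdev: float) -> float:
    """CoV is the standard deviation relative to the mean."""
    return stdev / mean


def is_erratic(mean: float, stdev: float, threshold: float = 0.3) -> bool:
    """An experiment is erratic if its CoV is greater than the threshold."""
    return coefficient_of_variation(mean, stdev) > threshold


def is_interesting(delta_mean_pct: float, confidence_pct: float,
                   min_delta_pct: float = 8.87,
                   min_confidence_pct: float = 90.0) -> bool:
    """A change is kept only if it is both confident and large enough."""
    return (confidence_pct >= min_confidence_pct
            and abs(delta_mean_pct) >= min_delta_pct)


if __name__ == "__main__":
    # http_to_http_acks baseline: mean 17.37 MiB, stdev 8.06 MiB
    print(is_erratic(17.37, 8.06))
    # datadog_agent_remap_blackhole_acks: 3.71 % change at 100 % confidence
    print(is_interesting(3.71, 100.0))
```

Because the inputs are the rounded table values, the computed CoV differs slightly from the one printed in the report.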

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| datadog_agent_remap_blackhole_acks | 2.13MiB | 3.71 | 100.00% | 57.38MiB | 4.41MiB | 91.72KiB | 0 | 0.076762 | 59.5MiB | 3.46MiB | 72.52KiB | 0 | 0.0581851 | False | False |
| datadog_agent_remap_blackhole | 1.25MiB | 2.22 | 100.00% | 56.25MiB | 3.76MiB | 78.25KiB | 0 | 0.0667581 | 57.5MiB | 3.94MiB | 82.06KiB | 0 | 0.0684304 | False | False |
| http_text_to_http_json | 699.3KiB | 1.82 | 100.00% | 37.53MiB | 900.93KiB | 18.39KiB | 0 | 0.0234363 | 38.22MiB | 852.95KiB | 17.41KiB | 0 | 0.0217917 | False | False |
| splunk_hec_route_s3 | 260.84KiB | 1.38 | 99.99% | 18.46MiB | 2.37MiB | 49.27KiB | 0 | 0.128177 | 18.72MiB | 2.17MiB | 45.32KiB | 0 | 0.115719 | False | False |
| http_pipelines_blackhole_acks | 14.44KiB | 1.14 | 100.00% | 1.23MiB | 112.01KiB | 2.28KiB | 0 | 0.0886288 | 1.25MiB | 80.21KiB | 1.63KiB | 0 | 0.0627486 | False | False |
| syslog_humio_logs | 123.45KiB | 0.74 | 100.00% | 16.3MiB | 179.6KiB | 3.67KiB | 0 | 0.0107565 | 16.42MiB | 155.79KiB | 3.19KiB | 0 | 0.00926216 | False | False |
| datadog_agent_remap_datadog_logs_acks | 444.2KiB | 0.74 | 99.99% | 58.97MiB | 3.24MiB | 67.66KiB | 0 | 0.0548727 | 59.41MiB | 4.44MiB | 92.42KiB | 0 | 0.074717 | False | False |
| http_to_http_acks | 119.39KiB | 0.67 | 37.96% | 17.37MiB | 8.06MiB | 168.48KiB | 0 | 0.463991 | 17.49MiB | 8.26MiB | 172.41KiB | 0 | 0.472311 | True | True |
| socket_to_socket_blackhole | 154.01KiB | 0.66 | 100.00% | 22.67MiB | 409.17KiB | 8.35KiB | 0 | 0.0176232 | 22.82MiB | 178.5KiB | 3.64KiB | 0 | 0.00763762 | False | False |
| syslog_regex_logs2metric_ddmetrics | 51.0KiB | 0.41 | 99.65% | 12.28MiB | 651.84KiB | 13.28KiB | 0 | 0.0518225 | 12.33MiB | 556.77KiB | 11.35KiB | 0 | 0.0440856 | False | False |
| syslog_log2metric_humio_metrics | 51.53KiB | 0.4 | 99.98% | 12.5MiB | 281.42KiB | 5.74KiB | 0 | 0.0219787 | 12.55MiB | 606.7KiB | 12.35KiB | 0 | 0.0471939 | False | False |
| syslog_splunk_hec_logs | 16.58KiB | 0.1 | 63.62% | 16.34MiB | 739.47KiB | 15.05KiB | 0 | 0.044195 | 16.35MiB | 506.32KiB | 10.34KiB | 0 | 0.030231 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 9.65KiB | 0.04 | 61.28% | 23.83MiB | 435.79KiB | 8.9KiB | 0 | 0.0178572 | 23.84MiB | 330.24KiB | 6.74KiB | 0 | 0.0135269 | False | False |
| splunk_hec_indexer_ack_blackhole | 4.48KiB | 0.02 | 13.32% | 23.74MiB | 930.31KiB | 18.92KiB | 0 | 0.0382638 | 23.74MiB | 924.64KiB | 18.8KiB | 0 | 0.0380238 | False | False |
| enterprise_http_to_http | 717.05B | 0 | 7.61% | 23.84MiB | 253.34KiB | 5.17KiB | 0 | 0.0103733 | 23.85MiB | 253.94KiB | 5.2KiB | 0 | 0.0103977 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -2.46KiB | -0.01 | 7.65% | 23.75MiB | 889.03KiB | 18.08KiB | 0 | 0.0365475 | 23.75MiB | 892.55KiB | 18.15KiB | 0 | 0.0366962 | False | False |
| file_to_blackhole | -17.27KiB | -0.02 | 11.71% | 95.33MiB | 3.91MiB | 80.97KiB | 0 | 0.0409638 | 95.31MiB | 4.08MiB | 84.82KiB | 0 | 0.0427583 | False | False |
| http_to_http_json | -30.03KiB | -0.12 | 98.76% | 23.85MiB | 333.03KiB | 6.8KiB | 0 | 0.0136332 | 23.82MiB | 484.08KiB | 9.89KiB | 0 | 0.0198411 | False | False |
| datadog_agent_remap_datadog_logs | -107.25KiB | -0.17 | 71.68% | 60.38MiB | 1.92MiB | 40.16KiB | 0 | 0.0317461 | 60.28MiB | 4.39MiB | 91.5KiB | 0 | 0.0728925 | False | False |
| http_pipelines_no_grok_blackhole | -22.35KiB | -0.21 | 68.81% | 10.62MiB | 139.1KiB | 2.84KiB | 0 | 0.0127897 | 10.6MiB | 1.05MiB | 21.92KiB | 0 | 0.0993108 | False | False |
| fluent_elasticsearch | -170.99KiB | -0.21 | 100.00% | 79.47MiB | 53.0KiB | 1.07KiB | 0 | 0.000651135 | 79.31MiB | 1.56MiB | 32.08KiB | 0 | 0.019653 | False | False |
| http_to_http_noack | -77.3KiB | -0.32 | 99.98% | 23.85MiB | 254.72KiB | 5.21KiB | 0 | 0.0104294 | 23.77MiB | 982.23KiB | 20.01KiB | 0 | 0.0403448 | False | False |
| http_pipelines_blackhole | -11.65KiB | -0.66 | 100.00% | 1.73MiB | 10.96KiB | 229.29B | 0 | 0.0061985 | 1.72MiB | 122.81KiB | 2.5KiB | 0 | 0.0699126 | False | False |
| syslog_loki | -114.94KiB | -0.76 | 100.00% | 14.71MiB | 379.75KiB | 7.77KiB | 0 | 0.0252129 | 14.59MiB | 738.83KiB | 15.02KiB | 0 | 0.0494313 | False | False |
| syslog_log2metric_splunk_hec_metrics | -252.8KiB | -1.47 | 100.00% | 16.82MiB | 890.69KiB | 18.16KiB | 0 | 0.0517027 | 16.57MiB | 1.04MiB | 21.68KiB | 0 | 0.0628038 | False | False |

github-actions[bot] avatar Sep 13 '22 00:09 github-actions[bot]

> Translate CEF Key Names to Full Name. Example: "act" to "deviceAction"

I would not do that. "act", in this example, is the short name as defined in the CEF docs, which is fine to work with. Changing all short names to full names (as per the docs) has little value IMHO. Chances are that you will rename the keys anyway.

> Construct key-value from key label fields.

This could be interesting and help when working with CEF logs. I would make it optional, though, as one might not need it.

sim0nx avatar Sep 13 '22 07:09 sim0nx

I'm looking at the CEF v26 specification from here: https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/pdfdoc/cef-implementation-standard/cef-implementation-standard.pdf

I'm not very familiar with this format, but the documentation seems to imply the most common format is with a syslog prefix, which the current implementation doesn't support. It would be good to support that, or explain why it's not needed.

For example, I expected the following example to parse correctly

Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully
stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232

fuchsnj avatar Sep 13 '22 13:09 fuchsnj

> I'm looking at the CEF v26 specification from here: https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/pdfdoc/cef-implementation-standard/cef-implementation-standard.pdf
>
> I'm not very familiar with this format, but the documentation seems to imply the most common format is with a syslog prefix, which the current implementation doesn't support. It would be good to support that, or explain why it's not needed.
>
> For example, I expected the following example to parse correctly
>
> Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully
> stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232

@fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?

sim0nx avatar Sep 13 '22 15:09 sim0nx

> @fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?

It's just a syslog prefix as part of the CEF header. It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef). I think it would be acceptable to skip everything before the CEF:Version header.
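
A minimal Python sketch of the "skip everything before the CEF:Version header" idea, assuming the first occurrence of the literal `CEF:` marks the start of the header. The helper name `strip_to_cef` is hypothetical, and a prefix that itself contains `CEF:` would defeat this simple search.

```python
def strip_to_cef(line: str) -> str:
    """Return the part of `line` that starts at the first "CEF:" marker."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    return line[start:]


if __name__ == "__main__":
    line = ("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|"
            "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")
    print(strip_to_cef(line))
```

A line that already starts with `CEF:` passes through unchanged, so the same call covers both the bare and the prefixed variants.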

fuchsnj avatar Sep 13 '22 16:09 fuchsnj

> It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef).

That is an issue. In that case, we could try to parse the syslog prefix first and, if that fails, discard everything up to the CEF header.

ktff avatar Sep 13 '22 16:09 ktff

> > @fuchsnj shouldn't one use the syslog source (or parse_syslog) in that case to parse the syslog part of the message and use parse_cef for the CEF part ?
>
> It's just a syslog prefix as part of the CEF header. It's not a full syslog message (it won't parse correctly with parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef). I think it would be acceptable to skip everything before the CEF:Version header.

I am working with CEF log files, and I am seeing two variants:

  • no syslog parts, basically only CEF over
  • a normal syslog line with the CEF part as the message

> parse_syslog, and even if it did, you wouldn't get the remainder to pass to parse_cef

According to https://vector.dev/docs/reference/vrl/functions/#parse_syslog, you would get a message field, which is the original line minus the syslog part, so you would pass that message to parse_cef. At least that's what I had in mind; correct me if I'm wrong.

> I think it would be acceptable to skip everything before the CEF:Version header.

I would opt for that. Should I care about the "syslog prefix" as well, I could parse it some other way (e.g. with grok) and pass the CEF part to parse_cef, or just have parse_cef ignore everything before CEF:Version, as you say.

my 2c

sim0nx avatar Sep 13 '22 18:09 sim0nx

First of all, thank you very much for implementing/working on this function. 🎉 A number of our pipelines need to parse CEF data, and so far we are just using plain VRL for this, which creates boilerplate. We had it in our notes to request this feature, so this PR and initiative are highly appreciated.

Similar to @sim0nx, we also have to deal with different formats for "CEF-over-Syslog" data. Examples:

May 31 10:01:02 hostname CEF:0|Zscaler|NSSWeblog|5.0|Allowed|...
<13>Jun  4 10:29:58 hostname CEF:0|Palo Alto Networks|PAN-OS|...

In our case, they differ in the inclusion of the syslog PRIO header.

For both cases, in VRL, we first apply parse_syslog() to obtain the message part and then the following parse-cef transform that we made ourselves for the purpose:

host = string!(.host)
message = string!(.message)
fields = split(message, "|", limit: 8)
if length(fields) < 8 {
  log("invalid CEF message from <" + host + ">: " + message, level: "warn")
  abort
}
. = {
  "host": host,
  "version": to_int!(fields[0]),
  "device_vendor": fields[1],
  "device_product": fields[2],
  "device_version": fields[3],
  "device_event_class_id": fields[4],
  "name": fields[5],
  "severity": fields[6],
  "extension": fields[7],
}

The above just parses the CEF format itself, strictly. To parse the rest, we made a second-stage parse-XYZ transform for the extension field itself. As you can imagine, there are huge regexes, boilerplate and duplicated code across a lot of our pipelines.

For example:

host = string!(.host)
extension = string!(.extension)
., err = parse_regex(extension, r'^act=(?P<eventAction>[\w\d\s./_-]+) ')  # can't share the full regex :(
if err != null {
  log("failed to parse zscaler CEF extension from <" + host + ">: " + extension, level: "warn")
  abort
}
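
For comparison, a generic (not vendor-specific) split of the extension field can be sketched in Python. This only illustrates the idea and is not what parse_cef does internally: the helper name `parse_extension` is made up, keys are assumed to match `\w+`, and only the `\=` and `\\` escapes are handled.

```python
import re

# A key is assumed to be a run of word characters directly followed by "=",
# located at the start of the string or right after whitespace.
_KEY = re.compile(r"(?:^|\s)(\w+)=")


def _unescape(value: str) -> str:
    # Turn "\=" into "=" and "\\" into "\"; other escapes are left untouched.
    return re.sub(r"\\([=\\])", r"\1", value)


def parse_extension(extension: str) -> dict:
    """Split a CEF extension string into a dict of key -> value."""
    matches = list(_KEY.finditer(extension))
    fields = {}
    for index, match in enumerate(matches):
        # A value runs until the next " key=" boundary or the end of input.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(extension)
        fields[match.group(1)] = _unescape(extension[match.end():end])
    return fields


if __name__ == "__main__":
    print(parse_extension("src=10.0.0.1 dst=2.1.2.2 spt=1232"))
```

Splitting on key boundaries rather than on spaces is what lets a value such as `worm successfully stopped` keep its spaces.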

The example you show in Construct key-value from key label fields looks VERY valuable and useful to us :)

I hope the above sheds some light on real-world use cases for your proposed parse_cef() function.

hhromic avatar Sep 14 '22 09:09 hhromic

Thanks @sim0nx @hhromic.

Then we can make the following two modifications:

  • When parsing, discard everything up to CEF:Version. -- As @sim0nx said, with this, parsing will just work regardless of whether the message is embedded in syslog. If the prefix is useful, it can be parsed out with the syslog parser. The parser would then work as @fuchsnj expected.
  • Construct key-value from key label fields. -- This seems useful and is within the domain of this parser. Also, I wouldn't make it optional, since this mechanism is defined in the specification as a way to have custom keys in CEF and work around its limitations. Since the output is no longer CEF, that limitation no longer applies, so there is no reason for that mechanism anymore. At least until a real-world case is presented.

ktff avatar Sep 14 '22 23:09 ktff

@ktff sounds good to me!

> • Construct key-value from key label fields. -- This seems useful and is within the domain of this parser. Also, I wouldn't make it optional, since this mechanism is defined in the specification as a way to have custom keys in CEF and work around its limitations. Since the output is no longer CEF, that limitation no longer applies, so there is no reason for that mechanism anymore. At least until a real-world case is presented.

I agree with that too. We already feel very excited to get this type of transformation "for free" in this function:

{"c6a1":"value1","c6a1Label":"key1"} -> {"key1":"value1"}

While it should be rare, what would happen if multiple X + XLabel pairs share the same key? Would it construct an array in that case? For example:

{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key1"} -> {"key1":["value1","value2"]}

I think other functions in VRL already behave like this, so it would be nice for consistency.

hhromic avatar Sep 15 '22 14:09 hhromic

@hhromic that seems really rare for CEF and a rather hacky way to transmit an array, so I'm not sure it should be supported. Silently dropping the field isn't ideal either. So instead of that, on collision we can skip the transformation for that pair.

ktff avatar Sep 15 '22 22:09 ktff

@ktff yes, CEF (on paper) should definitely not have duplicated XLabel values for different X. And I definitely don't think CEF would ever intend to transmit arrays "natively" in that way either. I say "on paper" because our team has seen a vast amount of CEF data from different devices/software over the years, and I'm not sure you can imagine the ugly formatting horrors that appear in the wild. That's why I was asking how the proposed parse_cef() function would handle such a rare but not impossible situation.

> So instead of that, on collision we can skip the transformation for that pair.

I think that would cause more headaches than solutions in the long run. If data ever happens to come with duplicated labels, I think it is easier to handle in an array (after conversion) than to leave it "as-is", especially because it would be harder to detect "as-is".

Just in case it was not clear, the "to-array" logic would only use an array output if-and-only-if there are duplicate keys found during parsing. Otherwise the field value remains a simple string. Eg:

{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key1"} -> {"key1":["value1","value2"]}
{"c6a1":"value1","c6a1Label":"key1","c5a1":"value2","c5a1Label":"key2"} -> {"key1":"value1","key2":"value2"}

In this way, in the vast majority of cases there will never be arrays in the values, except if a device decides to duplicate labels.
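
The "array only on duplicates" rule described above could look like this in Python. It is an illustrative sketch: `translate_labels_with_arrays` is a hypothetical name, and fields that are not part of an X + XLabel pair are ignored here for brevity.

```python
def translate_labels_with_arrays(fields: dict) -> dict:
    """Map each label to its value, switching to a list only on duplicates."""
    result = {}
    for key, label in fields.items():
        if not key.endswith("Label"):
            continue
        base = key[: -len("Label")]
        if base not in fields:
            continue
        value = fields[base]
        if label not in result:
            # First time this label is seen: keep a plain string.
            result[label] = value
        elif isinstance(result[label], list):
            result[label].append(value)
        else:
            # Second occurrence: promote the existing string to a list.
            result[label] = [result[label], value]
    return result


if __name__ == "__main__":
    print(translate_labels_with_arrays(
        {"c6a1": "value1", "c6a1Label": "key1",
         "c5a1": "value2", "c5a1Label": "key1"}))
```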

I can't remember right now which other function in VRL behaves like this, but tomorrow I will check the docs and report back.

hhromic avatar Sep 15 '22 23:09 hhromic

@ktff here are at least two other functions in VRL that behave as described above:

$ parse_key_value!("key1=value1")
{ "key1": "value1" }

$ parse_key_value!("key1=value1 key1=value2")
{ "key1": ["value1", "value2"] }

$ parse_query_string("?key1=value1")
{ "key1": "value1" }

$ parse_query_string("?key1=value1&key1=value2")
{ "key1": ["value1", "value2"] }

hhromic avatar Sep 16 '22 15:09 hhromic

> @ktff here are at least two other functions in VRL that behave as described above:
>
> $ parse_key_value!("key1=value1")
> { "key1": "value1" }
>
> $ parse_key_value!("key1=value1 key1=value2")
> { "key1": ["value1", "value2"] }
>
> $ parse_query_string("?key1=value1")
> { "key1": "value1" }
>
> $ parse_query_string("?key1=value1&key1=value2")
> { "key1": ["value1", "value2"] }

I think this would be a reasonable behavior to continue following, unless the spec is 100% clear that duplicate keys will never happen. We've also stuck pretty close to "implemented per spec, even if in-the-wild behavior doesn't always match that".

spencergilbert avatar Sep 16 '22 16:09 spencergilbert

> While it should be rare, what would happen if multiple X + XLabel pairs share the same key? Would it construct an array in that case?

One downside to keep in mind: if duplicates are placed in arrays, the VRL type definition will always be string | array, which means functions using the output of parse_cef that expect only strings as input will be fallible or require coercing to a string first. So if there isn't a good reason to allow duplicates, it will be easier to work with if only one value is kept for each key.
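
To illustrate that cost outside of VRL, here is a Python sketch of the extra handling a consumer needs once a field may be either a string or an array. The helper names are hypothetical, and arrays are assumed to be non-empty because they would only be created for duplicates.

```python
def first_value(value):
    """Collapse a field that is either a string or a list to one string."""
    if isinstance(value, list):
        return value[0]
    return value


def all_values(value):
    """Normalise a field that is either a string or a list to a list."""
    return value if isinstance(value, list) else [value]


if __name__ == "__main__":
    parsed = {"key1": ["value1", "value2"], "key2": "value3"}
    # Calling .upper() directly on parsed["key1"] would fail for the list,
    # so every string operation has to go through a helper first.
    print(first_value(parsed["key1"]).upper())
    print(all_values(parsed["key2"]))
```

If only one value is ever kept per key, neither helper is needed and string functions can be applied directly.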

fuchsnj avatar Sep 16 '22 16:09 fuchsnj

> We've also stuck pretty close to "implemented per spec, even if in-the-wild behavior doesn't always match that"

I also like to follow specs and standards as much as possible, but it is a shame that vendors sometimes don't :(

We like Vector a lot because it easily allows us to deal with bad data. The syslog parser in Vector is a good example of a very resilient parser that does not drop bad data after trying its best to parse malformed syslog.

> One downside to keep in mind: if duplicates are placed in arrays, the VRL type definition will always be string | array, which means functions using the output of parse_cef that expect only strings as input will be fallible or require coercing to a string first. So if there isn't a good reason to allow duplicates, it will be easier to work with if only one value is kept for each key.

That is a very interesting point indeed. I just checked, and parse_key_value() and parse_query_string() already suffer from the same caveat.

I guess in the end it would be okay for parse_cef() to only process single-valued keys, as long as it doesn't discard incoming data because of this. Perhaps, the way of handling duplicate keys could be configured, pretty much like how it was discussed in this (still pending) PR: https://github.com/vectordotdev/vector/pull/11580#issuecomment-1055018531

hhromic avatar Sep 16 '22 17:09 hhromic

> > We've also stuck pretty close to "implemented per spec, even if in-the-wild behavior doesn't always match that"
>
> I also like to follow specs and standards as much as possible, but it is a shame that vendors sometimes don't :(
>
> We like Vector a lot because it easily allows us to deal with bad data. The syslog parser in Vector is a good example of a very resilient parser that does not drop bad data after trying its best to parse malformed syslog.

Ah, I guess this wasn't fully thought out when I typed this. That's generally been the case on the inputs/outputs for Vector, to avoid bad/malformed data coming into your pipeline. Definitely want to keep the VRL handling flexible and robust.

For example, the syslog source rejects invalid messages when it fails to decode them; in the case of "pseudo-syslog", you can use the socket source and handle the decoding in VRL.

spencergilbert avatar Sep 16 '22 17:09 spencergilbert

@hhromic parse_key_value and parse_query_string are different beasts: they don't have a fixed key set, so they need to be as generic as possible. In contrast, as @fuchsnj mentioned, when parsing valid CEF we can always return a string value since there are no duplicates, which is nice. But I do agree that

> So instead of that, on collision we can skip the transformation for that pair.

isn't a good solution.

So let's make this translation a separate feature after all. It seems like it will require an option so that it's opt-in. Once active, it can change the type definition of the parser so that it returns either a string or an array. Or at least that seems like one way to do it.

ktff avatar Sep 16 '22 23:09 ktff

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: 326838a0fdea56253010e431862b47455eb17d5c Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_pipelines_blackhole | 46.63KiB | 2.86 | 100.00% | 1.59MiB | 88.55KiB | 1.81KiB | 0 | 0.054326 | 1.64MiB | 138.62KiB | 2.82KiB | 0 | 0.0826826 | False | False |
| syslog_loki | 209.87KiB | 1.43 | 100.00% | 14.33MiB | 244.61KiB | 5.01KiB | 0 | 0.0166715 | 14.53MiB | 747.16KiB | 15.19KiB | 0 | 0.0502059 | False | False |
| datadog_agent_remap_blackhole_acks | 263.01KiB | 0.42 | 99.10% | 60.95MiB | 4.17MiB | 86.78KiB | 0 | 0.0683613 | 61.2MiB | 2.44MiB | 51.08KiB | 0 | 0.0398841 | False | False |
| datadog_agent_remap_blackhole | 68.12KiB | 0.11 | 42.96% | 58.06MiB | 4.55MiB | 94.8KiB | 0 | 0.0784111 | 58.12MiB | 3.53MiB | 73.63KiB | 0 | 0.0606835 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 8.45KiB | 0.03 | 55.30% | 23.83MiB | 428.47KiB | 8.74KiB | 0 | 0.0175542 | 23.84MiB | 335.31KiB | 6.84KiB | 0 | 0.013733 | False | False |
| enterprise_http_to_http | -1.97KiB | -0.01 | 21.31% | 23.85MiB | 251.01KiB | 5.12KiB | 0 | 0.0102769 | 23.85MiB | 253.94KiB | 5.2KiB | 0 | 0.0103978 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -16.88KiB | -0.07 | 57.65% | 23.79MiB | 702.39KiB | 14.31KiB | 0 | 0.0288281 | 23.77MiB | 760.8KiB | 15.49KiB | 0 | 0.031247 | False | False |
| splunk_hec_indexer_ack_blackhole | -19.7KiB | -0.08 | 57.35% | 23.77MiB | 808.03KiB | 16.45KiB | 0 | 0.0331957 | 23.75MiB | 910.89KiB | 18.53KiB | 0 | 0.037452 | False | False |
| file_to_blackhole | -73.76KiB | -0.08 | 47.33% | 95.34MiB | 3.48MiB | 72.22KiB | 0 | 0.0365323 | 95.27MiB | 4.4MiB | 91.43KiB | 0 | 0.0461631 | False | False |
| splunk_hec_route_s3 | -14.96KiB | -0.08 | 16.99% | 18.13MiB | 2.39MiB | 49.72KiB | 0 | 0.131638 | 18.12MiB | 2.34MiB | 48.83KiB | 0 | 0.128866 | False | False |
| http_to_http_json | -24.89KiB | -0.1 | 95.81% | 23.84MiB | 356.93KiB | 7.29KiB | 0 | 0.0146172 | 23.82MiB | 480.54KiB | 9.82KiB | 0 | 0.0196993 | False | False |
| http_to_http_noack | -61.07KiB | -0.25 | 99.65% | 23.84MiB | 407.17KiB | 8.32KiB | 0 | 0.0166772 | 23.78MiB | 940.55KiB | 19.17KiB | 0 | 0.0386208 | False | False |
| syslog_log2metric_humio_metrics | -37.47KiB | -0.3 | 99.94% | 12.21MiB | 199.01KiB | 4.06KiB | 0 | 0.0159097 | 12.18MiB | 496.26KiB | 10.1KiB | 0 | 0.0397918 | False | False |
| fluent_elasticsearch | -383.53KiB | -0.47 | 100.00% | 79.47MiB | 55.64KiB | 1.12KiB | 0 | 0.000683536 | 79.1MiB | 4.1MiB | 84.24KiB | 0 | 0.0518146 | False | False |
| http_text_to_http_json | -212.58KiB | -0.54 | 100.00% | 38.34MiB | 874.56KiB | 17.85KiB | 0 | 0.0222714 | 38.13MiB | 869.93KiB | 17.76KiB | 0 | 0.0222742 | False | False |
| datadog_agent_remap_datadog_logs_acks | -572.38KiB | -0.92 | 100.00% | 60.92MiB | 3.21MiB | 67.09KiB | 0 | 0.0526934 | 60.36MiB | 4.28MiB | 89.14KiB | 0 | 0.0709264 | False | False |
| datadog_agent_remap_datadog_logs | -704.88KiB | -1.11 | 100.00% | 62.28MiB | 639.3KiB | 13.1KiB | 0 | 0.010023 | 61.59MiB | 4.26MiB | 88.72KiB | 0 | 0.0691786 | False | False |
| syslog_regex_logs2metric_ddmetrics | -172.63KiB | -1.34 | 100.00% | 12.54MiB | 508.1KiB | 10.36KiB | 0 | 0.0395612 | 12.37MiB | 440.41KiB | 8.98KiB | 0 | 0.034758 | False | False |
| syslog_splunk_hec_logs | -229.67KiB | -1.38 | 100.00% | 16.22MiB | 675.26KiB | 13.76KiB | 0 | 0.0406376 | 16.0MiB | 755.93KiB | 15.39KiB | 0 | 0.0461301 | False | False |
| http_pipelines_blackhole_acks | -17.85KiB | -1.43 | 100.00% | 1.22MiB | 110.64KiB | 2.25KiB | 0 | 0.0887391 | 1.2MiB | 84.53KiB | 1.72KiB | 0 | 0.0687789 | False | False |
| http_to_http_acks | -292.92KiB | -1.64 | 77.33% | 17.48MiB | 8.19MiB | 171.23KiB | 0 | 0.46865 | 17.19MiB | 8.21MiB | 171.4KiB | 0 | 0.477627 | True | True |
| syslog_humio_logs | -431.23KiB | -2.53 | 100.00% | 16.62MiB | 224.26KiB | 4.58KiB | 0 | 0.0131741 | 16.2MiB | 232.39KiB | 4.76KiB | 0 | 0.0140064 | False | False |
| syslog_log2metric_splunk_hec_metrics | -454.07KiB | -2.57 | 100.00% | 17.24MiB | 825.2KiB | 16.81KiB | 0 | 0.0467251 | 16.8MiB | 740.78KiB | 15.1KiB | 0 | 0.0430522 | False | False |
| http_pipelines_no_grok_blackhole | -301.92KiB | -2.77 | 100.00% | 10.65MiB | 336.27KiB | 6.86KiB | 0 | 0.0308161 | 10.36MiB | 1.1MiB | 22.97KiB | 0 | 0.106508 | False | False |
| socket_to_socket_blackhole | -677.6KiB | -2.9 | 100.00% | 22.79MiB | 624.03KiB | 12.74KiB | 0 | 0.0267291 | 22.13MiB | 534.12KiB | 10.9KiB | 0 | 0.0235622 | False | False |

github-actions[bot] avatar Sep 17 '22 01:09 github-actions[bot]

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: 7f0d88e8511dd0c98b28f71e531f2b42ef1ad275 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_loki | 127.0KiB | 0.9 | 100.00% | 13.77MiB | 411.87KiB | 8.42KiB | 0 | 0.0292098 | 13.89MiB | 714.19KiB | 14.52KiB | 0 | 0.0501988 | False | False |
| datadog_agent_remap_blackhole_acks | 405.99KiB | 0.66 | 99.93% | 59.83MiB | 4.78MiB | 99.5KiB | 0 | 0.0798753 | 60.23MiB | 3.22MiB | 67.26KiB | 0 | 0.0534066 | False | False |
| datadog_agent_remap_blackhole | 333.39KiB | 0.53 | 99.94% | 60.86MiB | 3.93MiB | 81.81KiB | 0 | 0.0645111 | 61.19MiB | 2.46MiB | 51.39KiB | 0 | 0.0402224 | False | False |
| http_pipelines_blackhole_acks | 4.17KiB | 0.34 | 86.66% | 1.19MiB | 115.36KiB | 2.35KiB | 0 | 0.0944111 | 1.2MiB | 73.0KiB | 1.49KiB | 0 | 0.0595414 | False | False |
| http_pipelines_blackhole | 2.72KiB | 0.16 | 71.61% | 1.69MiB | 43.22KiB | 903.84B | 0 | 0.0249883 | 1.69MiB | 116.89KiB | 2.38KiB | 0 | 0.0674695 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 15.73KiB | 0.06 | 81.76% | 23.82MiB | 470.56KiB | 9.61KiB | 0 | 0.0192869 | 23.84MiB | 335.41KiB | 6.85KiB | 0 | 0.0137387 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 15.34KiB | 0.06 | 46.56% | 23.75MiB | 881.87KiB | 17.93KiB | 0 | 0.0362608 | 23.76MiB | 834.05KiB | 16.97KiB | 0 | 0.0342727 | False | False |
| splunk_hec_indexer_ack_blackhole | 7.86KiB | 0.03 | 23.11% | 23.74MiB | 950.54KiB | 19.33KiB | 0 | 0.0390981 | 23.74MiB | 910.13KiB | 18.51KiB | 0 | 0.0374239 | False | False |
| enterprise_http_to_http | -1.19KiB | -0 | 13.13% | 23.85MiB | 248.57KiB | 5.07KiB | 0 | 0.0101775 | 23.85MiB | 249.36KiB | 5.1KiB | 0 | 0.0102102 | False | False |
| file_to_blackhole | -62.21KiB | -0.06 | 66.63% | 95.38MiB | 2.05MiB | 42.42KiB | 0 | 0.0214527 | 95.32MiB | 2.33MiB | 48.37KiB | 0 | 0.0243906 | False | False |
| http_to_http_json | -36.33KiB | -0.15 | 99.45% | 23.84MiB | 345.93KiB | 7.06KiB | 0 | 0.0141648 | 23.81MiB | 538.78KiB | 11.0KiB | 0 | 0.0220942 | False | False |
| fluent_elasticsearch | -219.59KiB | -0.27 | 100.00% | 79.47MiB | 53.43KiB | 1.08KiB | 0 | 0.000656466 | 79.26MiB | 2.55MiB | 52.45KiB | 0 | 0.0321784 | False | False |
| http_to_http_noack | -96.23KiB | -0.39 | 99.98% | 23.83MiB | 519.19KiB | 10.61KiB | 0 | 0.0212732 | 23.73MiB | 1.15MiB | 23.96KiB | 0 | 0.0484144 | False | False |
| syslog_log2metric_humio_metrics | -61.43KiB | -0.47 | 100.00% | 12.71MiB | 221.81KiB | 4.53KiB | 0 | 0.0170334 | 12.65MiB | 543.91KiB | 11.07KiB | 0 | 0.0419655 | False | False |
| datadog_agent_remap_datadog_logs | -506.62KiB | -0.81 | 100.00% | 61.17MiB | 279.83KiB | 5.73KiB | 0 | 0.00446658 | 60.67MiB | 3.95MiB | 82.36KiB | 0 | 0.0651559 | False | False |
| syslog_regex_logs2metric_ddmetrics | -131.53KiB | -1.03 | 100.00% | 12.52MiB | 635.63KiB | 12.95KiB | 0 | 0.0495824 | 12.39MiB | 509.03KiB | 10.38KiB | 0 | 0.0401185 | False | False |
| syslog_splunk_hec_logs | -211.83KiB | -1.28 | 100.00% | 16.12MiB | 881.87KiB | 17.95KiB | 0 | 0.0534003 | 15.92MiB | 865.95KiB | 17.63KiB | 0 | 0.0531176 | False | False |
| http_text_to_http_json | -615.72KiB | -1.57 | 100.00% | 38.42MiB | 816.69KiB | 16.67KiB | 0 | 0.0207568 | 37.81MiB | 1.16MiB | 24.15KiB | 0 | 0.0305482 | False | False |
| splunk_hec_route_s3 | -298.64KiB | -1.62 | 100.00% | 18.04MiB | 2.33MiB | 48.53KiB | 0 | 0.129137 | 17.75MiB | 2.3MiB | 48.03KiB | 0 | 0.129347 | False | False |
| http_pipelines_no_grok_blackhole | -283.41KiB | -2.53 | 100.00% | 10.95MiB | 63.77KiB | 1.3KiB | 0 | 0.00568599 | 10.67MiB | 1.03MiB | 21.5KiB | 0 | 0.0967261 | False | False |
| http_to_http_acks | -456.67KiB | -2.53 | 94.34% | 17.65MiB | 8.11MiB | 169.6KiB | 0 | 0.459505 | 17.2MiB | 8.1MiB | 169.09KiB | 0 | 0.470596 | True | True |
| syslog_log2metric_splunk_hec_metrics | -626.03KiB | -3.39 | 100.00% | 18.02MiB | 546.28KiB | 11.13KiB | 0 | 0.0295932 | 17.41MiB | 762.43KiB | 15.52KiB | 0 | 0.0427533 | False | False |
| syslog_humio_logs | -613.46KiB | -3.57 | 100.00% | 16.79MiB | 133.03KiB | 2.72KiB | 0 | 0.00773777 | 16.19MiB | 542.76KiB | 11.12KiB | 0 | 0.0327375 | False | False |
| datadog_agent_remap_datadog_logs_acks | -2.23MiB | -3.63 | 100.00% | 61.48MiB | 3.02MiB | 63.21KiB | 0 | 0.0491566 | 59.25MiB | 4.64MiB | 96.65KiB | 0 | 0.0783425 | False | False |
| socket_to_socket_blackhole | -889.8KiB | -3.65 | 100.00% | 23.8MiB | 194.37KiB | 3.97KiB | 0 | 0.00797235 | 22.93MiB | 107.48KiB | 2.19KiB | 0 | 0.00457538 | False | False |

github-actions[bot] avatar Sep 17 '22 01:09 github-actions[bot]

> I'm looking at the CEF v26 specification from here: https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/pdfdoc/cef-implementation-standard/cef-implementation-standard.pdf
>
> I'm not very familiar with this format, but the documentation seems to imply the most common format is with a syslog prefix, which the current implementation doesn't support. It would be good to support that, or explain why it's not needed.
>
> For example, I expected the following example to parse correctly
>
> Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully
> stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232

@fuchsnj this now works as expected.

ktff avatar Sep 17 '22 08:09 ktff

Soak Test Results

Baseline: d498040a770ae2bb5c9d25efce62acadcb17ee57 Comparison: dfe24fd5af7722d726b8bd9670cb54ea8b8204d5 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_pipelines_blackhole_acks | 10.65KiB | 0.86 | 99.99% | 1.21MiB | 116.89KiB | 2.38KiB | 0 | 0.0943029 | 1.22MiB | 70.35KiB | 1.43KiB | 0 | 0.0562726 | False | False |
| syslog_loki | 127.0KiB | 0.86 | 100.00% | 14.37MiB | 265.59KiB | 5.44KiB | 0 | 0.0180406 | 14.5MiB | 748.05KiB | 15.21KiB | 0 | 0.0503778 | False | False |
| http_text_to_http_json | 48.6KiB | 0.12 | 94.31% | 38.14MiB | 905.12KiB | 18.48KiB | 0 | 0.0231674 | 38.19MiB | 862.46KiB | 17.6KiB | 0 | 0.0220482 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 5.23KiB | 0.02 | 38.59% | 23.83MiB | 380.09KiB | 7.77KiB | 0 | 0.0155712 | 23.84MiB | 336.19KiB | 6.86KiB | 0 | 0.0137699 | False | False |
| enterprise_http_to_http | 650.73B | 0 | 6.93% | 23.85MiB | 251.21KiB | 5.13KiB | 0 | 0.0102859 | 23.85MiB | 254.6KiB | 5.21KiB | 0 | 0.0104243 | False | False |
| http_pipelines_blackhole | -255.45B | -0.01 | 7.84% | 1.66MiB | 49.62KiB | 1.01KiB | 0 | 0.0291759 | 1.66MiB | 113.99KiB | 2.32KiB | 0 | 0.0670314 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -7.48KiB | -0.03 | 25.34% | 23.77MiB | 788.62KiB | 16.05KiB | 0 | 0.0323959 | 23.76MiB | 818.49KiB | 16.66KiB | 0 | 0.0336332 | False | False |
| splunk_hec_indexer_ack_blackhole | -12.26KiB | -0.05 | 35.70% | 23.75MiB | 889.01KiB | 18.09KiB | 0 | 0.0365419 | 23.74MiB | 948.56KiB | 19.29KiB | 0 | 0.0390094 | False | False |
| file_to_blackhole | -48.17KiB | -0.05 | 39.37% | 95.35MiB | 3.04MiB | 63.01KiB | 0 | 0.0318738 | 95.3MiB | 3.32MiB | 69.02KiB | 0 | 0.0348198 | False | False |
| http_to_http_noack | -25.99KiB | -0.11 | 83.62% | 23.83MiB | 521.42KiB | 10.65KiB | 0 | 0.0213647 | 23.8MiB | 751.72KiB | 15.33KiB | 0 | 0.0308342 | False | False |
| http_to_http_json | -42.0KiB | -0.17 | 99.86% | 23.85MiB | 327.46KiB | 6.69KiB | 0 | 0.0134073 | 23.81MiB | 555.67KiB | 11.34KiB | 0 | 0.0227901 | False | False |
| fluent_elasticsearch | -157.2KiB | -0.19 | 100.00% | 79.47MiB | 52.52KiB | 1.06KiB | 0 | 0.000645199 | 79.32MiB | 1.56MiB | 32.06KiB | 0 | 0.0196179 | False | False |
| http_to_http_acks | -59.0KiB | -0.33 | 19.65% | 17.4MiB | 8.03MiB | 167.93KiB | 0 | 0.461386 | 17.34MiB | 8.03MiB | 167.35KiB | 0 | 0.462666 | True | True |
| syslog_log2metric_humio_metrics | -44.7KiB | -0.35 | 100.00% | 12.34MiB | 241.99KiB | 4.94KiB | 0 | 0.0191518 | 12.29MiB | 473.09KiB | 9.63KiB | 0 | 0.0375747 | False | False |
| datadog_agent_remap_blackhole_acks | -303.19KiB | -0.48 | 99.78% | 61.56MiB | 3.99MiB | 83.12KiB | 0 | 0.0647964 | 61.27MiB | 2.55MiB | 53.43KiB | 0 | 0.0416761 | False | False |
| splunk_hec_route_s3 | -104.64KiB | -0.56 | 86.98% | 18.11MiB | 2.38MiB | 49.49KiB | 0 | 0.131177 | 18.0MiB | 2.31MiB | 48.27KiB | 0 | 0.128191 | False | False |
| datadog_agent_remap_datadog_logs_acks | -451.39KiB | -0.71 | 100.00% | 62.24MiB | 2.8MiB | 58.63KiB | 0 | 0.045018 | 61.8MiB | 4.39MiB | 91.36KiB | 0 | 0.0710098 | False | False |
| datadog_agent_remap_blackhole | -441.06KiB | -0.8 | 93.62% | 53.91MiB | 7.92MiB | 165.07KiB | 0 | 0.146824 | 53.48MiB | 8.21MiB | 171.36KiB | 0 | 0.153427 | False | False |
| datadog_agent_remap_datadog_logs | -518.57KiB | -0.81 | 100.00% | 62.27MiB | 303.97KiB | 6.22KiB | 0 | 0.00476598 | 61.76MiB | 3.8MiB | 79.25KiB | 0 | 0.0615644 | False | False |
| syslog_splunk_hec_logs | -196.61KiB | -1.17 | 100.00% | 16.37MiB | 809.69KiB | 16.46KiB | 0 | 0.0482945 | 16.18MiB | 678.41KiB | 13.8KiB | 0 | 0.0409445 | False | False |
| syslog_regex_logs2metric_ddmetrics | -208.46KiB | -1.62 | 100.00% | 12.58MiB | 599.76KiB | 12.22KiB | 0 | 0.046555 | 12.37MiB | 444.36KiB | 9.06KiB | 0 | 0.0350598 | False | False |
| syslog_humio_logs | -305.38KiB | -1.81 | 100.00% | 16.45MiB | 494.52KiB | 10.1KiB | 0 | 0.0293471 | 16.15MiB | 468.01KiB | 9.59KiB | 0 | 0.0282863 | False | False |
| syslog_log2metric_splunk_hec_metrics | -449.62KiB | -2.43 | 100.00% | 18.08MiB | 488.35KiB | 9.96KiB | 0 | 0.026373 | 17.64MiB | 679.97KiB | 13.85KiB | 0 | 0.0376353 | False | False |
| http_pipelines_no_grok_blackhole | -287.41KiB | -2.58 | 100.00% | 10.89MiB | 53.36KiB | 1.09KiB | 0 | 0.00478409 | 10.61MiB | 990.72KiB | 20.16KiB | 0 | 0.0911755 | False | False |
| socket_to_socket_blackhole | -632.2KiB | -2.61 | 100.00% | 23.65MiB | 382.39KiB | 7.81KiB | 0 | 0.0157892 | 23.03MiB | 163.43KiB | 3.34KiB | 0 | 0.00692898 | False | False |

github-actions[bot] avatar Sep 17 '22 09:09 github-actions[bot]

One example is the syslog source, which rejects invalid messages when it fails to decode them; for the "pseudo-syslog" case, you can use the socket source and handle the decoding in VRL.

Yes, indeed we parse with VRL + socket source instead of the syslog source directly, because we need to properly log errors during parsing, i.e. log the offending packet and source peer. This is something the sources in general are not very good at right now (see #7750) :(

So let's make this translation a separate feature after all. It seems it will require an option so that it's opt-in. Once active, it can change the type definition of the parser so that it returns string and array. Or at least that seems like a way to do it.

I fully agree that it's better to move this part to another PR so this one can make progress. It does look like that part needs more discussion. Apologies for the noise!

hhromic avatar Sep 17 '22 12:09 hhromic

Soak Test Results

Baseline: 512da4076a67a229996f96e51e711dc2af37dcf2 Comparison: 8117579af027460f10ea0bbb5956163dd1aa3a2a Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_to_http_acks | 287.83KiB | 1.66 | 77.76% | 16.96MiB | 7.98MiB | 166.96KiB | 0 | 0.470654 | 17.24MiB | 7.98MiB | 166.57KiB | 0 | 0.462744 | True | True |
| http_pipelines_blackhole_acks | 15.37KiB | 1.24 | 100.00% | 1.21MiB | 103.73KiB | 2.11KiB | 0 | 0.0839684 | 1.22MiB | 91.41KiB | 1.86KiB | 0 | 0.0730894 | False | False |
| syslog_loki | 104.39KiB | 0.72 | 100.00% | 14.16MiB | 406.01KiB | 8.32KiB | 0 | 0.0279889 | 14.27MiB | 722.48KiB | 14.69KiB | 0 | 0.0494497 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 22.03KiB | 0.09 | 64.20% | 23.75MiB | 871.27KiB | 17.72KiB | 0 | 0.0358235 | 23.77MiB | 792.59KiB | 16.13KiB | 0 | 0.0325592 | False | False |
| splunk_hec_indexer_ack_blackhole | 22.42KiB | 0.09 | 64.97% | 23.75MiB | 876.66KiB | 17.83KiB | 0 | 0.0360423 | 23.77MiB | 789.74KiB | 16.07KiB | 0 | 0.0324387 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 23.0KiB | 0.09 | 93.56% | 23.82MiB | 503.11KiB | 10.27KiB | 0 | 0.0206241 | 23.84MiB | 343.25KiB | 7.01KiB | 0 | 0.0140575 | False | False |
| enterprise_http_to_http | 55.43B | 0 | 0.59% | 23.85MiB | 253.86KiB | 5.18KiB | 0 | 0.010394 | 23.85MiB | 254.69KiB | 5.21KiB | 0 | 0.0104279 | False | False |
| file_to_blackhole | -65.46KiB | -0.07 | 55.98% | 95.35MiB | 2.72MiB | 56.41KiB | 0 | 0.0285348 | 95.29MiB | 3.04MiB | 63.31KiB | 0 | 0.0319455 | False | False |
| http_to_http_json | -33.03KiB | -0.14 | 99.16% | 23.85MiB | 335.84KiB | 6.86KiB | 0 | 0.0137507 | 23.81MiB | 512.63KiB | 10.47KiB | 0 | 0.0210177 | False | False |
| fluent_elasticsearch | -159.74KiB | -0.2 | 100.00% | 79.47MiB | 54.02KiB | 1.09KiB | 0 | 0.000663709 | 79.32MiB | 1.41MiB | 28.9KiB | 0 | 0.0177107 | False | False |
| http_to_http_noack | -76.91KiB | -0.32 | 99.85% | 23.83MiB | 508.86KiB | 10.4KiB | 0 | 0.0208505 | 23.75MiB | 1.05MiB | 21.9KiB | 0 | 0.0442039 | False | False |
| syslog_splunk_hec_logs | -155.74KiB | -0.96 | 100.00% | 15.89MiB | 812.2KiB | 16.52KiB | 0 | 0.0499162 | 15.73MiB | 620.69KiB | 12.67KiB | 0 | 0.0385151 | False | False |
| datadog_agent_remap_blackhole | -631.07KiB | -1.06 | 100.00% | 58.27MiB | 3.79MiB | 78.9KiB | 0 | 0.0649641 | 57.66MiB | 3.22MiB | 67.08KiB | 0 | 0.055759 | False | False |
| syslog_humio_logs | -190.72KiB | -1.18 | 100.00% | 15.83MiB | 666.56KiB | 13.61KiB | 0 | 0.0411163 | 15.64MiB | 588.87KiB | 12.05KiB | 0 | 0.0367565 | False | False |
| syslog_regex_logs2metric_ddmetrics | -163.06KiB | -1.29 | 100.00% | 12.37MiB | 614.64KiB | 12.53KiB | 0 | 0.0485166 | 12.21MiB | 584.3KiB | 11.91KiB | 0 | 0.0467231 | False | False |
| http_pipelines_blackhole | -28.02KiB | -1.65 | 100.00% | 1.66MiB | 55.85KiB | 1.14KiB | 0 | 0.0327915 | 1.64MiB | 123.86KiB | 2.52KiB | 0 | 0.0739303 | False | False |
| syslog_log2metric_splunk_hec_metrics | -324.59KiB | -1.75 | 100.00% | 18.09MiB | 614.55KiB | 12.52KiB | 0 | 0.0331675 | 17.77MiB | 812.45KiB | 16.53KiB | 0 | 0.0446304 | False | False |
| syslog_log2metric_humio_metrics | -247.28KiB | -1.88 | 100.00% | 12.82MiB | 197.12KiB | 4.03KiB | 0 | 0.0150086 | 12.58MiB | 508.37KiB | 10.35KiB | 0 | 0.039451 | False | False |
| splunk_hec_route_s3 | -407.38KiB | -2.11 | 100.00% | 18.84MiB | 2.29MiB | 47.8KiB | 0 | 0.121755 | 18.44MiB | 2.21MiB | 46.29KiB | 0 | 0.119914 | False | False |
| datadog_agent_remap_datadog_logs | -1.32MiB | -2.13 | 100.00% | 61.98MiB | 456.92KiB | 9.35KiB | 0 | 0.00719798 | 60.66MiB | 4.06MiB | 84.54KiB | 0 | 0.0669117 | False | False |
| http_pipelines_no_grok_blackhole | -257.15KiB | -2.34 | 100.00% | 10.73MiB | 332.93KiB | 6.8KiB | 0 | 0.0302982 | 10.48MiB | 1.08MiB | 22.58KiB | 0 | 0.103532 | False | False |
| datadog_agent_remap_datadog_logs_acks | -1.5MiB | -2.43 | 100.00% | 61.57MiB | 3.08MiB | 64.37KiB | 0 | 0.0500053 | 60.07MiB | 4.23MiB | 88.11KiB | 0 | 0.0704481 | False | False |
| datadog_agent_remap_blackhole_acks | -1.46MiB | -2.47 | 100.00% | 58.94MiB | 4.24MiB | 88.34KiB | 0 | 0.0719742 | 57.48MiB | 2.91MiB | 60.9KiB | 0 | 0.0506441 | False | False |
| http_text_to_http_json | -1.85MiB | -4.67 | 100.00% | 39.57MiB | 1.11MiB | 23.17KiB | 0 | 0.0280107 | 37.73MiB | 1005.7KiB | 20.54KiB | 0 | 0.0260275 | False | False |
| socket_to_socket_blackhole | -1.49MiB | -6.06 | 100.00% | 24.6MiB | 289.29KiB | 5.91KiB | 0 | 0.0114829 | 23.11MiB | 124.6KiB | 2.54KiB | 0 | 0.0052645 | False | False |

github-actions[bot] avatar Sep 20 '22 18:09 github-actions[bot]

Soak Test Results

Baseline: 9cf1ea9b08ed745e3872c1cc81757f6078c82419 Comparison: acb41109c3d5e6839de9a32542e1f75fc58433bf Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_pipelines_blackhole_acks | 20.56KiB | 1.69 | 100.00% | 1.19MiB | 137.87KiB | 2.8KiB | 0 | 0.11345 | 1.21MiB | 99.87KiB | 2.04KiB | 0 | 0.0808164 | False | False |
| http_pipelines_blackhole | 16.22KiB | 0.97 | 100.00% | 1.62MiB | 106.62KiB | 2.18KiB | 0 | 0.0640698 | 1.64MiB | 144.67KiB | 2.95KiB | 0 | 0.0860995 | False | False |
| socket_to_socket_blackhole | 31.99KiB | 0.14 | 72.53% | 22.6MiB | 995.23KiB | 20.32KiB | 0 | 0.0429963 | 22.63MiB | 1.01MiB | 21.09KiB | 0 | 0.044557 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 19.3KiB | 0.08 | 57.46% | 23.75MiB | 880.16KiB | 17.9KiB | 0 | 0.0361888 | 23.77MiB | 801.26KiB | 16.31KiB | 0 | 0.0329186 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 9.13KiB | 0.04 | 59.95% | 23.83MiB | 416.36KiB | 8.51KiB | 0 | 0.017058 | 23.84MiB | 330.17KiB | 6.74KiB | 0 | 0.0135217 | False | False |
| enterprise_http_to_http | -1.39KiB | -0.01 | 15.28% | 23.85MiB | 247.79KiB | 5.06KiB | 0 | 0.0101454 | 23.85MiB | 250.84KiB | 5.13KiB | 0 | 0.0102708 | False | False |
| splunk_hec_indexer_ack_blackhole | -1.66KiB | -0.01 | 5.14% | 23.75MiB | 883.93KiB | 17.98KiB | 0 | 0.0363378 | 23.75MiB | 906.22KiB | 18.43KiB | 0 | 0.0372565 | False | False |
| file_to_blackhole | -54.77KiB | -0.06 | 43.16% | 95.34MiB | 3.03MiB | 62.74KiB | 0 | 0.031738 | 95.29MiB | 3.49MiB | 72.67KiB | 0 | 0.0366625 | False | False |
| http_to_http_json | -26.36KiB | -0.11 | 97.46% | 23.85MiB | 333.92KiB | 6.82KiB | 0 | 0.0136714 | 23.82MiB | 470.3KiB | 9.62KiB | 0 | 0.0192759 | False | False |
| fluent_elasticsearch | -182.93KiB | -0.22 | 100.00% | 79.47MiB | 53.72KiB | 1.09KiB | 0 | 0.000660026 | 79.29MiB | 1.58MiB | 32.55KiB | 0 | 0.0199407 | False | False |
| datadog_agent_remap_blackhole_acks | -168.45KiB | -0.28 | 89.49% | 58.75MiB | 4.13MiB | 85.94KiB | 0 | 0.0702393 | 58.59MiB | 2.8MiB | 58.44KiB | 0 | 0.0477071 | False | False |
| http_to_http_acks | -64.77KiB | -0.36 | 21.36% | 17.34MiB | 8.14MiB | 170.18KiB | 0 | 0.469447 | 17.27MiB | 8.04MiB | 167.86KiB | 0 | 0.465609 | True | True |
| http_to_http_noack | -122.45KiB | -0.5 | 100.00% | 23.84MiB | 408.45KiB | 8.35KiB | 0 | 0.01673 | 23.72MiB | 1.23MiB | 25.68KiB | 0 | 0.0519597 | False | False |
| syslog_regex_logs2metric_ddmetrics | -79.45KiB | -0.62 | 100.00% | 12.45MiB | 617.85KiB | 12.58KiB | 0 | 0.048442 | 12.38MiB | 549.4KiB | 11.2KiB | 0 | 0.0433459 | False | False |
| splunk_hec_route_s3 | -145.12KiB | -0.78 | 97.02% | 18.21MiB | 2.28MiB | 47.44KiB | 0 | 0.125066 | 18.07MiB | 2.25MiB | 46.99KiB | 0 | 0.124455 | False | False |
| syslog_loki | -116.12KiB | -0.8 | 100.00% | 14.14MiB | 627.25KiB | 12.83KiB | 0 | 0.043312 | 14.03MiB | 822.42KiB | 16.72KiB | 0 | 0.0572474 | False | False |
| syslog_splunk_hec_logs | -136.85KiB | -0.83 | 100.00% | 16.17MiB | 744.94KiB | 15.16KiB | 0 | 0.0449838 | 16.03MiB | 520.99KiB | 10.64KiB | 0 | 0.0317228 | False | False |
| datadog_agent_remap_datadog_logs_acks | -823.21KiB | -1.29 | 100.00% | 62.42MiB | 3.14MiB | 65.65KiB | 0 | 0.0503053 | 61.61MiB | 4.37MiB | 90.97KiB | 0 | 0.0709139 | False | False |
| syslog_humio_logs | -256.51KiB | -1.51 | 100.00% | 16.6MiB | 259.69KiB | 5.3KiB | 0 | 0.0152771 | 16.35MiB | 259.09KiB | 5.3KiB | 0 | 0.0154754 | False | False |
| http_pipelines_no_grok_blackhole | -174.14KiB | -1.56 | 100.00% | 10.89MiB | 257.58KiB | 5.26KiB | 0 | 0.0231027 | 10.72MiB | 968.39KiB | 19.71KiB | 0 | 0.088236 | False | False |
| datadog_agent_remap_datadog_logs | -1003.85KiB | -1.61 | 100.00% | 60.76MiB | 1.75MiB | 36.67KiB | 0 | 0.0287631 | 59.78MiB | 4.31MiB | 89.72KiB | 0 | 0.0720615 | False | False |
| syslog_log2metric_splunk_hec_metrics | -299.09KiB | -1.67 | 100.00% | 17.48MiB | 835.59KiB | 17.02KiB | 0 | 0.0466667 | 17.19MiB | 941.02KiB | 19.14KiB | 0 | 0.053448 | False | False |
| datadog_agent_remap_blackhole | -1.24MiB | -2.11 | 100.00% | 59.05MiB | 4.7MiB | 97.93KiB | 0 | 0.0795947 | 57.8MiB | 3.58MiB | 74.71KiB | 0 | 0.0618756 | False | False |
| http_text_to_http_json | -1.09MiB | -2.76 | 100.00% | 39.57MiB | 744.78KiB | 15.2KiB | 0 | 0.0183758 | 38.48MiB | 832.95KiB | 17.01KiB | 0 | 0.0211342 | False | False |
| syslog_log2metric_humio_metrics | -498.83KiB | -3.85 | 100.00% | 12.65MiB | 289.72KiB | 5.91KiB | 0 | 0.0223601 | 12.16MiB | 781.76KiB | 15.9KiB | 0 | 0.0627512 | False | False |

github-actions[bot] avatar Sep 21 '22 00:09 github-actions[bot]

Soak Test Results

Baseline: 9cf1ea9b08ed745e3872c1cc81757f6078c82419 Comparison: 8c3de2a6b624ec0fe7713567fe341a0f1158b24f Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| socket_to_socket_blackhole | 747.72KiB | 3.43 | 100.00% | 21.3MiB | 2.01MiB | 41.96KiB | 0 | 0.0942174 | 22.03MiB | 1.72MiB | 35.92KiB | 0 | 0.0779846 | False | False |
| http_pipelines_blackhole_acks | 11.79KiB | 0.94 | 100.00% | 1.22MiB | 104.27KiB | 2.12KiB | 0 | 0.0833092 | 1.23MiB | 93.67KiB | 1.91KiB | 0 | 0.0741388 | False | False |
| http_to_http_acks | 54.79KiB | 0.31 | 18.26% | 17.33MiB | 7.98MiB | 166.85KiB | 0 | 0.460437 | 17.39MiB | 8.09MiB | 168.66KiB | 0 | 0.465062 | True | True |
| syslog_loki | 39.51KiB | 0.27 | 98.12% | 14.46MiB | 383.41KiB | 7.85KiB | 0 | 0.0258889 | 14.5MiB | 731.57KiB | 14.87KiB | 0 | 0.049266 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 32.38KiB | 0.13 | 98.30% | 23.81MiB | 574.25KiB | 11.71KiB | 0 | 0.0235504 | 23.84MiB | 334.42KiB | 6.83KiB | 0 | 0.0136966 | False | False |
| syslog_log2metric_humio_metrics | 2.27KiB | 0.02 | 16.58% | 12.52MiB | 218.43KiB | 4.46KiB | 0 | 0.0170287 | 12.53MiB | 486.38KiB | 9.91KiB | 0 | 0.0379108 | False | False |
| splunk_hec_indexer_ack_blackhole | -1.21KiB | -0 | 3.88% | 23.75MiB | 868.24KiB | 17.66KiB | 0 | 0.0356912 | 23.75MiB | 863.39KiB | 17.56KiB | 0 | 0.0354933 | False | False |
| enterprise_http_to_http | 422.28B | 0 | 4.47% | 23.84MiB | 254.73KiB | 5.2KiB | 0 | 0.0104301 | 23.85MiB | 254.67KiB | 5.21KiB | 0 | 0.0104274 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -3.72KiB | -0.02 | 12.06% | 23.75MiB | 841.97KiB | 17.13KiB | 0 | 0.0346083 | 23.75MiB | 861.71KiB | 17.53KiB | 0 | 0.035425 | False | False |
| file_to_blackhole | -58.69KiB | -0.06 | 48.23% | 95.36MiB | 2.83MiB | 58.7KiB | 0 | 0.0296878 | 95.3MiB | 3.33MiB | 69.18KiB | 0 | 0.0348893 | False | False |
| datadog_agent_remap_blackhole | -82.01KiB | -0.13 | 55.54% | 61.22MiB | 4.18MiB | 87.13KiB | 0 | 0.0683079 | 61.14MiB | 3.0MiB | 62.57KiB | 0 | 0.0490487 | False | False |
| http_to_http_json | -38.04KiB | -0.16 | 99.65% | 23.84MiB | 345.85KiB | 7.06KiB | 0 | 0.0141617 | 23.81MiB | 535.13KiB | 10.92KiB | 0 | 0.0219462 | False | False |
| fluent_elasticsearch | -206.94KiB | -0.25 | 100.00% | 79.47MiB | 52.35KiB | 1.06KiB | 0 | 0.000643189 | 79.27MiB | 1.77MiB | 36.49KiB | 0 | 0.0223655 | False | False |
| http_to_http_noack | -61.85KiB | -0.25 | 99.35% | 23.83MiB | 515.0KiB | 10.53KiB | 0 | 0.0211007 | 23.77MiB | 987.13KiB | 20.11KiB | 0 | 0.0405479 | False | False |
| http_pipelines_blackhole | -4.88KiB | -0.29 | 88.04% | 1.64MiB | 71.61KiB | 1.46KiB | 0 | 0.042531 | 1.64MiB | 135.92KiB | 2.77KiB | 0 | 0.0809564 | False | False |
| syslog_regex_logs2metric_ddmetrics | -71.58KiB | -0.55 | 100.00% | 12.6MiB | 631.77KiB | 12.87KiB | 0 | 0.0489578 | 12.53MiB | 495.86KiB | 10.11KiB | 0 | 0.0386401 | False | False |
| datadog_agent_remap_blackhole_acks | -361.51KiB | -0.57 | 99.92% | 61.99MiB | 4.14MiB | 86.28KiB | 0 | 0.0668152 | 61.63MiB | 3.09MiB | 64.57KiB | 0 | 0.0501094 | False | False |
| datadog_agent_remap_datadog_logs | -687.27KiB | -1.1 | 100.00% | 61.26MiB | 811.04KiB | 16.59KiB | 0 | 0.0129256 | 60.59MiB | 4.08MiB | 84.95KiB | 0 | 0.067311 | False | False |
| datadog_agent_remap_datadog_logs_acks | -803.9KiB | -1.26 | 100.00% | 62.24MiB | 3.43MiB | 71.62KiB | 0 | 0.0550889 | 61.45MiB | 4.36MiB | 90.76KiB | 0 | 0.0709335 | False | False |
| syslog_splunk_hec_logs | -210.39KiB | -1.27 | 100.00% | 16.22MiB | 952.33KiB | 19.37KiB | 0 | 0.0573089 | 16.02MiB | 803.0KiB | 16.38KiB | 0 | 0.0489421 | False | False |
| syslog_humio_logs | -245.07KiB | -1.42 | 100.00% | 16.8MiB | 112.3KiB | 2.29KiB | 0 | 0.00652474 | 16.57MiB | 106.35KiB | 2.18KiB | 0 | 0.00626818 | False | False |
| http_pipelines_no_grok_blackhole | -180.72KiB | -1.62 | 100.00% | 10.89MiB | 89.12KiB | 1.82KiB | 0 | 0.00798668 | 10.72MiB | 1.06MiB | 22.16KiB | 0 | 0.0993002 | False | False |
| splunk_hec_route_s3 | -328.95KiB | -1.69 | 100.00% | 18.98MiB | 2.23MiB | 46.41KiB | 0 | 0.117373 | 18.66MiB | 2.24MiB | 46.78KiB | 0 | 0.119893 | False | False |
| syslog_log2metric_splunk_hec_metrics | -316.27KiB | -1.76 | 100.00% | 17.52MiB | 635.89KiB | 12.97KiB | 0 | 0.035443 | 17.21MiB | 899.86KiB | 18.3KiB | 0 | 0.0510564 | False | False |
| http_text_to_http_json | -1.13MiB | -2.87 | 100.00% | 39.36MiB | 798.55KiB | 16.3KiB | 0 | 0.0198089 | 38.23MiB | 867.51KiB | 17.72KiB | 0 | 0.0221545 | False | False |

github-actions[bot] avatar Sep 21 '22 00:09 github-actions[bot]

Soak Test Results

Baseline: 197ed5b27452aee5b51ba4db2443ca3ac1814634 Comparison: ef19c4faf1699cb058e301c57718fcf744b34422 Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| socket_to_socket_blackhole | 491.96KiB | 2.12 | 100.00% | 22.7MiB | 118.06KiB | 2.41KiB | 0 | 0.00507806 | 23.18MiB | 105.45KiB | 2.15KiB | 0 | 0.0044418 | False | False |
| http_pipelines_blackhole_acks | 16.74KiB | 1.38 | 100.00% | 1.19MiB | 119.31KiB | 2.43KiB | 0 | 0.098001 | 1.2MiB | 72.6KiB | 1.48KiB | 0 | 0.0588225 | False | False |
| syslog_log2metric_splunk_hec_metrics | 224.39KiB | 1.32 | 100.00% | 16.54MiB | 1.06MiB | 22.15KiB | 0 | 0.0641589 | 16.76MiB | 1.05MiB | 21.9KiB | 0 | 0.0626851 | False | False |
| syslog_splunk_hec_logs | 54.01KiB | 0.33 | 98.12% | 15.77MiB | 853.49KiB | 17.36KiB | 0 | 0.0528559 | 15.82MiB | 738.48KiB | 15.06KiB | 0 | 0.0455808 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 47.03KiB | 0.19 | 99.80% | 23.79MiB | 666.62KiB | 13.58KiB | 0 | 0.027354 | 23.84MiB | 335.37KiB | 6.85KiB | 0 | 0.013735 | False | False |
| splunk_hec_indexer_ack_blackhole | 15.58KiB | 0.06 | 46.60% | 23.74MiB | 889.03KiB | 18.08KiB | 0 | 0.0365617 | 23.76MiB | 852.14KiB | 17.34KiB | 0 | 0.0350223 | False | False |
| enterprise_http_to_http | -760.44B | -0 | 8.00% | 23.85MiB | 256.17KiB | 5.23KiB | 0 | 0.0104887 | 23.85MiB | 255.94KiB | 5.23KiB | 0 | 0.0104796 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 0B | -0 | 0.00% | 23.74MiB | 892.73KiB | 18.15KiB | 0 | 0.0367117 | 23.74MiB | 900.69KiB | 18.31KiB | 0 | 0.0370391 | False | False |
| syslog_humio_logs | -3.56KiB | -0.02 | 53.07% | 16.49MiB | 194.81KiB | 3.98KiB | 0 | 0.0115376 | 16.48MiB | 141.54KiB | 2.9KiB | 0 | 0.00838435 | False | False |
| file_to_blackhole | -55.68KiB | -0.06 | 46.03% | 95.35MiB | 2.87MiB | 59.42KiB | 0 | 0.0300581 | 95.29MiB | 3.3MiB | 68.64KiB | 0 | 0.0346178 | False | False |
| http_pipelines_blackhole | -1.21KiB | -0.07 | 36.60% | 1.68MiB | 41.7KiB | 872.79B | 0 | 0.0242936 | 1.67MiB | 117.12KiB | 2.39KiB | 0 | 0.068274 | False | False |
| http_to_http_json | -34.31KiB | -0.14 | 99.38% | 23.85MiB | 327.0KiB | 6.68KiB | 0 | 0.0133884 | 23.81MiB | 518.66KiB | 10.59KiB | 0 | 0.0212654 | False | False |
| syslog_regex_logs2metric_ddmetrics | -28.72KiB | -0.23 | 97.22% | 12.01MiB | 453.97KiB | 9.25KiB | 0 | 0.0369163 | 11.98MiB | 451.63KiB | 9.2KiB | 0 | 0.0368117 | False | False |
| http_to_http_noack | -62.05KiB | -0.25 | 99.40% | 23.83MiB | 514.2KiB | 10.5KiB | 0 | 0.021068 | 23.77MiB | 981.09KiB | 19.99KiB | 0 | 0.0403001 | False | False |
| http_to_http_acks | -46.25KiB | -0.26 | 15.65% | 17.3MiB | 8.0MiB | 167.38KiB | 0 | 0.462494 | 17.26MiB | 7.85MiB | 163.89KiB | 0 | 0.454519 | True | True |
| fluent_elasticsearch | -215.31KiB | -0.26 | 100.00% | 79.47MiB | 54.25KiB | 1.1KiB | 0 | 0.000666549 | 79.26MiB | 2.25MiB | 46.2KiB | 0 | 0.0283342 | False | False |
| datadog_agent_remap_blackhole_acks | -189.65KiB | -0.31 | 91.55% | 59.91MiB | 4.42MiB | 92.01KiB | 0 | 0.0737631 | 59.73MiB | 2.88MiB | 60.13KiB | 0 | 0.04813 | False | False |
| syslog_loki | -144.53KiB | -0.99 | 100.00% | 14.3MiB | 485.42KiB | 9.95KiB | 0 | 0.0331493 | 14.16MiB | 863.07KiB | 17.54KiB | 0 | 0.0595269 | False | False |
| datadog_agent_remap_datadog_logs_acks | -742.73KiB | -1.18 | 100.00% | 61.31MiB | 3.09MiB | 64.52KiB | 0 | 0.050335 | 60.59MiB | 4.29MiB | 89.24KiB | 0 | 0.0707459 | False | False |
| splunk_hec_route_s3 | -301.18KiB | -1.57 | 100.00% | 18.73MiB | 2.27MiB | 47.25KiB | 0 | 0.121192 | 18.43MiB | 2.22MiB | 46.53KiB | 0 | 0.12064 | False | False |
| http_pipelines_no_grok_blackhole | -276.94KiB | -2.57 | 100.00% | 10.54MiB | 596.16KiB | 12.17KiB | 0 | 0.0552289 | 10.27MiB | 1.17MiB | 24.39KiB | 0 | 0.114079 | False | False |
| http_text_to_http_json | -1.07MiB | -2.9 | 100.00% | 36.87MiB | 2.51MiB | 52.37KiB | 0 | 0.0679495 | 35.8MiB | 2.72MiB | 56.72KiB | 0 | 0.07583 | False | False |
| datadog_agent_remap_datadog_logs | -1.99MiB | -3.4 | 100.00% | 58.48MiB | 4.6MiB | 96.54KiB | 0 | 0.0786976 | 56.49MiB | 6.22MiB | 129.5KiB | 0 | 0.110058 | False | False |
| syslog_log2metric_humio_metrics | -441.36KiB | -3.6 | 100.00% | 11.98MiB | 903.2KiB | 18.44KiB | 0 | 0.0736128 | 11.55MiB | 958.02KiB | 19.51KiB | 0 | 0.0809946 | False | False |
| datadog_agent_remap_blackhole | -2.67MiB | -4.43 | 100.00% | 60.2MiB | 4.23MiB | 88.26KiB | 0 | 0.0703133 | 57.54MiB | 3.89MiB | 81.18KiB | 0 | 0.0676018 | False | False |

github-actions[bot] avatar Sep 21 '22 17:09 github-actions[bot]

@ktff today I had to help fix some parsing errors in our regex-based CEF processing pipeline. I couldn't help but think that this PR will improve our lives dramatically. This feed in particular is ~40K EPS of CEF data (PaloAlto devices), and I will definitely test your `parse_cef()` implementation with it.

I noticed two particularities in the CEF data today that I wanted to bring up for you to consider (if you haven't already). Unfortunately, I currently don't have the means to build/test your PR myself.

The first case is CEF extension fields with empty values, for example `app= msg= act=`. The second case is funkier: CEF extension fields with quoted values, such as `msg="Some message."`.

For the first case, I wonder whether your implementation will return empty-valued keys or discard them entirely. For the second case, will `parse_cef()` strip the quotes from the value? That would be really nice.

hhromic avatar Sep 21 '22 18:09 hhromic

  1. Example with empty extension values (fails to parse)
$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst= spt=")
function call error for "parse_cef" at (0:123): Could not parse whole line successfully
  2. Example with quoted extension value (quotes are kept)
$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=\"2.1.2.2\" spt=\"1232\"")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "dst": "\"2.1.2.2\"", "name": "worm successfully stopped", "severity": "10", "spt": "\"1232\"", "src": "10.0.0.1" }

Another edge case I noticed: empty extensions at the end seem to work fine, but empty extensions that are not at the end (example 1 above) fail.

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 spt=")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "name": "worm successfully stopped", "severity": "10", "spt": "", "src": "10.0.0.1" }

However, if you have an "empty" extension with multiple spaces, it parses again (capturing the spaces):

$ parse_cef!("Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=  spt=")
{ "cefVersion": "1", "deviceEventClassId": "100", "deviceProduct": "threatmanager", "deviceVendor": "Security", "deviceVersion": "1.0", "name": "worm successfully stopped", "severity": "10", "spt": "", "src": " " }
  1. I think supporting case 1 above is reasonable, even though it is not mentioned in the spec. It should close up the edge cases mentioned above too.
  2. I'm a bit on the fence about quoted values, since they're not mentioned in the spec and a user could conceivably prefer to keep the quotes, but I'm open to discussion, or to adding this as an option (potentially defaulted on).
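
To make the two points above concrete, here is a rough Python sketch of one way the extension part could be split so that empty values are accepted anywhere. It is an illustration under stated assumptions, not the PR's Rust code: `parse_extension` and its `strip_quotes` flag are hypothetical names, keys are assumed to be runs of word characters preceded by whitespace (or the start of the string), exactly one space before the next key is treated as the separator, and unescaping of `\=` and `\\` is left out.

```python
import re

# Assumption: a key is a run of word characters that starts the string or
# follows whitespace, and is immediately followed by "=".
KEY_RE = re.compile(r"(?<!\S)(\w+)=")


def parse_extension(ext, strip_quotes=False):
    """Hypothetical helper: split 'k1=v1 k2=v2 ...' into a dict of strings."""
    matches = list(KEY_RE.finditer(ext))
    fields = {}
    for i, match in enumerate(matches):
        is_last = i + 1 == len(matches)
        # A value runs from just after "=" up to the start of the next key.
        end = len(ext) if is_last else matches[i + 1].start()
        value = ext[match.end():end]
        # Drop the single separator space in front of the next key, so that
        # "dst= spt=" yields an empty string for dst.
        if not is_last and value.endswith(" "):
            value = value[:-1]
        # Opt-in: remove one pair of surrounding double quotes.
        if strip_quotes and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[match.group(1)] = value
    return fields
```

Under these assumptions, `src=10.0.0.1 dst= spt=` gives empty strings for `dst` and `spt`, and a value consisting of two spaces keeps one of them, which matches the behaviour reported above for `src=  spt=`. Making `strip_quotes` an explicit argument mirrors the idea of exposing quote handling as an option rather than hard-coding it.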

fuchsnj avatar Sep 21 '22 18:09 fuchsnj