enhancement(http): log source IP as `host` key in the `http` source
closes #4221
Hey @jszwedko & @neuronull , any update on this PR?
👋 hey @NishantJoshi00
I ran CI on this PR and it is looking great.
I think the suggestion from @estherk15 is worth making.
One thing I forgot, but CI found: this change is an enhancement that alters the user experience, and so it should have a changelog fragment to go with it.
It's a fairly straightforward process, described here: https://github.com/vectordotdev/vector/blob/master/changelog.d/README.md
Here is an example; note that you need to specify authors at the end of the file, and a link to your GH profile will be rendered in the release notes as the contributor to this change.
https://github.com/vectordotdev/vector/blob/master/changelog.d/19639_unit_test_vrl_metadata.enhancement.md?plain=1
Hey @neuronull , can you please verify that the changelog entry meets the requirements?
Hey @neuronull, my name is failing the spell check 😢
> Hey @neuronull, my name is failing the spell check 😢
Ohh, sorry about that. I believe we fixed this on master, so you should just need to pull master into your branch. I attempted this but was not able to check out your PR for some reason.
Hey @neuronull, I changed the inline comments to align them with the changes that @estherk15 suggested.
Regression Detector Results
Run ID: f8d07bc4-05ae-407d-a686-5e14c5a67230 Baseline: 295d74f8420a8d4a31cafcc76438b065e4773d2f Comparison: 2af9628afa270760a5b30f423f96f735349524a1 Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
Significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +5.53 | [+4.17, +6.88] |
| ❌ | http_text_to_http_json | ingress throughput | -11.33 | [-11.44, -11.22] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +5.53 | [+4.17, +6.88] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +1.48 | [+1.39, +1.58] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.70 | [+0.57, +0.83] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.59 | [+0.45, +0.74] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +0.59 | [+0.49, +0.69] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.42 | [+0.34, +0.50] |
| ➖ | fluent_elasticsearch | ingress throughput | +0.31 | [-0.17, +0.79] |
| ➖ | http_to_s3 | ingress throughput | +0.30 | [+0.02, +0.57] |
| ➖ | syslog_loki | ingress throughput | +0.25 | [+0.20, +0.30] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.16 | [+0.05, +0.27] |
| ➖ | http_to_http_noack | ingress throughput | +0.05 | [-0.04, +0.15] |
| ➖ | http_to_http_json | ingress throughput | +0.03 | [-0.04, +0.10] |
| ➖ | file_to_blackhole | egress throughput | +0.02 | [-2.51, +2.56] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.14, +0.15] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.16, +0.16] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.02 | [-0.14, +0.09] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.13 | [-0.19, -0.06] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.28 | [-0.37, -0.19] |
| ➖ | syslog_humio_logs | ingress throughput | -0.35 | [-0.45, -0.26] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | -0.37 | [-0.45, -0.28] |
| ➖ | http_elasticsearch | ingress throughput | -0.83 | [-0.90, -0.76] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -0.85 | [-1.33, -0.37] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.85 | [-0.89, -0.81] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.75 | [-1.88, -1.61] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -1.77 | [-1.83, -1.70] |
| ➖ | otlp_http_to_blackhole | ingress throughput | -2.16 | [-2.30, -2.03] |
| ❌ | http_text_to_http_json | ingress throughput | -11.33 | [-11.44, -11.22] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector Results
Run ID: 5fa56358-399d-4b5a-a40b-1e7826086c90 Baseline: f88316cce7665c6dbf83a81a8261fa126b50542e Comparison: cc8a0c3614128fc982b35200b285d794c7205b52 Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
Significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ❌ | otlp_http_to_blackhole | ingress throughput | -5.08 | [-5.23, -4.93] |
| ❌ | http_text_to_http_json | ingress throughput | -11.73 | [-11.86, -11.61] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | http_to_http_acks | ingress throughput | +4.88 | [+3.52, +6.24] |
| ➖ | file_to_blackhole | egress throughput | +2.40 | [-0.20, +5.00] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +1.16 | [+1.07, +1.26] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +1.15 | [+1.04, +1.26] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | +0.23 | [+0.14, +0.32] |
| ➖ | http_to_http_noack | ingress throughput | +0.14 | [+0.04, +0.25] |
| ➖ | http_to_http_json | ingress throughput | +0.06 | [-0.02, +0.14] |
| ➖ | http_to_s3 | ingress throughput | +0.06 | [-0.22, +0.33] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.14, +0.15] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.16, +0.16] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.03 | [-0.14, +0.09] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.14 | [-0.22, -0.06] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.14 | [-0.19, -0.10] |
| ➖ | fluent_elasticsearch | ingress throughput | -0.56 | [-1.03, -0.08] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -0.58 | [-0.72, -0.45] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | -0.62 | [-0.71, -0.53] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.62 | [-0.71, -0.54] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -0.72 | [-1.20, -0.23] |
| ➖ | syslog_humio_logs | ingress throughput | -1.06 | [-1.17, -0.94] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -1.12 | [-1.21, -1.02] |
| ➖ | syslog_loki | ingress throughput | -1.37 | [-1.45, -1.29] |
| ➖ | http_elasticsearch | ingress throughput | -1.37 | [-1.44, -1.31] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.53 | [-1.67, -1.39] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -2.26 | [-2.40, -2.13] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -4.29 | [-4.36, -4.21] |
| ❌ | otlp_http_to_blackhole | ingress throughput | -5.08 | [-5.23, -4.93] |
| ❌ | http_text_to_http_json | ingress throughput | -11.73 | [-11.86, -11.61] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Huh. Interesting, this change has flagged a performance regression in two test cases that exercise this component.
I'm wondering if it's this bit of code:

```rust
.map_request(move |mut request: hyper::Request<_>| {
    request.extensions_mut().insert(PeerAddr(remote_addr));
    request
})
```
Yes @neuronull, that's highly likely the case.
I tried to look around, and nothing else seems to be introducing this cost: neither `SocketAddr` nor the middleware that was introduced. It has to be either adding an extension to the request or the closure capturing the remote IP on every new connection (though the capture alone shouldn't introduce such a regression).
It might just be introduced by the extension!
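To build intuition for what an extension insert costs per request, here is a minimal std-only sketch of a type-keyed map in the same spirit as `http::Extensions` (my own illustrative approximation, not the crate's actual implementation): each insert hashes a `TypeId` and boxes the value.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::net::SocketAddr;

// Minimal type-keyed map, illustrative only. The point: every insert
// boxes the value (a heap allocation) and performs a hash-map insert
// keyed by TypeId, so doing this per request adds work to the hot path.
#[derive(Default)]
struct MiniExtensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniExtensions {
    fn insert<T: Any>(&mut self, val: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(val));
    }

    fn get<T: Any>(&self) -> Option<&T> {
        self.map.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct PeerAddr(SocketAddr);

fn main() {
    let addr: SocketAddr = "127.0.0.1:8080".parse().unwrap();
    let mut ext = MiniExtensions::default();
    ext.insert(PeerAddr(addr));
    assert_eq!(ext.get::<PeerAddr>(), Some(&PeerAddr(addr)));
}
```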
Hey @neuronull, can you tell me the command to do this benchmarking and regression testing locally?
Is it "make bench"?
👋 Hey @NishantJoshi00 , thanks for volunteering to investigate that.
Unfortunately there isn't a "pure" way to get the exact same experiment report locally, as we do in CI. There is some internal infrastructure being leveraged there which isn't exposed (even to the Vector team).
However, our regression tests (see regression/) use a tool called lading. It lives in a public repo and can be run locally, so I believe you could run the failing experiments ("cases" in our repo) locally using that tool:
https://github.com/DataDog/lading
I will try to make minor optimizations wherever possible, but it seems there is no other way to solve this, as the regression suggests a delta of around 11%.
What is the base value around which this delta was calculated?
I am just curious, as this would give me a fair estimate of what kind of optimization I might be able to do.
> What is the base value around which this delta was calculated?
The baseline is determined from the merge base of the PR and master: https://github.com/vectordotdev/vector/blob/2f1c7850fbc039a894f51b844e919adf2fdc925d/.github/workflows/regression.yml#L151 . To determine performance differences in a PR, CI will build both the baseline and PR, run both n times for each experiment (I think 20 if I remember correctly), and compare the results.
Hey @neuronull @jszwedko. So, I have a hypothesis, let me know if it makes sense.
Currently we are passing the entire `SocketAddr` in the extensions section of the service handlers.
Instead, what if we pass a reference to it? In this scenario we can't solve it with `&`, as that would push us into a lifetime rabbit hole; what if we pass an `Arc` instead? That way, instead of passing the entire `SocketAddr`, we are just passing a pointer to it.
| Type | Size (bytes) |
|---|---|
| `SocketAddr` | 32 |
| `Arc<SocketAddr>` | 8 |
This would reduce the size of the data that is generated, copied, and sent on every connection, which might improve the perf.
I'd like to know your thoughts on this hypothesis, and to test this change too.
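As a quick sanity check on those sizes, here is a std-only sketch (my addition, not from the PR; exact values are target-dependent, the table above reports common 64-bit sizes):

```rust
use std::mem::size_of;
use std::net::SocketAddr;
use std::sync::Arc;

fn main() {
    // SocketAddr is an enum over V4/V6 addresses and is Copy; on common
    // 64-bit targets it is 32 bytes, while Arc<SocketAddr> is one pointer
    // (8 bytes). The trade-off: Arc costs a heap allocation up front and
    // atomic refcount traffic on every clone, so smaller isn't automatically faster.
    println!("SocketAddr:      {} bytes", size_of::<SocketAddr>());
    println!("Arc<SocketAddr>: {} bytes", size_of::<Arc<SocketAddr>>());
}
```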
> I'd like to know your thoughts on this hypothesis
Looks good to me, I will re-trigger CI
Regression Detector Results
Run ID: b75b4279-bbce-4350-996c-5dae91757dad Baseline: 5d8160d72743df1e02fff9f69a8d4e37e1f2577a Comparison: 780db89c7cf5c3b0098df30e0eafc776a1b5c6b1 Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
Significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +5.57 | [+4.21, +6.93] |
| ❌ | http_text_to_http_json | ingress throughput | -10.54 | [-10.64, -10.45] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +5.57 | [+4.21, +6.93] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +1.88 | [+1.78, +1.97] |
| ➖ | file_to_blackhole | egress throughput | +1.81 | [-0.74, +4.36] |
| ➖ | syslog_humio_logs | ingress throughput | +1.28 | [+1.17, +1.39] |
| ➖ | fluent_elasticsearch | ingress throughput | +1.27 | [+0.80, +1.74] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +1.25 | [+1.11, +1.39] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +1.18 | [+1.11, +1.25] |
| ➖ | socket_to_socket_blackhole | ingress throughput | +0.86 | [+0.79, +0.93] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | +0.78 | [+0.69, +0.87] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.72 | [+0.64, +0.80] |
| ➖ | syslog_loki | ingress throughput | +0.59 | [+0.55, +0.63] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.37 | [+0.26, +0.49] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | +0.29 | [+0.16, +0.42] |
| ➖ | http_to_http_json | ingress throughput | +0.08 | [+0.00, +0.16] |
| ➖ | http_to_http_noack | ingress throughput | +0.08 | [-0.01, +0.18] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.14, +0.15] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.16, +0.16] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -0.03 | [-0.53, +0.47] |
| ➖ | otlp_http_to_blackhole | ingress throughput | -0.04 | [-0.19, +0.11] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.05 | [-0.16, +0.07] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | -0.06 | [-0.15, +0.03] |
| ➖ | http_to_s3 | ingress throughput | -0.08 | [-0.37, +0.20] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.10 | [-0.19, -0.01] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.44 | [-0.48, -0.39] |
| ➖ | http_elasticsearch | ingress throughput | -0.44 | [-0.52, -0.36] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.62 | [-0.73, -0.51] |
| ❌ | http_text_to_http_json | ingress throughput | -10.54 | [-10.64, -10.45] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector Results
Run ID: d8d36d7d-d2e7-4dc9-8306-78f515ff00df Baseline: 8db6288b4cc2ecf070649e0dc53879f267f41c32 Comparison: a1013bdf48ccc7cfdc3bbf3bd8972e81d8500639 Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
Significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +6.67 | [+5.31, +8.03] |
| ❌ | http_text_to_http_json | ingress throughput | -9.15 | [-9.27, -9.03] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | http_to_http_acks | ingress throughput | +6.67 | [+5.31, +8.03] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +2.17 | [+2.02, +2.33] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +1.15 | [+1.06, +1.24] |
| ➖ | http_elasticsearch | ingress throughput | +0.96 | [+0.89, +1.02] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +0.84 | [+0.73, +0.94] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +0.68 | [+0.58, +0.78] |
| ➖ | http_to_s3 | ingress throughput | +0.43 | [+0.15, +0.71] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.42 | [+0.27, +0.57] |
| ➖ | syslog_humio_logs | ingress throughput | +0.35 | [+0.27, +0.44] |
| ➖ | otlp_http_to_blackhole | ingress throughput | +0.33 | [+0.19, +0.48] |
| ➖ | syslog_loki | ingress throughput | +0.32 | [+0.25, +0.39] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.28 | [+0.19, +0.38] |
| ➖ | http_to_http_noack | ingress throughput | +0.14 | [+0.06, +0.22] |
| ➖ | http_to_http_json | ingress throughput | +0.05 | [-0.02, +0.13] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.15, +0.15] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.00 | [-0.14, +0.13] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.02 | [-0.14, +0.10] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.15 | [-0.23, -0.07] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | -0.30 | [-0.38, -0.21] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -0.34 | [-0.42, -0.27] |
| ➖ | fluent_elasticsearch | ingress throughput | -0.58 | [-1.04, -0.11] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.69 | [-0.77, -0.61] |
| ➖ | file_to_blackhole | egress throughput | -1.32 | [-3.66, +1.02] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -1.52 | [-2.02, -1.01] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -2.29 | [-2.42, -2.15] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -3.78 | [-3.91, -3.65] |
| ❌ | http_text_to_http_json | ingress throughput | -9.15 | [-9.27, -9.03] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
@NishantJoshi00 I think it's worth a shot if you want to try the Arc approach. It looks like we are still seeing a regression.
An alternative here would be to make the field addition opt-in via configuration.
Hey @jszwedko, I apologize for the delay. I didn't get time to open and work on this PR for a few weeks, but I am back and can try to resolve it.
> An alternative here would be to make the field addition opt-in via configuration.
By this, do you mean a compile-time feature? Because this is a framework-level change, adding a runtime flag for it seems to be a challenge. My guess is that the regression cost is added not because we are extracting the `remote_addr`, but because we are plugging it back in using `Extension`.
I was also thinking about the regression:
- Does the regression consider communication over an open connection, or only data transferred over new connections? In theory, performance shouldn't be affected when data is sent over an already-open connection.
- The advantage of adding `Arc` wasn't that significant, and `SocketAddr` already has `Copy`; should we remove it?
- `http_text_to_http_json`: what does this regression indicate?
Can you elaborate on how I can make this an opt-in feature?
Hi @NishantJoshi00 ! Thanks for following up here!
> By this, do you mean a compile-time feature? Because this is a framework-level change, adding a runtime flag for it seems to be a challenge. My guess is that the regression cost is added not because we are extracting the `remote_addr`, but because we are plugging it back in using `Extension`.
Ah, no, I mean as a runtime feature. For example, consider https://vector.dev/docs/reference/configuration/sources/syslog/#host_key. If that is empty, the host key is not added. I think we would just default to empty and let users opt into it.
It sounds like you think this will be difficult to make an optional feature though, due to the way it is implemented? If so, we can focus on seeing if we can reduce the overhead.
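For concreteness, the opt-in might look something like this in a user's config (a hypothetical sketch modeled on the syslog source's `host_key` option; the actual option name and shape for the `http` source would be decided in the PR, and leaving it unset would keep today's behavior):

```toml
[sources.my_http_source]
type = "http"
address = "0.0.0.0:8080"
# Hypothetical opt-in: record the client IP under this key.
# Unset/empty (the proposed default) = the field is not added.
host_key = "host"
```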
> I was also thinking about the regression:
> - Does the regression consider communication over an open connection, or only data transferred over new connections? In theory, performance shouldn't be affected when data is sent over an already-open connection.
Good question. I believe the benchmark multiplexes over a fixed set of connections.
You can find the benchmark configuration here: https://github.com/vectordotdev/vector/tree/master/regression/cases/http_to_http_json
https://github.com/DataDog/lading is the load generation tool.
> The advantage of adding `Arc` wasn't that significant, and `SocketAddr` already has `Copy`; should we remove it?
Agreed, if you don't think Arc helped we should drop it.
> `http_text_to_http_json`: what does this regression indicate?
It indicates that, with this change, the throughput dropped around 9% compared to the baseline (master).
I'd suggest running Vector locally to instrument it with a profiler to see where the additional costs are coming from 🤔
Hope this helps! Let me know if you have any additional questions!
Regression Detector Results
Run ID: d65f469e-d34a-4a95-bc8b-7aa01a8e5d89 Baseline: 1b57acd6bb806a09556a8d5cd3a64709c5f44354 Comparison: 7ea71173332e98d6271dfda97e5e8ad6e5c64860 Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
Significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | splunk_hec_route_s3 | ingress throughput | +6.39 | [+5.87, +6.91] |
| ❌ | http_text_to_http_json | ingress throughput | -7.10 | [-7.28, -6.93] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ✅ | splunk_hec_route_s3 | ingress throughput | +6.39 | [+5.87, +6.91] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +4.80 | [+4.58, +5.02] |
| ➖ | fluent_elasticsearch | ingress throughput | +1.94 | [+1.44, +2.43] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +1.94 | [+1.77, +2.10] |
| ➖ | syslog_humio_logs | ingress throughput | +1.84 | [+1.67, +2.02] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +1.53 | [+1.39, +1.67] |
| ➖ | http_elasticsearch | ingress throughput | +1.52 | [+1.42, +1.62] |
| ➖ | http_to_http_acks | ingress throughput | +1.45 | [+0.07, +2.83] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.87 | [+0.76, +0.98] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +0.56 | [+0.43, +0.69] |
| ➖ | http_to_s3 | ingress throughput | +0.30 | [+0.02, +0.58] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.27 | [+0.14, +0.40] |
| ➖ | http_to_http_noack | ingress throughput | +0.21 | [+0.12, +0.30] |
| ➖ | http_to_http_json | ingress throughput | +0.07 | [-0.01, +0.14] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.14, +0.14] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.00 | [-0.15, +0.15] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.02 | [-0.13, +0.09] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.08 | [-0.16, +0.01] |
| ➖ | file_to_blackhole | egress throughput | -1.01 | [-3.53, +1.50] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | -1.11 | [-1.21, -1.02] |
| ➖ | otlp_http_to_blackhole | ingress throughput | -1.62 | [-1.76, -1.47] |
| ➖ | syslog_loki | ingress throughput | -1.77 | [-1.88, -1.66] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.91 | [-2.08, -1.75] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -2.10 | [-2.19, -2.00] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -2.37 | [-2.51, -2.23] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -4.10 | [-4.37, -3.84] |
| ❌ | http_text_to_http_json | ingress throughput | -7.10 | [-7.28, -6.93] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Hey @jszwedko,
> If that is empty, the host key is not added. I think we would just default to empty and let users opt into it.
So, I am giving this a shot. My assumption is that the cost is added not by attaching an extension to the service, but by the logical operations performed while creating connections; if the added operation is as light as a `jmp`, the effect should be minimal. After a bit of reconsideration this seems worth a shot. Can you run the regression test for this?
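The gating described above can be sketched in a few lines (my own hypothetical illustration, not the PR's actual code; `HttpSourceConfig` and `host_field` are invented names): when no host key is configured, the per-request work reduces to a single branch and nothing is allocated or attached.

```rust
use std::net::SocketAddr;

// Hypothetical config: None = don't capture the source IP (proposed default).
struct HttpSourceConfig {
    host_key: Option<String>,
}

// Only materialize the key/value pair when the user opted in; the
// disabled path is just the `Option` check.
fn host_field(cfg: &HttpSourceConfig, remote_addr: SocketAddr) -> Option<(String, String)> {
    cfg.host_key
        .as_ref()
        .map(|key| (key.clone(), remote_addr.ip().to_string()))
}

fn main() {
    let addr: SocketAddr = "10.0.0.1:4000".parse().unwrap();

    let off = HttpSourceConfig { host_key: None };
    assert_eq!(host_field(&off, addr), None);

    let on = HttpSourceConfig { host_key: Some("host".into()) };
    assert_eq!(host_field(&on, addr), Some(("host".into(), "10.0.0.1".into())));
}
```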
/ci-run-regression
Regression Detector Results
Run ID: b995ed0e-9ca4-4460-b3ce-ce7ac56efca2 Baseline: 1b57acd6bb806a09556a8d5cd3a64709c5f44354 Comparison: b6ac53a80cd5eaa17890fc856702b7dfd726dffa Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
No significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | fluent_elasticsearch | ingress throughput | +2.20 | [+1.72, +2.69] |
| ➖ | http_to_http_acks | ingress throughput | +0.87 | [-0.50, +2.23] |
| ➖ | http_text_to_http_json | ingress throughput | +0.84 | [+0.71, +0.97] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | +0.29 | [+0.20, +0.38] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +0.23 | [+0.14, +0.33] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.21 | [+0.11, +0.32] |
| ➖ | http_to_http_noack | ingress throughput | +0.13 | [+0.02, +0.25] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.10 | [+0.00, +0.21] |
| ➖ | http_to_http_json | ingress throughput | +0.07 | [-0.01, +0.15] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.01 | [-0.14, +0.15] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.00 | [-0.14, +0.14] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.03 | [-0.15, +0.09] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.05 | [-0.11, +0.01] |
| ➖ | syslog_humio_logs | ingress throughput | -0.06 | [-0.17, +0.06] |
| ➖ | http_elasticsearch | ingress throughput | -0.06 | [-0.13, +0.02] |
| ➖ | http_to_s3 | ingress throughput | -0.16 | [-0.44, +0.12] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.32 | [-0.37, -0.26] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.34 | [-0.44, -0.25] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.47 | [-0.56, -0.38] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -0.54 | [-1.00, -0.08] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -0.57 | [-0.73, -0.41] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.68 | [-0.81, -0.55] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -1.16 | [-1.27, -1.05] |
| ➖ | syslog_loki | ingress throughput | -1.24 | [-1.28, -1.19] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -1.26 | [-1.33, -1.19] |
| ➖ | otlp_http_to_blackhole | ingress throughput | -1.88 | [-2.01, -1.76] |
| ➖ | file_to_blackhole | egress throughput | -4.06 | [-6.53, -1.58] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
/ci-run-regression
Regression Detector Results
Run ID: 20bb5626-be07-4ed9-b42f-ff6453744b15 Baseline: 1b57acd6bb806a09556a8d5cd3a64709c5f44354 Comparison: a97483c6ca2140f484409c1f66eb1a3f86068f0e Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
No significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +0.55 | [+0.47, +0.63] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.44 | [+0.33, +0.55] |
| ➖ | http_elasticsearch | ingress throughput | +0.37 | [+0.30, +0.45] |
| ➖ | otlp_http_to_blackhole | ingress throughput | +0.37 | [+0.27, +0.48] |
| ➖ | http_to_http_acks | ingress throughput | +0.37 | [-0.98, +1.72] |
| ➖ | file_to_blackhole | egress throughput | +0.36 | [-2.01, +2.73] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.29 | [+0.17, +0.42] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +0.15 | [+0.05, +0.25] |
| ➖ | http_to_http_noack | ingress throughput | +0.12 | [+0.02, +0.23] |
| ➖ | http_to_http_json | ingress throughput | +0.02 | [-0.06, +0.09] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.00 | [-0.15, +0.14] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.14, +0.14] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.07 | [-0.18, +0.04] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.10 | [-0.16, -0.04] |
| ➖ | http_to_s3 | ingress throughput | -0.13 | [-0.41, +0.14] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -0.25 | [-0.39, -0.11] |
| ➖ | syslog_loki | ingress throughput | -0.26 | [-0.32, -0.20] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | -0.37 | [-0.47, -0.26] |
| ➖ | socket_to_socket_blackhole | ingress throughput | -0.61 | [-0.67, -0.55] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | -0.61 | [-0.71, -0.52] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.72 | [-0.81, -0.64] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.84 | [-0.94, -0.73] |
| ➖ | fluent_elasticsearch | ingress throughput | -0.87 | [-1.34, -0.39] |
| ➖ | http_text_to_http_json | ingress throughput | -1.10 | [-1.21, -0.99] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -2.40 | [-2.51, -2.28] |
| ➖ | syslog_humio_logs | ingress throughput | -2.48 | [-2.59, -2.37] |
| ➖ | splunk_hec_route_s3 | ingress throughput | -2.50 | [-2.97, -2.04] |
Seems like a test is failing:
---- sources::http_server::tests::output_schema_definition_legacy_namespace stdout ----
thread 'sources::http_server::tests::output_schema_definition_legacy_namespace' panicked at src/sources/http_server.rs:1555:9:
assertion failed: `(left == right)`
left: `"Some(Definition { event_kind: Kind { bytes: None, integer: None, float: None, boolean: None, timestamp: None, regex: None, null: None, undefined: None, array: None, object: Some(Collection { known: {F..."` (truncated)
right: `"Some(Definition { event_kind: Kind { bytes: None, integer: None, float: None, boolean: None, timestamp: None, regex: None, null: None, undefined: None, array: None, object: Some(Collection { known: {F..."` (truncated)
Differences (-left|+right):
Collection {
known: {
Field(
KeyString(
+ "host",
+ ),
+ ): Kind {
+ bytes: Some(
+ (),
+ ),
+ integer: None,
+ float: None,
+ boolean: None,
+ timestamp: None,
+ regex: None,
+ null: None,
+ undefined: None,
+ array: None,
+ object: None,
+ },
+ Field(
+ KeyString(
"message",
),
): Kind {
bytes: Some(
I just verified both sides of the assert statement; as the failure suggests, the left side doesn't contain the information about the `host` key. But I also checked the `schema_definition` function, and it does seem to include the definition for `host`. Did I miss something somewhere else?