vector
fix(file source): Fix checksum calculation
The checksum is now calculated only from the bytes actually read, not from the entire buffer. Also added a procedure that automatically updates checksums from the previous version.
Resolves: #15700
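A minimal sketch of the kind of problem described above, assuming a fixed-size read buffer that can hold stale bytes past the end of the data. The names `crc64` and `fingerprints` and the hand-written bitwise CRC are illustrative assumptions, not the code in this PR.

```rust
use std::io::Read;

/// Bitwise CRC-64 using the ECMA-182 polynomial, zero initial value,
/// no reflection. Written out by hand so the sketch has no dependencies.
fn crc64(data: &[u8]) -> u64 {
    const POLY: u64 = 0x42F0_E1EB_A9EA_3693;
    let mut crc: u64 = 0;
    for &byte in data {
        crc ^= (byte as u64) << 56;
        for _ in 0..8 {
            crc = if crc & 0x8000_0000_0000_0000 != 0 {
                (crc << 1) ^ POLY
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Performs one read into `buf` and returns two checksums:
/// (checksum of the whole buffer, checksum of only the bytes read).
fn fingerprints(mut reader: impl Read, buf: &mut [u8]) -> std::io::Result<(u64, u64)> {
    let n = reader.read(buf)?;
    Ok((crc64(buf), crc64(&buf[..n])))
}

fn main() -> std::io::Result<()> {
    // Two buffers of the same size; the second holds leftover bytes
    // beyond the region the read will overwrite.
    let mut clean = [0u8; 16];
    let mut dirty = [0u8; 16];
    dirty[6..11].copy_from_slice(b"stale");

    let (whole_a, read_a) = fingerprints(&b"hello\n"[..], &mut clean)?;
    let (whole_b, read_b) = fingerprints(&b"hello\n"[..], &mut dirty)?;

    // Same file content, but the whole-buffer checksum depends on the tail.
    assert_ne!(whole_a, whole_b);
    // Checksumming only the bytes read gives a stable result.
    assert_eq!(read_a, read_b);
    println!("whole: {whole_a:016x} vs {whole_b:016x}, read: {read_a:016x}");
    Ok(())
}
```

In this sketch the same six bytes of file content produce different whole-buffer checksums, while the checksum over `&buf[..n]` depends only on what was read.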
Deploy Preview for vector-project ready!
Name | Link |
---|---|
Latest commit | 9969f9c238ad9ade2e37587bfacde89bd91f0264 |
Latest deploy log | https://app.netlify.com/sites/vector-project/deploys/63be6538697ed0000b8979f8 |
Deploy Preview | https://deploy-preview-15899--vector-project.netlify.app |
Deploy Preview for vrl-playground ready!
Name | Link |
---|---|
Latest commit | 9969f9c238ad9ade2e37587bfacde89bd91f0264 |
Latest deploy log | https://app.netlify.com/sites/vrl-playground/deploys/63be65389141400008697e6f |
Deploy Preview | https://deploy-preview-15899--vrl-playground.netlify.app |
Regression Test Results
Run ID: 413bc53b-ab3c-4b3e-abbf-a76f889ad3e7
Baseline: 1727e729487c4075c29d1ca30cda5053def52085
Comparison: 9969f9c238ad9ade2e37587bfacde89bd91f0264
Total vector CPUs: 7
Explanation
A regression test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance has changed, and to what degree, as a result of a pull request. Where appropriate, units are scaled per-core.
The table below, if present, lists those experiments that have experienced a statistically significant change in their bytes_written_per_cpu_second performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that baseline is faster; positive values mean that comparison is faster. Results that do not exhibit more than a ±5% change in mean bytes_written_per_cpu_second are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting changes are observed.
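The thresholds in the explanation above can be sketched as follows. This is only an illustration of the stated rules (confidence of at least 90%, more than ±5% change in mean, coefficient of variation above 0.1); the function names are hypothetical, the "newly erratic" condition is not modeled, and this is not the regression rig's actual code.

```rust
/// Coefficient of variation: standard deviation divided by the mean.
fn coefficient_of_variation(mean: f64, stdev: f64) -> f64 {
    stdev / mean
}

/// An experiment counts as erratic when its CoV exceeds 0.1.
fn is_erratic(cov: f64) -> bool {
    cov > 0.1
}

/// A change is kept for the abbreviated table when the confidence is at
/// least 90% and the mean moved by more than 5% in either direction.
fn is_significant_change(delta_mean_pct: f64, confidence: f64) -> bool {
    confidence >= 0.90 && delta_mean_pct.abs() > 5.0
}

fn main() {
    // Baseline figures for syslog_regex_logs2metric_ddmetrics, in KiB/CPU-s,
    // taken from the rounded values shown in the table.
    let cov = coefficient_of_variation(3.47 * 1024.0, 389.66);
    println!("CoV = {cov:.4}, erratic = {}", is_erratic(cov));

    // Δ mean % and confidence for two rows of the table.
    println!("7.42% @ 100%: {}", is_significant_change(7.42, 1.0));
    println!("-3.59% @ 100%: {}", is_significant_change(-3.59, 1.0));
}
```

Because the inputs are the rounded table values, the computed CoV only approximates the figure reported in the table.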
Changes in bytes_written_per_cpu_second with confidence ≥ 90.00% and absolute Δ mean >= ±5%:
experiment | Δ mean | Δ mean % | confidence |
---|---|---|---|
syslog_regex_logs2metric_ddmetrics | 263.55KiB/CPU-s | 7.42 | 100.00% |
Fine details of change detection per experiment.
experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
syslog_regex_logs2metric_ddmetrics | 263.55KiB/CPU-s | 7.42 | 100.00% | 3.47MiB/CPU-s | 389.66KiB/CPU-s | 5.03KiB/CPU-s | 0.0 | 0.109752 | 3.72MiB/CPU-s | 455.44KiB/CPU-s | 5.88KiB/CPU-s | 0.0 | 0.119415 | True | False |
syslog_log2metric_splunk_hec_metrics | 300.18KiB/CPU-s | 3.19 | 100.00% | 9.19MiB/CPU-s | 256.99KiB/CPU-s | 3.32KiB/CPU-s | 0.0 | 0.0273 | 9.49MiB/CPU-s | 165.4KiB/CPU-s | 2.14KiB/CPU-s | 0.0 | 0.017027 | False | False |
syslog_splunk_hec_logs | 249.73KiB/CPU-s | 2.75 | 100.00% | 8.86MiB/CPU-s | 218.54KiB/CPU-s | 2.82KiB/CPU-s | 0.0 | 0.024082 | 9.11MiB/CPU-s | 196.91KiB/CPU-s | 2.54KiB/CPU-s | 0.0 | 0.021117 | False | False |
socket_to_socket_blackhole | 146.49KiB/CPU-s | 1.05 | 100.00% | 13.63MiB/CPU-s | 226.08KiB/CPU-s | 2.92KiB/CPU-s | 0.0 | 0.016202 | 13.77MiB/CPU-s | 105.81KiB/CPU-s | 1.37KiB/CPU-s | 0.0 | 0.007504 | False | False |
syslog_loki | 77.26KiB/CPU-s | 0.86 | 100.00% | 8.79MiB/CPU-s | 233.0KiB/CPU-s | 3.01KiB/CPU-s | 0.0 | 0.025893 | 8.86MiB/CPU-s | 144.31KiB/CPU-s | 1.86KiB/CPU-s | 0.0 | 0.015901 | False | False |
syslog_humio_logs | 76.29KiB/CPU-s | 0.82 | 100.00% | 9.09MiB/CPU-s | 173.82KiB/CPU-s | 2.24KiB/CPU-s | 0.0 | 0.018673 | 9.16MiB/CPU-s | 191.95KiB/CPU-s | 2.48KiB/CPU-s | 0.0 | 0.020454 | False | False |
splunk_hec_route_s3 | 93.07KiB/CPU-s | 0.77 | 100.00% | 11.78MiB/CPU-s | 544.83KiB/CPU-s | 7.03KiB/CPU-s | 0.0 | 0.045168 | 11.87MiB/CPU-s | 513.67KiB/CPU-s | 6.63KiB/CPU-s | 0.0 | 0.042258 | False | False |
datadog_agent_remap_datadog_logs_acks | 242.15KiB/CPU-s | 0.7 | 100.00% | 33.67MiB/CPU-s | 1.13MiB/CPU-s | 14.87KiB/CPU-s | 0.0 | 0.033426 | 33.91MiB/CPU-s | 1.03MiB/CPU-s | 13.61KiB/CPU-s | 0.0 | 0.030362 | False | False |
http_to_http_json | 52.15KiB/CPU-s | 0.38 | 100.00% | 13.57MiB/CPU-s | 299.12KiB/CPU-s | 3.86KiB/CPU-s | 0.0 | 0.021522 | 13.62MiB/CPU-s | 211.65KiB/CPU-s | 2.73KiB/CPU-s | 0.0 | 0.015172 | False | False |
otlp_grpc_to_blackhole | 3.52KiB/CPU-s | 0.33 | 99.99% | 1.04MiB/CPU-s | 42.8KiB/CPU-s | 565.79B/CPU-s | 0.0 | 0.040296 | 1.04MiB/CPU-s | 53.18KiB/CPU-s | 702.61B/CPU-s | 0.0 | 0.049904 | False | False |
splunk_hec_to_splunk_hec_logs_noack | 8.05KiB/CPU-s | 0.06 | 95.27% | 13.62MiB/CPU-s | 249.92KiB/CPU-s | 3.22KiB/CPU-s | 0.0 | 0.017922 | 13.63MiB/CPU-s | 190.71KiB/CPU-s | 2.46KiB/CPU-s | 0.0 | 0.013668 | False | False |
enterprise_http_to_http | 4.46KiB/CPU-s | 0.03 | 60.93% | 13.61MiB/CPU-s | 311.24KiB/CPU-s | 4.02KiB/CPU-s | 0.0 | 0.022325 | 13.62MiB/CPU-s | 255.56KiB/CPU-s | 3.3KiB/CPU-s | 0.0 | 0.018325 | False | False |
splunk_hec_to_splunk_hec_logs_acks | 1.63KiB/CPU-s | 0.01 | 20.86% | 13.62MiB/CPU-s | 331.58KiB/CPU-s | 4.28KiB/CPU-s | 0.0 | 0.023777 | 13.62MiB/CPU-s | 342.41KiB/CPU-s | 4.42KiB/CPU-s | 0.0 | 0.024551 | False | False |
fluent_elasticsearch | 547.95B/CPU-s | 0.0 | 67.82% | 45.41MiB/CPU-s | 29.99KiB/CPU-s | 392.11B/CPU-s | 0.0 | 0.000645 | 45.41MiB/CPU-s | 29.84KiB/CPU-s | 389.94B/CPU-s | 0.0 | 0.000642 | False | False |
splunk_hec_indexer_ack_blackhole | -1.87KiB/CPU-s | -0.01 | 31.73% | 13.62MiB/CPU-s | 246.25KiB/CPU-s | 3.18KiB/CPU-s | 0.0 | 0.017658 | 13.62MiB/CPU-s | 255.16KiB/CPU-s | 3.29KiB/CPU-s | 0.0 | 0.018299 | False | False |
file_to_blackhole | -16.37KiB/CPU-s | -0.03 | 54.34% | 54.49MiB/CPU-s | 1.08MiB/CPU-s | 14.28KiB/CPU-s | 0.0 | 0.019842 | 54.48MiB/CPU-s | 1.27MiB/CPU-s | 16.72KiB/CPU-s | 0.0 | 0.02326 | False | False |
http_to_http_noack | -6.24KiB/CPU-s | -0.04 | 68.35% | 13.61MiB/CPU-s | 307.37KiB/CPU-s | 3.97KiB/CPU-s | 0.0 | 0.022047 | 13.61MiB/CPU-s | 372.1KiB/CPU-s | 4.8KiB/CPU-s | 0.0 | 0.026702 | False | False |
http_to_http_acks | -34.44KiB/CPU-s | -0.63 | 49.14% | 5.31MiB/CPU-s | 2.81MiB/CPU-s | 37.07KiB/CPU-s | 0.0 | 0.527922 | 5.28MiB/CPU-s | 2.77MiB/CPU-s | 36.6KiB/CPU-s | 0.0 | 0.524621 | True | False |
otlp_http_to_blackhole | -12.94KiB/CPU-s | -0.82 | 100.00% | 1.55MiB/CPU-s | 106.46KiB/CPU-s | 1.37KiB/CPU-s | 0.0 | 0.067179 | 1.53MiB/CPU-s | 117.85KiB/CPU-s | 1.52KiB/CPU-s | 0.0 | 0.074974 | False | False |
http_text_to_http_json | -271.19KiB/CPU-s | -1.04 | 100.00% | 25.54MiB/CPU-s | 638.5KiB/CPU-s | 8.24KiB/CPU-s | 0.0 | 0.02441 | 25.28MiB/CPU-s | 562.64KiB/CPU-s | 7.26KiB/CPU-s | 0.0 | 0.021735 | False | False |
datadog_agent_remap_datadog_logs | -426.22KiB/CPU-s | -1.23 | 100.00% | 33.94MiB/CPU-s | 1.39MiB/CPU-s | 18.37KiB/CPU-s | 0.0 | 0.040968 | 33.52MiB/CPU-s | 1.45MiB/CPU-s | 19.2KiB/CPU-s | 0.0 | 0.043362 | False | False |
datadog_agent_remap_blackhole | -636.49KiB/CPU-s | -2.0 | 100.00% | 31.02MiB/CPU-s | 1.0MiB/CPU-s | 13.24KiB/CPU-s | 0.0 | 0.032282 | 30.4MiB/CPU-s | 1.4MiB/CPU-s | 18.5KiB/CPU-s | 0.0 | 0.046042 | False | False |
syslog_log2metric_humio_metrics | -165.0KiB/CPU-s | -2.63 | 100.00% | 6.13MiB/CPU-s | 171.1KiB/CPU-s | 2.21KiB/CPU-s | 0.0 | 0.027258 | 5.97MiB/CPU-s | 265.52KiB/CPU-s | 3.43KiB/CPU-s | 0.0 | 0.043443 | False | False |
datadog_agent_remap_blackhole_acks | -1.14MiB/CPU-s | -3.59 | 100.00% | 31.77MiB/CPU-s | 519.56KiB/CPU-s | 6.71KiB/CPU-s | 0.0 | 0.015968 | 30.63MiB/CPU-s | 762.63KiB/CPU-s | 9.85KiB/CPU-s | 0.0 | 0.024311 | False | False |
Thanks for the contribution @Ilmarii - it struck us that this could also be a good opportunity to switch to crc32fast to improve performance here as well (while we're doing the migration of what we're checksumming).
Is that something you're interested in tackling, and if not do you mind if I/we push some commits to this PR to introduce that as well?
Hi! Yes, I think I can integrate crc32fast here.
Awesome, thanks so much! Please let us know if you need a hand or need to hand over the PR for us to finish.
Hi! I looked at the crate and found that it only supports CRC32, i.e. the result is 32-bit. Since CRC64, which has a 64-bit result, is currently used, this replacement would increase the number of collisions. @spencergilbert Please let me know if you are aware of this and whether it's OK.
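To put rough numbers on the collision concern, here is a back-of-the-envelope sketch using the birthday-bound approximation. It assumes checksums behave like uniformly random values, which is only an approximation for a CRC, and the file count of 10,000 is an arbitrary example, not a figure from this thread.

```rust
/// Birthday-bound approximation of the probability that at least two of
/// `files` uniformly random `bits`-bit checksums collide:
/// p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
fn collision_probability(files: u64, bits: u32) -> f64 {
    let pairs = (files as f64) * (files as f64 - 1.0) / 2.0;
    let space = 2f64.powi(bits as i32);
    // 1 - exp(-x), computed via exp_m1 to stay accurate for tiny x.
    -(-pairs / space).exp_m1()
}

fn main() {
    for bits in [32, 64] {
        println!(
            "{bits}-bit checksum, 10,000 files: {:.3e}",
            collision_probability(10_000, bits)
        );
    }
}
```

Under these assumptions, 10,000 files give roughly a 1.2% chance of at least one collision with a 32-bit checksum, against a chance on the order of 10^-12 with a 64-bit one.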
@Ilmarii Thinking about it a little more, given what you brought up.... I think we can leave things as-is for now. Looking closer at all of this code, I realized we only use the checksum/fingerprint for identifying a file... instead of identifying it by path + fingerprint.
If we also included the filepath, I think using CRC32 would be totally fine, but without it... it definitely doesn't feel great to make the change.
I appreciate you pointing out that fact. We'll review the code as-is.
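As a sketch of the distinction drawn above, the snippet below compares identifying files by fingerprint alone with identifying them by path plus fingerprint. `FileId` and both helper functions are hypothetical names for illustration and do not reflect Vector's actual types.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical file identity made of a path and a 32-bit fingerprint.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct FileId {
    path: PathBuf,
    fingerprint: u32,
}

/// Number of files that look distinct when only the fingerprint is compared.
fn distinct_by_fingerprint(files: &[FileId]) -> usize {
    files.iter().map(|f| f.fingerprint).collect::<HashSet<u32>>().len()
}

/// Number of files that look distinct when path and fingerprint are compared.
fn distinct_by_path_and_fingerprint(files: &[FileId]) -> usize {
    files.iter().collect::<HashSet<&FileId>>().len()
}

fn main() {
    // Two different files whose fingerprints happen to collide.
    let files = vec![
        FileId { path: PathBuf::from("/var/log/a.log"), fingerprint: 0xDEAD_BEEF },
        FileId { path: PathBuf::from("/var/log/b.log"), fingerprint: 0xDEAD_BEEF },
    ];
    println!("by fingerprint only: {}", distinct_by_fingerprint(&files));
    println!("by path + fingerprint: {}", distinct_by_path_and_fingerprint(&files));
}
```

With the path in the key, a fingerprint collision between two different paths no longer merges the files. The flip side of such a key is that a renamed file would be treated as a new one.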
👍 thanks y'all - I'll try to give this a review before the end of day today, Tuesday at the latest.