feat(loki sink): add support for structured metadata
See issue: https://github.com/vectordotdev/vector/issues/18876
Initial attempt on this, but seeking some feedback to wrap up.
How can we deal with the change in protocol, as structured metadata is part of a newer protocol and this PR overrides the current proto for Loki.
More precisely, see the first two commits of this PR for the proto changes: https://github.com/vectordotdev/vector/pull/20576/files/20285e7544fe57312f4d170dfda9e2e6b5672e31
How extensive do we need tests to be, do we want ({ labels, no labels, dynamic labels} x { structured metadata, no structured metadata, dynamic structured metadata}) or is the addition of tests that only cover the new code sufficient?
As far as the question around changes to the Protocol Buffers definition -- and I haven't looked at the changes yet -- I think it should be fine to use them as-is, so long as the changes are all backwards-compatible.
The addition of new fields to existing message types should be fine, as older servers should just ignore them, as far as I understand. I'm not sure how most gRPC server implementations handle requests for unknown gRPC services, though, if we're dealing with new RPC calls.
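To illustrate why adding fields is safe, here is a minimal sketch of the protobuf wire format's unknown-field behavior (hand-rolled and simplified, not Vector's or `prost`'s actual codec; field numbers and payloads are hypothetical):

```rust
// Minimal sketch: an "old" decoder that only knows field 1 simply skips
// any field number it does not recognize, which is why adding a new
// length-delimited field to an existing message is backward-compatible.

fn encode_field(field: u8, payload: &[u8], out: &mut Vec<u8>) {
    // key = (field_number << 3) | wire_type; wire type 2 = length-delimited
    out.push((field << 3) | 2);
    out.push(payload.len() as u8); // assumes payload < 128 bytes (1-byte varint)
    out.extend_from_slice(payload);
}

// "Old" decoder: only knows field 1 (`line`), skips everything else.
fn decode_old(mut buf: &[u8]) -> Option<String> {
    let mut line = None;
    while !buf.is_empty() {
        let key = buf[0];
        let len = buf[1] as usize;
        let (field, payload) = (key >> 3, &buf[2..2 + len]);
        if field == 1 {
            line = Some(String::from_utf8(payload.to_vec()).ok()?);
        } // any other field number: skipped without error
        buf = &buf[2 + len..];
    }
    line
}

fn main() {
    let mut msg = Vec::new();
    encode_field(1, b"log line", &mut msg);      // existing field
    encode_field(4, b"trace_id=abc", &mut msg);  // hypothetical new field
    // the old decoder still recovers field 1 and ignores field 4
    assert_eq!(decode_old(&msg).as_deref(), Some("log line"));
}
```

New RPC methods are the different case: a server that has never heard of the method will reject the call outright (typically with `UNIMPLEMENTED` in gRPC), rather than silently ignoring it.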
Hi, is there any update in this PR? I'm waiting for this feature
My bad, I had a lot of other work to attend to. Will pick this up Thursday.
Hi there, it's great to see the progress on this PR for adding support for structured metadata in the Loki sink! The addition of structured metadata is a significant enhancement that will greatly benefit users.
I understand there are some concerns regarding the changes in protocol and the extent of testing required. While the changes in Protocol Buffers definitions seem to be safe and backwards-compatible, it is crucial to ensure thorough testing to cover various scenarios like labels, structured metadata, and dynamic structured metadata.
Addressing the feedback provided by tobz regarding documentation and code cleanup will further improve the overall quality of the implementation. Making the necessary adjustments in the code and documentation as suggested will help in achieving a more cohesive and maintainable solution.
Given the importance of this feature and the work put into it, I strongly encourage the team to review and merge this PR as soon as possible. This feature is eagerly awaited by users and will undoubtedly enhance the capabilities of the Loki sink in Vector. Let's work together to finalize the remaining tasks and get this valuable enhancement merged promptly. Thank you for your hard work on this PR!
@tobz thanks a ton for the feedback. I think I processed everything, but this is my first real Rust contribution, so apologies if it's messy here and there.
I kept the implementation as similar to the labels implementation as I could, but that probably breaks any DRY criteria since it's basically the same code repeated. Would you prefer I refactor these into a single function, e.g. convertTemplatedMaptoVec, and call that from the respective functions?
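For reference, such a shared helper might look like this (a hedged sketch only: `convert_templated_map_to_vec` is a hypothetical name, and plain `String`s stand in for Vector's `Template` type):

```rust
use std::collections::HashMap;

// Hypothetical shared helper: both the `labels` and `structured_metadata`
// paths could render their key/value maps through one function instead of
// duplicating the same loop in each place.
fn convert_templated_map_to_vec(map: &HashMap<String, String>) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = map
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    out.sort(); // deterministic order for stable encoding and tests
    out
}

fn main() {
    let mut labels = HashMap::new();
    labels.insert("env".to_string(), "prod".to_string());
    labels.insert("app".to_string(), "vector".to_string());
    let rendered = convert_templated_map_to_vec(&labels);
    assert_eq!(
        rendered,
        vec![
            ("app".to_string(), "vector".to_string()),
            ("env".to_string(), "prod".to_string()),
        ]
    );
}
```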
@tobz I'll pick up the things that turn up from the CI tomorrow morning, if there's anything else lmk!
With regards to some of the errors, does it help if I do a back merge / rebase?
@tobz any news here, please lmk if I can do anything!
@maxboone Apologies. I swore I had previously approved this, but it seems not!
LGTM overall, but it still needs a changelog entry. You can see the README for the changelog directory -- https://github.com/vectordotdev/vector/tree/master/changelog.d -- which explains what is required.
@tobz Will do, thanks a ton for the review!
@tobz added the changelog entry, should be good to go :)
@pront @tobz I see some CI failing, is there something I can change / update (or extend the tests and run integration tests on my end)?
I haven't tested it against a local instance of Loki recently, if I need to set that up to ensure it works properly lmk!
It looks like the integration tests are failing to compile. You should be able to get the same errors locally by running `make check-clippy`. You can also run the integration tests locally via `make test-integration-loki`.
Will do, will ping again if something pops up :)
I see some messages popping up from running clippy and the integration tests here. I really thought I had fixed those, but reality seems different. I'll fix this in the morning (CEST) and poke again.
It's mostly just the protobuf body that has changed and thus the unit test failing.
np! Happy to review again afterwards.
@jszwedko integration tests ran correctly with the latest changes. I mainly added `#[serde(default)]` to `structured_metadata` in the `LokiConfig`, as the tests were failing only on the configuration, and I'd like to avoid introducing a breaking change for existing configurations.
Anyways, as far as I can replicate, the CI should succeed now!
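Concretely, making the field default-able means existing configs that never mention it keep deserializing. A sketch of both shapes (the exact option syntax is an assumption here, mirroring how `labels` is configured; abbreviated, omitting unrelated required options):

```toml
# Existing config: no `structured_metadata` key, still valid because the
# field is optional (#[serde(default)]).
[sinks.loki]
type = "loki"
endpoint = "http://localhost:3100"
labels.app = "vector"

# Opting in to the new feature (syntax assumed to mirror `labels`):
[sinks.loki_with_metadata]
type = "loki"
endpoint = "http://localhost:3100"
labels.app = "vector"
structured_metadata.trace_id = "{{ trace_id }}"
```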
Great, thanks @maxboone ! Let me trigger CI here.
👍
@maxboone I think it just needs a docs regeneration using `make generate-component-docs` so that the tests pass
good call! will do in a minute
@maxboone Could you run `make generate-component-docs`?
I was in the area so I regenerated them.
Yeah, unfortunately WSL2 decided to drop out that afternoon after a non-downgradeable update. But it's working again - thanks @jszwedko for generating the docs <3
Regression Detector Results
Run ID: 0fe18c1f-c08e-4ab0-850a-53ffad0cbaad Metrics dashboard
Baseline: 06ad674ff4fbf2df82c50b787c45316c5b5c8d50 Comparison: d174d55fadaaee8e8a55223f43a6000e2814382d
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
No significant changes in experiment optimization goals
Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
Experiments ignored for regressions
Regressions in experiments with settings containing erratic: true are ignored.
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ➖ | file_to_blackhole | egress throughput | +1.09 | [-5.81, +7.98] | |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ➖ | socket_to_socket_blackhole | ingress throughput | +3.15 | [+3.07, +3.24] | |
| ➖ | otlp_http_to_blackhole | ingress throughput | +2.21 | [+2.09, +2.33] | |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | +1.97 | [+1.88, +2.06] | |
| ➖ | syslog_loki | ingress throughput | +1.16 | [+1.08, +1.24] | |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +1.11 | [+0.99, +1.23] | |
| ➖ | file_to_blackhole | egress throughput | +1.09 | [-5.81, +7.98] | |
| ➖ | http_elasticsearch | ingress throughput | +0.94 | [+0.76, +1.12] | |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.93 | [+0.77, +1.09] | |
| ➖ | fluent_elasticsearch | ingress throughput | +0.86 | [+0.37, +1.36] | |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +0.62 | [+0.53, +0.70] | |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.60 | [+0.46, +0.73] | |
| ➖ | syslog_humio_logs | ingress throughput | +0.34 | [+0.23, +0.45] | |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.22 | [+0.12, +0.31] | |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | +0.21 | [+0.10, +0.32] | |
| ➖ | http_to_http_noack | ingress throughput | +0.19 | [+0.10, +0.29] | |
| ➖ | http_to_http_json | ingress throughput | +0.03 | [-0.01, +0.07] | |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.02 | [-0.08, +0.12] | |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | +0.01 | [-0.08, +0.11] | |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.01 | [-0.07, +0.08] | |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.04 | [-0.20, +0.12] | |
| ➖ | http_to_s3 | ingress throughput | -0.08 | [-0.36, +0.19] | |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | -0.49 | [-0.71, -0.28] | |
| ➖ | http_to_http_acks | ingress throughput | -0.68 | [-1.95, +0.58] | |
| ➖ | splunk_hec_route_s3 | ingress throughput | -1.12 | [-1.42, -0.81] | |
| ➖ | http_text_to_http_json | ingress throughput | -1.25 | [-1.37, -1.14] | |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -2.42 | [-2.57, -2.27] | |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
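The three criteria above combine into a simple predicate. An illustrative sketch (not the detector's actual implementation; thresholds hard-coded to the 5.00% / 90% values quoted in this report):

```rust
// Illustrative regression predicate: a change is flagged only if the
// effect size is large enough, the confidence interval excludes zero,
// and the experiment is not marked erratic.
fn is_regression(delta_mean_pct: f64, ci: (f64, f64), erratic: bool) -> bool {
    let big_enough = delta_mean_pct.abs() >= 5.00;   // |Δ mean %| ≥ 5.00%
    let ci_excludes_zero = ci.0 > 0.0 || ci.1 < 0.0; // 90% CI does not contain 0
    big_enough && ci_excludes_zero && !erratic
}

fn main() {
    // file_to_blackhole from the table above: +1.09, CI [-5.81, +7.98]
    assert!(!is_regression(1.09, (-5.81, 7.98), false));
    // a hypothetical -6.2% change with CI [-7.1, -5.3] would be flagged...
    assert!(is_regression(-6.2, (-7.1, -5.3), false));
    // ...unless the experiment's settings contain `erratic: true`
    assert!(!is_regression(-6.2, (-7.1, -5.3), true));
}
```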