dd-trace-go

fix(datastreams): Log Kafka UNKNOWN_SERVER_ERROR

Open robcarlan-datadog opened this issue 6 months ago • 5 comments

What does this PR do?

Logs UNKNOWN_SERVER_ERROR responses from the Kafka broker. We assume the most likely cause is that we inject message headers where they are unsupported, which causes the application to drop messages.

Mostly manual testing.

Motivation

Data Streams Monitoring relies on header injection as a way of propagating context through Kafka messages. Not all Kafka brokers support this, and we have a rudimentary check that librdkafka is above a specific version to allow header injection. Headers are also unsupported if log.message.format.version is set to a pre-0.11 value, but there is no way to check this setting from a client.
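For illustration, the version gate described above could look roughly like the sketch below. It assumes librdkafka's packed hex version form (0xMMmmrrpp) and that 0.11.4 is the first release with header support; `minHeaderVersion` and `headersSupported` are hypothetical names, not the tracer's actual identifiers.

```go
package main

import "fmt"

// minHeaderVersion is librdkafka 0.11.4 in packed hex form (0xMMmmrrpp:
// major, minor, revision, pre-release id). Assumption: this is the first
// release that supports message headers.
const minHeaderVersion = 0x000b0400

// headersSupported reports whether a packed librdkafka version is new enough
// to attempt header injection. It only inspects the client library version;
// it cannot see the broker's log.message.format.version.
func headersSupported(libVersion int) bool {
	return libVersion >= minHeaderVersion
}

func main() {
	// Sample versions: 0.11.1, 0.11.4, 2.3.0.
	for _, v := range []int{0x000b01ff, 0x000b0400, 0x020300ff} {
		fmt.Printf("%#x -> headers supported: %v\n", v, headersSupported(v))
	}
}
```

A check like this passes for any recent client, which is why it cannot catch a broker that still uses a pre-0.11 message format.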

The failure mode is that messages fail to send: even with a recent librdkafka and broker version, the broker can still be using a message format that predates header support. In that case the broker returns UNKNOWN_SERVER_ERROR.
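A minimal sketch of the kind of logging this implies is shown below. It is not the code in this PR: `deliveryResult` and `headerWarning` are hypothetical stand-ins for a client's delivery report handling, and the only fact taken from Kafka itself is that UNKNOWN_SERVER_ERROR is protocol error code -1.

```go
package main

import (
	"fmt"
	"log"
)

// unknownServerError is the Kafka protocol error code for UNKNOWN_SERVER_ERROR.
const unknownServerError = -1

// deliveryResult is a hypothetical, simplified delivery report.
type deliveryResult struct {
	Topic           string
	ErrorCode       int  // 0 means the message was delivered
	HeadersInjected bool // true if context headers were added to the message
}

// headerWarning returns a log message and true when a failed delivery matches
// the suspected failure mode: UNKNOWN_SERVER_ERROR on a message that carried
// injected headers. Any other result yields ("", false).
func headerWarning(r deliveryResult) (string, bool) {
	if r.ErrorCode != unknownServerError || !r.HeadersInjected {
		return "", false
	}
	msg := fmt.Sprintf(
		"kafka: UNKNOWN_SERVER_ERROR producing to %q; the broker may not support message headers (check log.message.format.version)",
		r.Topic,
	)
	return msg, true
}

func main() {
	r := deliveryResult{Topic: "orders", ErrorCode: unknownServerError, HeadersInjected: true}
	if msg, ok := headerWarning(r); ok {
		log.Print(msg)
	}
}
```

The warning is deliberately phrased as a suspicion, since the client cannot confirm the broker's message format setting.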

Reviewer's Checklist

  • [ ] Changed code has unit tests for its functionality at or near 100% coverage.
  • [ ] System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • [ ] There is a benchmark for any new code, or changes to existing code.
  • [ ] If this interacts with the agent in a new way, a system test has been added.
  • [ ] New code is free of linting errors. You can check this by running golangci-lint run locally.
  • [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
  • [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

robcarlan-datadog avatar May 08 '25 15:05 robcarlan-datadog

Benchmarks

Benchmark execution time: 2025-05-20 19:35:48

Comparing candidate commit c8542d7f4eefb2d59834aebcbfce44c81325a50e in PR branch rob.carlan/fix-kafka-headers-unsupported-message-format-version with baseline commit ce94bc239a0bba8269fa69673049722c1b4c4b6f in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 54 metrics, 2 unstable metrics.

pr-commenter[bot] avatar May 08 '25 16:05 pr-commenter[bot]

/merge

robcarlan-datadog avatar May 23 '25 16:05 robcarlan-datadog

View all feedbacks in Devflow UI.

2025-05-23 16:13:59 UTC :information_source: Start processing command /merge


2025-05-23 16:14:07 UTC :information_source: MergeQueue: queue is disabled

Added to the queue but the mergequeue is not enabled for now.


2025-06-09 12:28:19 UTC :warning: MergeQueue: This merge request was unqueued

[email protected] unqueued this merge request

dd-devflow[bot] avatar May 23 '25 16:05 dd-devflow[bot]

/merge -c

darccio avatar Jun 09 '25 12:06 darccio

View all feedbacks in Devflow UI.

2025-06-09 12:28:13 UTC :information_source: Start processing command /merge -c

dd-devflow[bot] avatar Jun 09 '25 12:06 dd-devflow[bot]

@robcarlan-datadog Is this PR still relevant? Thanks!

darccio avatar Aug 26 '25 14:08 darccio

Closing because of lack of feedback.

darccio avatar Nov 14 '25 15:11 darccio