dd-trace-rb icon indicating copy to clipboard operation
dd-trace-rb copied to clipboard

Fix losing original karafka trace after iterating through the messages with distributed_tracing on

Open Drowze opened this issue 3 months ago • 9 comments

What does this PR do?

On the Karafka integration, when distributed_tracing is on, the original trace was lost after iterating through the messages. This PR aims to fix that ~and also add span links linking the message traces with the parent consumer trace (so it's easier to find each other using the Datadog APM UI)~.

Fixes #4873

Motivation: Fix and improve the Datadog Karafka support.

Change log entry

Additional Notes:

How to test the change? See #4873 for a full testing snippet.

Drowze avatar Sep 01 '25 13:09 Drowze

Hey @marcotc @ivoanjo can you have a quick look here please? 🙇

Would like to get this merged to not conflict with other Karafka/Waterdrop pull requests (e.g.: DataDog/dd-trace-rb#4874 - and also I want to open a new PR soon about turning off distributed tracing on a per-topic basis, see here)

Drowze avatar Sep 15 '25 15:09 Drowze

Err thanks for the patience, I've asked the team to take a look at this asap :)

ivoanjo avatar Sep 16 '25 13:09 ivoanjo

@Drowze I took a first pass and I need to allocate proper time to understand how we best want to present this kind of workflow (span links, directly linking, something else).

I have scheduled time to take a good look at this next week.

marcotc avatar Sep 17 '25 18:09 marcotc

The repair to https://github.com/DataDog/dd-trace-rb/pull/4876 LGTM. Span links do not appear to be used in dd-trace-rb except in one place where an open telemetry trace is converted (mapped?) to a datadog trace. @Drowze how do you feel about moving the span link code into a separate PR and just leaving the fix in this PR?

p-datadog avatar Sep 23 '25 20:09 p-datadog

Regarding span links: I read https://docs.datadoghq.com/tracing/trace_collection/span_links/ to try to understand their usage. The documentation page says the span links are for relationships other than parent/child. In this PR they are used for parent/child relationship? What is the actual gain/benefit?

p-datadog avatar Sep 23 '25 20:09 p-datadog

Looking at this PR again I don't understand the fix. The problem is the parent trace gets reset somehow, therefore I would expect the fix to repair the reset. How is adding links fixing the reset?

p-datadog avatar Sep 23 '25 20:09 p-datadog

Coming back to this now - apologies for the delay 🙇

do you feel about moving the span link code into a separate PR and just leaving the fix in this PR?

Sure - sounds good to me. I understand that span links are not widely used at the moment and using them here just made the PR a bit confusing to review, so I'll move that part into a separate pull request - and we can continue discussing span links there.

@p-datadog I just rebased the PR with latest master and removed the code referring to span links - can you have another look please? 🙇

Drowze avatar Dec 01 '25 17:12 Drowze

Hey Datadog team 👋 @vpellan @p-datadog @marcotc @ivoanjo Is there anything I can do to help moving this PR forward?

As of now we're having to use our own dd-trace-rb fork due to some issues/missing features around the Karafka/WaterDrop integration (our fork is the latest dd-trace-rb release (v2.22.0), plus this PR, #5120 and #5107).

Drowze avatar Dec 09 '25 14:12 Drowze

Hey @Drowze ! Thanks for pinging me, I'll take a look at it asap !

vpellan avatar Dec 12 '25 09:12 vpellan

Hi @Drowze , I have this on my list as well. I am currently working on some time-sensitive fixes but after that is done (hopefully by end of week) I plan on getting this PR merged.

We don't currently have automation around community PRs - this is essentially the blocker.

p-datadog avatar Dec 17 '25 03:12 p-datadog