Fix losing original karafka trace after iterating through the messages with distributed_tracing on
What does this PR do?
On the Karafka integration, when distributed_tracing is on, the original trace was lost after iterating through the messages. This PR aims to fix that ~and also add span links linking the message traces with the parent consumer trace (so it's easier to find each other using the Datadog APM UI)~.
Fixes #4873
Motivation: Fix and improve the Datadog Karafka support.
Change log entry
Additional Notes:
How to test the change? See #4873 for a full testing snippet.
Hey @marcotc @ivoanjo can you have a quick look here please? 🙇
Would like to get this merged to not conflict with other Karafka/Waterdrop pull requests (e.g.: DataDog/dd-trace-rb#4874 - and also I want to open a new PR soon about turning off distributed tracing on a per-topic basis, see here)
Err thanks for the patience, I've asked the team to take a look at this asap :)
@Drowze I took a first pass and I need to allocate proper time to understand how we best want to present this kind of workflow (span links, directly linking, something else).
I have scheduled time to take a good look at this next week.
The repair to https://github.com/DataDog/dd-trace-rb/pull/4876 LGTM. Span links do not appear to be used in dd-trace-rb except in one place where an open telemetry trace is converted (mapped?) to a datadog trace. @Drowze how do you feel about moving the span link code into a separate PR and just leaving the fix in this PR?
Regarding span links: I read https://docs.datadoghq.com/tracing/trace_collection/span_links/ to try to understand their usage. The documentation page says the span links are for relationships other than parent/child. In this PR they are used for parent/child relationship? What is the actual gain/benefit?
Looking at this PR again I don't understand the fix. The problem is the parent trace gets reset somehow, therefore I would expect the fix to repair the reset. How is adding links fixing the reset?
Coming back to this now - apologies for the delay 🙇
do you feel about moving the span link code into a separate PR and just leaving the fix in this PR?
Sure - sounds good to me. I understand that span links are not widely used at the moment and using them here just made the PR a bit confusing to review, so I'll move that part into a separate pull request - and we can continue discussing span links there.
@p-datadog I just rebased the PR with latest master and removed the code referring to span links - can you have another look please? 🙇
Hey Datadog team 👋 @vpellan @p-datadog @marcotc @ivoanjo Is there anything I can do to help moving this PR forward?
As of now we're having to use our own dd-trace-rb fork due to some issues/missing features around the Karafka/WaterDrop integration (our fork is the latest dd-trace-rb release (v2.22.0), plus this PR, #5120 and #5107).
Hey @Drowze ! Thanks for pinging me, I'll take a look at it asap !
Hi @Drowze , I have this on my list as well. I am currently working on some time-sensitive fixes but after that is done (hopefully by end of week) I plan on getting this PR merged.
We don't currently have automation around community PRs - this is essentially the blocker.