dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

Feature: create trace span for kafkajs eachBatch

Open allanpaiste opened this issue 2 years ago • 4 comments

What does this PR do?

Enhances kafkajs instrumentation to cover eachBatch the same way the current instrumentation covers eachMessage just so that both chosen paths would get same kind of span representation show up in APM flamegraphs (and the spans to be searchable).

It also provides a experimental environment flag DD_EXPERIMENTAL_KAFKAJS_PLUGIN_TRACE_FIRST_BATCH_MESSAGE which allows developers to have at least some level of automatic traceId propagation for batch processing where the context from first message of each batch is used as the trace parent.

Motivation

Have both consumption strategies chosen when using kafkajs show up in APM flamegraphs and span search rather than half of the usage cases (imagining a situation where half of the consumptions are done in batches) being skipped where the traces would then start from whatever comes after the consume span (db interaction, http calls) which can leave user confused.

Plugin Checklist

  • [x] Unit tests.

Additional Notes

There was a test added in dd-trace that was using eachBatch in a test that was supposed to indicate that user has not provided any callback at all which leaves me thinking that whoever authored the original instrumentation was at least aware that eachBatch was a thing, but then decided to skip implementing it. Would be good to know why that choice was made.

Am aware that when it comes to traceId propagation - then all kinds of batch processing solutions leave a lot to wish for which makes me wonder: was it left without implementation as the authors expect dd-trace users that decide to go with eachMessage to implement tracing within the batchProcessing themselves?

If that is so, then feel free to decline this PR.

allanpaiste avatar Jul 17 '23 11:07 allanpaiste

Thank you for the comment @rochdev

Would it still be reasonable to keep the instrumentation though where it would then just start a new trace (together with specific operation span) just so that the flamegraph would not start with some lower level operation (like database access) which might leave user wondering what preceeded that database access.

It would serve the purpose of still allowing people to list all the Kafka operations - while for obvious reasons not guaranteeing that there's no breakage of the trace.

Currently due to there not being a good way to retain traces per Kafka message (due to batch processing) we are also sacrificing visibility on the origin of some operations (some kafka operations not producing spans and not starting a trace). So you would not be able to determine which kafka message produced some following operations.

One could also still deal with allowing custom instrumentation by allowing (and documenting) the batch instrumentation (with completely new trace) to be disable if need be.

CC: @jbertran

allanpaiste avatar Aug 29 '23 12:08 allanpaiste

@allanpaiste So basically you would keep the span, but when there are multiple messages it would simply not be connected to the upstream services? Generally speaking that would probably make sense. Could the topic still be used as the resource name, or do batches allow messages from different topics?

rochdev avatar Sep 08 '23 03:09 rochdev

@rochdev

The topic can be still used as resource name, yes. The batch will always hold just one partition of one topic - so that should not be a problem (consumer could be handling multiple partitions of a topic but they'd all end up being passed to eachBatch as separate objects).

Just for reference, throwing some kafkajs typing here as well.

eachBatchPayload:

https://github.com/tulios/kafkajs/blob/ff3b1117f316d527ae170b550bc0f772614338e9/types/index.d.ts#L990-L999

... and the Batch itself:

https://github.com/tulios/kafkajs/blob/ff3b1117f316d527ae170b550bc0f772614338e9/types/index.d.ts#L876-L886

allanpaiste avatar Sep 08 '23 06:09 allanpaiste

Blessed by #4645

tlhunter avatar Aug 30 '24 20:08 tlhunter