dd-trace-java
Kotlin awaiting not traced correctly.
If you await the Deferred returned by one coroutine while inside another coroutine, the span information is lost and the whole span is recreated as if it were a root (parent) span. See the example project:
https://github.com/ikovalyov/dd-tracing-coroutines-issue-v2
Application logs:
> Task :run
[dd.trace 2019-12-02 09:23:39:910 +0100] [main] INFO datadog.trace.agent.jmxfetch.JMXFetch - JMXFetch config: null [] [] [] null null {runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, service=unnamed-java-app} statsd:localhost:8125 System.err INFO
[dd.trace 2019-12-02 09:23:40:502 +0100] [main] INFO datadog.trace.agent.ot.DDTraceOTInfo - dd-trace - version: 0.38.0~1cb704e7
[dd.trace 2019-12-02 09:23:40:571 +0100] [main] INFO datadog.trace.agent.ot.DDTracer - New instance: DDTracer-77bb48d5{ serviceName=unnamed-java-app, writer=DDAgentWriter { api=DDApi { tracesUrl=http://localhost:8126/v0.3/traces } }, sampler=datadog.trace.agent.common.sampling.RateByServiceSampler@181d8899, defaultSpanTags={}}
[dd.trace 2019-12-02 09:23:40:600 +0100] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-trace-ot - version: 0.38.0~1cb704e7
[dd.trace 2019-12-02 09:23:40:600 +0100] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-trace-api - version: 0.38.0~1cb704e7
[dd.trace 2019-12-02 09:23:40:601 +0100] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-java-agent - version: 0.38.0~1cb704e7
2019-12-02 09:23:41.840 INFO 3998 [ main] Main$main$1 : main DDSpan [ t_id=2654422379082193946, s_id=5753717729389507729, p_id=0] trace=unnamed-java-app/trace.annotation/Main.main metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=1, thread.name=main}, duration_ns=0
2019-12-02 09:23:41.892 INFO 3998 [atcher-worker-2] Main : DefaultDispatcher-worker-2 DDSpan [ t_id=2654422379082193946, s_id=1179641962218689830, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=0, component=trace, thread.id=26, thread.name=DefaultDispatcher-worker-2}, duration_ns=0
2019-12-02 09:23:41.893 INFO 3998 [atcher-worker-2] Main : DefaultDispatcher-worker-2 DDSpan [ t_id=2654422379082193946, s_id=1179641962218689830, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=0, component=trace, thread.id=26, thread.name=DefaultDispatcher-worker-2}, duration_ns=0
2019-12-02 09:23:41.899 INFO 3998 [atcher-worker-3] Main : DefaultDispatcher-worker-3 DDSpan [ t_id=2654422379082193946, s_id=2170501652924201656, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=1, component=trace, thread.id=27, thread.name=DefaultDispatcher-worker-3}, duration_ns=0
2019-12-02 09:23:41.900 INFO 3998 [atcher-worker-4] Main : DefaultDispatcher-worker-4 DDSpan [ t_id=2654422379082193946, s_id=6175891178351208339, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=3, component=trace, thread.id=28, thread.name=DefaultDispatcher-worker-4}, duration_ns=0
2019-12-02 09:23:41.899 INFO 3998 [atcher-worker-1] Main : DefaultDispatcher-worker-1 DDSpan [ t_id=2654422379082193946, s_id=8485714618033939065, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=2, component=trace, thread.id=25, thread.name=DefaultDispatcher-worker-1}, duration_ns=0
2019-12-02 09:23:41.900 INFO 3998 [atcher-worker-3] Main : DefaultDispatcher-worker-3 DDSpan [ t_id=2654422379082193946, s_id=2170501652924201656, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=1, component=trace, thread.id=27, thread.name=DefaultDispatcher-worker-3}, duration_ns=0
2019-12-02 09:23:41.902 INFO 3998 [atcher-worker-5] Main : DefaultDispatcher-worker-5 DDSpan [ t_id=2654422379082193946, s_id=3463457279142380734, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=4, component=trace, thread.id=29, thread.name=DefaultDispatcher-worker-5}, duration_ns=0
2019-12-02 09:23:41.903 INFO 3998 [atcher-worker-6] Main : DefaultDispatcher-worker-6 DDSpan [ t_id=2654422379082193946, s_id=4348256562928250323, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=6, component=trace, thread.id=30, thread.name=DefaultDispatcher-worker-6}, duration_ns=0
2019-12-02 09:23:41.904 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=2654422379082193946, s_id=4421324058712740639, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=5, component=trace, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:41.905 INFO 3998 [atcher-worker-8] Main : DefaultDispatcher-worker-8 DDSpan [ t_id=2654422379082193946, s_id=8537738308186511090, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=7, component=trace, thread.id=32, thread.name=DefaultDispatcher-worker-8}, duration_ns=0
2019-12-02 09:23:41.906 INFO 3998 [tcher-worker-10] Main : DefaultDispatcher-worker-10 DDSpan [ t_id=2654422379082193946, s_id=1985917427252727825, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=9, component=trace, thread.id=34, thread.name=DefaultDispatcher-worker-10}, duration_ns=0
2019-12-02 09:23:41.909 INFO 3998 [tcher-worker-13] Main : DefaultDispatcher-worker-13 DDSpan [ t_id=2654422379082193946, s_id=7556649444744964666, p_id=5753717729389507729] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={Iteration=8, component=trace, thread.id=37, thread.name=DefaultDispatcher-worker-13}, duration_ns=0
2019-12-02 09:23:42.942 INFO 3998 [atcher-worker-3] Main : DefaultDispatcher-worker-3 DDSpan [ t_id=8401234232036504370, s_id=481145876484775946, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=27, thread.name=DefaultDispatcher-worker-3}, duration_ns=0
[dd.trace 2019-12-02 09:23:43:587 +0100] [dd-trace-writer] WARN datadog.trace.agent.common.writer.DDApi - Error while sending 3 of 6 traces to the DD agent. java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8126 (going silent for 5 minutes)
2019-12-02 09:23:43.946 INFO 3998 [atcher-worker-1] Main : DefaultDispatcher-worker-1 DDSpan [ t_id=1567002908161829012, s_id=857742505563499203, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=25, thread.name=DefaultDispatcher-worker-1}, duration_ns=0
2019-12-02 09:23:44.951 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=2602817791208366767, s_id=926635854296932000, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:45.953 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=7206127938673410068, s_id=6063878620474869192, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:46.957 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=4634851895448235703, s_id=3472674439697055107, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:47.963 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=3557106430153605668, s_id=8402253980305412920, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:48.967 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=6980458062161759504, s_id=665376991271971413, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
2019-12-02 09:23:49.972 INFO 3998 [atcher-worker-7] Main : DefaultDispatcher-worker-7 DDSpan [ t_id=4854400913329068137, s_id=7033435703755271922, p_id=0] trace=unnamed-java-app/trace.annotation/Main.waitFor metrics={} tags={component=trace, language=jvm, runtime-id=fedd6bdd-2499-451c-bb36-0f662f3e8648, thread.id=31, thread.name=DefaultDispatcher-worker-7}, duration_ns=0
Do you have any workaround for this?
Thanks.
Thanks a ton for the example. I think this has to do with annotating suspend methods with @Trace. Suspend methods behave sequentially in Kotlin, but under the hood they are invoked asynchronously, almost like passing in a callback. A secondary parameter is added to the method signature and a continuation is passed in so that the coroutine can proceed with the execution of its subsequent statements. This particular use case isn't being exercised in the existing tests at the moment, but I do think it's a big gap in Kotlin support.
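For context, a rough sketch of what that desugaring looks like. This is illustrative only (the real compiler output is a state machine, and the names below are made up), assuming dd-trace-api and kotlin-stdlib on the classpath:

import datadog.trace.api.Trace
import kotlin.coroutines.Continuation

// What the source declares:
@Trace
suspend fun waitFor(iteration: Int): String = TODO()

// Roughly the shape the compiler emits for it: an extra Continuation parameter
// is appended and the return type widens to Any?, because the call may return
// COROUTINE_SUSPENDED instead of the actual String result. The @Trace advice
// therefore wraps a method that can return before the logical work completes.
@Trace
fun waitForDesugared(iteration: Int, completion: Continuation<String>): Any? = TODO()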
I’ll need to dig into the logic surrounding method level instrumentation but this example project is really going to help.
Hi @marcoferrer, I'm not really familiar with the internal DD implementation of the tracing API, but I have a few ideas on what can cause this issue.
The OpenTracing API says there can be only one active span per thread. I guess the DD Java agent uses a ThreadLocal to store this information.
Unfortunately, in Kotlin there can be multiple coroutines running in the same thread, each of them having its own active span. Continuation logic is processed within Kotlin's internal event loop, where the coroutine state is checked. It is also possible that a coroutine started in one thread will finish in another thread, depending on its inner logic (IO reads, for example).
This is a known issue, and the Kotlin team has already discussed how to achieve ThreadLocal-like functionality inside coroutines. The rough idea is that the developer creates a variable in the coroutine scope which is restored on every coroutine context change. Take a look at this discussion: https://github.com/Kotlin/kotlinx.coroutines/issues/119
There is a special CoroutineContext element designed to work with ThreadLocal variables and restore them: https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-core/jvm/src/ThreadContextElement.kt
Unfortunately, using these solutions requires adding a dependency on kotlinx-coroutines-core to your project.
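For reference, a minimal sketch of that mechanism using the asContextElement helper from kotlinx-coroutines-core (the names and values here are made up for illustration):

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// A plain ThreadLocal whose value should follow the coroutine, not the thread.
val currentRequestId = ThreadLocal<String?>()

fun main() = runBlocking<Unit> {
    // asContextElement installs a ThreadContextElement that sets the ThreadLocal
    // every time the coroutine is resumed and restores the previous value when
    // the coroutine suspends.
    launch(Dispatchers.Default + currentRequestId.asContextElement(value = "req-42")) {
        // Visible here even if the coroutine migrates between worker threads.
        println("request id = ${currentRequestId.get()}")
    }
}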
Please let me know what you think.
Thanks.
Hello @ikovalyov, thank you for the example to work on! I have opened a pull request to demonstrate how my company worked around this issue, using manual instrumentation. I hope it can be of some use to you/DD.
> Task :run
[dd.trace 2020-02-02 21:46:19:413 +0000] [main] INFO datadog.trace.agent.jmxfetch.JMXFetch - JMXFetch config: null [] [] [] null null {runtime-id=d0ba682d-6be3-4899-bb58-1a891d737499, service=unnamed-java-app} statsd:localhost:8125
[dd.trace 2020-02-02 21:46:19:975 +0000] [main] INFO datadog.trace.agent.ot.DDTraceOTInfo - dd-trace - version: 0.42.0~f78b34fd
[dd.trace 2020-02-02 21:46:20:033 +0000] [main] INFO datadog.trace.agent.ot.DDTracer - New instance: DDTracer-51ec2856{ serviceName=unnamed-java-app, writer=DDAgentWriter { api=DDApi { tracesUrl=http://localhost:8126/v0.3/traces } }, sampler=datadog.trace.agent.common.sampling.RateByServiceSampler@714f3da4, defaultSpanTags={}}
[dd.trace 2020-02-02 21:46:20:056 +0000] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-trace-ot - version: 0.42.0~f78b34fd
[dd.trace 2020-02-02 21:46:20:056 +0000] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-trace-api - version: 0.42.0~f78b34fd
[dd.trace 2020-02-02 21:46:20:056 +0000] [main] INFO datadog.trace.agent.tooling.VersionLogger - dd-java-agent - version: 0.42.0~f78b34fd
21:46:21.322 INFO : === INIT === [ t_id=7035779805910583268, s_id=6979527234783672340, p_id=0] main}
21:46:21.464 INFO : ++++ IN +++: 5; [ t_id=7035779805910583268, s_id=3210259951591078075, p_id=6979527234783672340] DefaultDispatcher-worker-6
21:46:21.464 INFO : ++++ IN +++: 3; [ t_id=7035779805910583268, s_id=7267272080838780582, p_id=6979527234783672340] DefaultDispatcher-worker-4
21:46:21.464 INFO : ++++ IN +++: 1; [ t_id=7035779805910583268, s_id=7022831019533687639, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:21.464 INFO : ++++ IN +++: 6; [ t_id=7035779805910583268, s_id=3530788139845839964, p_id=6979527234783672340] DefaultDispatcher-worker-7
21:46:21.464 INFO : ++++ IN +++: 9; [ t_id=7035779805910583268, s_id=7568398546412052017, p_id=6979527234783672340] DefaultDispatcher-worker-10
21:46:21.464 INFO : ++++ IN +++: 2; [ t_id=7035779805910583268, s_id=4861729612813258103, p_id=6979527234783672340] DefaultDispatcher-worker-2
21:46:21.464 INFO : ++++ IN +++: 7; [ t_id=7035779805910583268, s_id=5300620202162984080, p_id=6979527234783672340] DefaultDispatcher-worker-11
21:46:21.464 INFO : ++++ IN +++: 4; [ t_id=7035779805910583268, s_id=9080709432457601673, p_id=6979527234783672340] DefaultDispatcher-worker-5
21:46:21.464 INFO : ++++ IN +++: 8; [ t_id=7035779805910583268, s_id=814191239339242963, p_id=6979527234783672340] DefaultDispatcher-worker-9
21:46:21.465 INFO : --- OUT ---: 1; [ t_id=7035779805910583268, s_id=7022831019533687639, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:21.464 INFO : ++++ IN +++: 0; [ t_id=7035779805910583268, s_id=5179139853462982352, p_id=6979527234783672340] DefaultDispatcher-worker-1
21:46:21.465 INFO : --- OUT ---: 0; [ t_id=7035779805910583268, s_id=5179139853462982352, p_id=6979527234783672340] DefaultDispatcher-worker-1
21:46:22.506 INFO : --- OUT ---: 2; [ t_id=7035779805910583268, s_id=4861729612813258103, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:23.508 INFO : --- OUT ---: 3; [ t_id=7035779805910583268, s_id=7267272080838780582, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:24.511 INFO : --- OUT ---: 4; [ t_id=7035779805910583268, s_id=9080709432457601673, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:25.513 INFO : --- OUT ---: 5; [ t_id=7035779805910583268, s_id=3210259951591078075, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:26.514 INFO : --- OUT ---: 6; [ t_id=7035779805910583268, s_id=3530788139845839964, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:27.519 INFO : --- OUT ---: 7; [ t_id=7035779805910583268, s_id=5300620202162984080, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:28.522 INFO : --- OUT ---: 8; [ t_id=7035779805910583268, s_id=814191239339242963, p_id=6979527234783672340] DefaultDispatcher-worker-3
21:46:29.526 INFO : --- OUT ---: 9; [ t_id=7035779805910583268, s_id=7568398546412052017, p_id=6979527234783672340] DefaultDispatcher-worker-3
[dd.trace 2020-02-02 21:46:31:083 +0000] [Thread-3] INFO datadog.trace.agent.common.writer.ddagent.TraceSerializingDisruptor - Flushing any remaining traces.
@MxKaras thanks, I'll try this approach!
Looking back at it, our initial implementation of the ThreadContextElement was a bit naive. @ikovalyov, if it helps:
class SpanContextElement(val span: Span) : ThreadContextElement<Scope> {
    override val key = Key

    // this is invoked before coroutine is resumed on current thread
    override fun updateThreadContext(context: CoroutineContext): Scope {
        return GlobalTracer.get().activateSpan(span)
    }

    // this is invoked after coroutine has suspended on current thread
    override fun restoreThreadContext(context: CoroutineContext, oldState: Scope) {
        oldState.close()
    }

    companion object Key : CoroutineContext.Key<SpanContextElement>
}
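A hypothetical usage sketch for the element above (launchTraced is not part of the thread; it just shows how such an element would typically be passed to a coroutine builder):

import io.opentracing.util.GlobalTracer
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch

// Start a span, hand it to SpanContextElement so it is re-activated on every
// resumption, and finish it when the coroutine completes.
fun CoroutineScope.launchTraced(
    operationName: String,
    block: suspend CoroutineScope.() -> Unit
): Job {
    val span = GlobalTracer.get().buildSpan(operationName).start()
    return launch(SpanContextElement(span)) {
        try {
            block()
        } finally {
            span.finish()
        }
    }
}

Whether spans started inside the block pick up the right parent still depends on the scope manager, as the rest of the thread shows.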
Hi @MxKaras, I approved your MR and added more testing code there. Unfortunately, more issues were revealed :(
Let's say we have two coroutines launched from a parent span. They both have their own span connected to the parent span, both have this SpanContextElement in use, and both perform some async operations. For some reason, when the first coroutine finishes while the second coroutine is still awaiting, the parent span is restored in the second coroutine instead of the correct one.
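For illustration only, a minimal hypothetical sketch of that shape, using the SpanContextElement from the snippet above; the actual test code lives in the MR, and the logs below come from it, not from this sketch:

import io.opentracing.util.GlobalTracer
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() {
    val tracer = GlobalTracer.get()
    val parent = tracer.buildSpan("parent").start()
    tracer.activateSpan(parent).use {
        runBlocking {
            val childOne = tracer.buildSpan("child-1").asChildOf(parent).start()
            val childTwo = tracer.buildSpan("child-2").asChildOf(parent).start()
            launch(SpanContextElement(childOne)) {
                delay(100)   // finishes first
                childOne.finish()
            }
            launch(SpanContextElement(childTwo)) {
                delay(500)   // still suspended when child-1 finishes
                // expected active span here: child-2; observed: the parent span
                childTwo.finish()
            }
        }
    }
    parent.finish()
}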
15:15:36.319 INFO : --- OUT ---: 9; DDSpan [ t_id=2011767087483520335, s_id=3155681079097456617, p_id=0] trace=Main/trace.annotation/trace.annotation metrics={} tags={component=trace, language=jvm, resource.name=Main.main, runtime-id=febd6773-cf00-4347-9d3c-1f4ed3433742, thread.id=1, thread.name=main}, duration_ns=0
15:15:36.330 INFO : --- OUT ---: 8; DDSpan [ t_id=2011767087483520335, s_id=2642583294064030532, p_id=3155681079097456617] trace=Main/launch 8/launch 8 metrics={} tags={Iteration=8, thread.id=1, thread.name=main}, duration_ns=0
DDSpan [ t_id=2011767087483520335, s_id=2642583294064030532, p_id=3155681079097456617] trace=Main/launch 8/launch 8 metrics={} tags={Iteration=8, thread.id=1, thread.name=main}, duration_ns=8112597105
--> 15:15:36.331 INFO : --- MIDDLE ---: 9; DDSpan [ t_id=2011767087483520335, s_id=3155681079097456617, p_id=0] trace=Main/trace.annotation/trace.annotation metrics={} tags={component=trace, language=jvm, resource.name=Main.main, runtime-id=febd6773-cf00-4347-9d3c-1f4ed3433742, thread.id=1, thread.name=main}, duration_ns=0
DDSpan [ t_id=2011767087483520335, s_id=1805041717008322454, p_id=3155681079097456617] trace=Main/9/9 metrics={} tags={Iteration=9, thread.id=1, thread.name=main}, duration_ns=9114222553
--> 15:15:37.332 INFO : --- OUT ---: 9; DDSpan [ t_id=2011767087483520335, s_id=3155681079097456617, p_id=0] trace=Main/trace.annotation/trace.annotation metrics={} tags={component=trace, language=jvm, resource.name=Main.main, runtime-id=febd6773-cf00-4347-9d3c-1f4ed3433742, thread.id=1, thread.name=main}, duration_ns=0
DDSpan [ t_id=2011767087483520335, s_id=6106499147502182028, p_id=3155681079097456617] trace=Main/launch 9/launch 9 metrics={} tags={Iteration=9, thread.id=1, thread.name=main}, duration_ns=9109520285
Not sure if you had this issue in your production environment.
Solved it by adding the -Ddd.trace.scope.depth.limit=0 configuration parameter.
Did you try using a continuation interceptor? https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.coroutines/-continuation-interceptor/
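For what it's worth, a minimal hypothetical sketch of that idea is below. One caveat: a ContinuationInterceptor occupies the same context key as the coroutine dispatcher, so installing an interceptor like this replaces the dispatcher, which is one reason ThreadContextElement tends to be the preferred hook for this kind of bookkeeping.

import io.opentracing.Span
import io.opentracing.util.GlobalTracer
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.CoroutineContext

// Wraps every continuation of the coroutine so that the given span is
// activated for the duration of each resumption.
class SpanInterceptor(private val span: Span) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext = continuation.context
            override fun resumeWith(result: Result<T>) {
                // Activate the span around the resumed work and close the scope
                // again once control returns to the interceptor.
                GlobalTracer.get().activateSpan(span).use {
                    continuation.resumeWith(result)
                }
            }
        }
}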
Hi @MxKaras, do you have a working solution for more up-to-date Datadog agent implementations?
My apologies @tekener - I had not noticed my notifications for a long time! I no longer work at the company I was at when I wrote those messages. But the solution above was good enough for our needs (most of the time... there were a few services where the traces didn't work properly... never found out why). I think maybe we were never creating nested spans in the way that @ikovalyov was.
Before I left that job I was looking at OpenTelemetry. There seemed to be some promising extensions for coroutine support, but I never got around to playing with them 🤷♂️. Sorry I couldn't be more helpful!
I wouldn't recommend using the sample provided here: https://github.com/DataDog/dd-trace-java/issues/1123#issuecomment-615354579, as it would cause errors/memory leaks.
Here's an example:
import datadog.opentracing.DDTracer.DDTracerBuilder
import io.opentracing.Scope
import io.opentracing.Span
import io.opentracing.util.GlobalTracer
import kotlinx.coroutines.ThreadContextElement
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.time.delay
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.slf4j.LoggerFactory
import java.time.Duration
import kotlin.coroutines.CoroutineContext

internal class TracerExtensionsKtTest {

    private val logger = LoggerFactory.getLogger("testLogger")

    @BeforeEach
    internal fun setUp() {
        DDTracerBuilder()
            .build()
            .also(GlobalTracer::registerIfAbsent)
    }

    class SpanContextElement(val span: Span) : ThreadContextElement<Scope> {
        override val key = Key

        // this is invoked before coroutine is resumed on current thread
        override fun updateThreadContext(context: CoroutineContext): Scope {
            return GlobalTracer.get().activateSpan(span)
        }

        // this is invoked after coroutine has suspended on current thread
        override fun restoreThreadContext(context: CoroutineContext, oldState: Scope) {
            oldState.close()
        }

        companion object Key : CoroutineContext.Key<SpanContextElement>
    }

    @Test
    internal fun test() {
        val topLevelSpan = GlobalTracer.get().buildSpan("top-level-span").start()
        val topLevelScope = GlobalTracer.get().activateSpan(topLevelSpan)
        topLevelScope.use {
            runBlocking(SpanContextElement(topLevelSpan)) {
                val uninterruptedSpan = GlobalTracer.get().buildSpan("uninterrupted-span").start()
                val uninterruptedScope = GlobalTracer.get().activateSpan(uninterruptedSpan)
                uninterruptedScope.use {
                    logger.info("Hello from uninterrupted span!")
                    uninterruptedSpan.finish()
                }

                val interruptedSpan = GlobalTracer.get().buildSpan("interrupted-span").start()
                val interruptedScope = GlobalTracer.get().activateSpan(interruptedSpan)
                interruptedScope.use {
                    logger.info("Hello from interrupted span!")
                    delay(Duration.ofMillis(100))
                    interruptedSpan.finish()
                }
            }
            topLevelSpan.finish()
        }
    }
}
The coroutine will get parked when we call delay(), leading to a call to SpanContextElement#restoreThreadContext while interruptedScope is still in use (not closed). When the coroutine continues after the delay, SpanContextElement#updateThreadContext will be called, activating the span again and creating a new scope. This might cause memory leaks, since the scope stack to which interruptedScope belongs might not get cleaned up.
As a side effect you will also see the following log messages:
14:50:58.803 [main] DEBUG d.t.c.s.ContinuableScopeManager - Tried to close datadog.trace.core.scopemanager.ContinuableScopeManager$ContinuableScope@37efd131->DDSpan [ t_id=3440430157635651782, s_id=154419290588238380, p_id=0 ] trace=com.intellij.rt.junit.JUnitStarter/top-level-span/top-level-span metrics= tags={language=jvm, runtime-id=1c59fbd5-69df-42bf-a07f-bcfe2d9fb227, thread.id=1, thread.name=main}, duration_ns=0 scope when not on top. Current top: datadog.trace.core.scopemanager.ContinuableScopeManager$ContinuableScope@7e7b159b->DDSpan [ t_id=3440430157635651782, s_id=1361391474773844142, p_id=154419290588238380 ] trace=com.intellij.rt.junit.JUnitStarter/interrupted-span/interrupted-span metrics= tags={thread.id=1, thread.name=main @coroutine#1}, duration_ns=0 | MDC: []
14:50:58.903 [main @coroutine#1] DEBUG d.t.c.s.ContinuableScopeManager - Tried to close datadog.trace.core.scopemanager.ContinuableScopeManager$ContinuableScope@7e7b159b->DDSpan [ t_id=3440430157635651782, s_id=1361391474773844142, p_id=154419290588238380 ] trace=com.intellij.rt.junit.JUnitStarter/interrupted-span/interrupted-span metrics= tags={thread.id=1, thread.name=main @coroutine#1}, duration_ns=102527964 scope when not on top. Current top: datadog.trace.core.scopemanager.ContinuableScopeManager$ContinuableScope@293a5f75->DDSpan [ t_id=3440430157635651782, s_id=154419290588238380, p_id=0 ] trace=com.intellij.rt.junit.JUnitStarter/top-level-span/top-level-span metrics= tags={language=jvm, runtime-id=1c59fbd5-69df-42bf-a07f-bcfe2d9fb227, thread.id=1, thread.name=main}, duration_ns=0 | MDC: []
Also, you will notice the scope.close.error counter getting incremented.
See DD source for more details: https://github.com/DataDog/dd-trace-java/blob/90a648fd19d3c3114e29957b11147332c1d30cb9/dd-trace-core/src/main/java/datadog/trace/core/scopemanager/ContinuableScopeManager.java#L272
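One hypothetical way to do the manual instrumentation without running into this (not from this thread) is to keep the span open for the whole suspend function but only activate a Scope around non-suspending sections, so no Scope is ever held across a suspension point:

import io.opentracing.util.GlobalTracer
import kotlinx.coroutines.delay

suspend fun interruptedWork() {
    val tracer = GlobalTracer.get()
    val span = tracer.buildSpan("interrupted-span").start()
    try {
        tracer.activateSpan(span).use {
            // non-suspending work before the delay
        }
        delay(100)   // suspend with no scope held open
        tracer.activateSpan(span).use {
            // non-suspending work after resumption
        }
    } finally {
        span.finish()
    }
}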
@monosoul Can you please share the solution to overcome the above limitation?
🤖 This issue has been addressed in the latest release. See full details in the Release Notes.