opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

MDC Context lost its associated to individual thread

Open tadlakha9 opened this issue 7 months ago • 10 comments

Describe the bug We have a use case where we are setting the baggage in scope during the webfilter but in the flow the value being lost

Steps to reproduce we have attached a poc, which contains 2 endpoint, you can just bootrun the application and hit the api localhost:8089/api/quizzes/5d20cccab90a07e7190b2b9f. when we hit the api we get the below in logs, If i explain well, we have created the baggage for username with value "sub" then later we override the value after calling the api with "sub-service1" and we expect the "sub-service1" to be available when we are throwing the IllegalArgumentException

025-05-28T09:53:33.335+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] s.w.r.r.m.a.RequestMappingHandlerMapping : [bc350e2f-1] Mapped to com.example.otel_tracing.resource.QuestionResource#findById(ObjectId)
2025-05-28T09:53:33.545+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] o.s.d.m.core.ReactiveMongoTemplate       : findOne using query: { "id" : { "$oid" : "5d20cccab90a07e7190b2b9f"}} fields: Document{{}} for class: class com.example.otel_tracing.model.entity.Question in collection: question
2025-05-28T09:53:33.581+05:30 " INFO[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] c.e.o.resource.QuestionResource          : Finding question
2025-05-28T09:53:33.587+05:30 " INFO[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] c.e.o.service.QuestionService            : finding question with id: 5d20cccab90a07e7190b2b9f
2025-05-28T09:53:33.703+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub-service1]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] org.mongodb.driver.cluster               : Server selection started for operation with ID 7. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=REPLICA_SET, servers=[{address=localhost:27017, type=REPLICA_SET_PRIMARY, roundTripTime=131.2 ms, state=CONNECTED}]
2025-05-28T09:53:33.719+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub-service1]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] org.mongodb.driver.cluster               : Server selection succeeded for operation with ID 7. Selected server: localhost:27017. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=REPLICA_SET, servers=[{address=localhost:27017, type=REPLICA_SET_PRIMARY, roundTripTime=131.2 ms, state=CONNECTED}]
2025-05-28T09:53:33.730+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub-service1]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] org.mongodb.driver.connection            : Checkout started for connection to localhost:27017
2025-05-28T09:53:33.742+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub-service1]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] i.o.context.ThreadLocalContextStorage    :  Trying to close scope which does not represent current context. Ignoring the call.
2025-05-28T09:53:34.021+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,]" 10988 --- [otel-tracing] [ntLoopGroup-3-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] a.w.r.e.AbstractErrorWebExceptionHandler : [bc350e2f-1] Resolved [IllegalArgumentException: Question with id 5d20cccab90a07e7190b2b9f not found] for HTTP GET /api/quizzes/5d20cccab90a07e7190b2b9f
2025-05-28T09:53:34.022+05:30 "ERROR[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,]" 10988 --- [otel-tracing] [ntLoopGroup-3-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] a.w.r.e.AbstractErrorWebExceptionHandler : [bc350e2f-1]  500 Server Error for HTTP GET "/api/quizzes/5d20cccab90a07e7190b2b9f"

What did you expect to see? Previously we were using brave implementation just now we have migrated to otel implementation, with brave we have the following logs username value sub-service1 is not lost.

2025-05-28T09:44:07.924+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] s.w.r.r.m.a.RequestMappingHandlerMapping : [6973f811-1] Mapped to com.example.otel_tracing.resource.QuestionResource#findById(ObjectId)
2025-05-28T09:44:08.268+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] o.s.d.m.core.ReactiveMongoTemplate       : findOne using query: { "id" : { "$oid" : "5d20cccab90a07e7190b2b9f"}} fields: Document{{}} for class: class com.example.otel_tracing.model.entity.Question in collection: question
2025-05-28T09:44:08.341+05:30 " INFO[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] c.e.o.resource.QuestionResource          : Finding question
2025-05-28T09:44:08.349+05:30 " INFO[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] c.e.o.service.QuestionService            : finding question with id: 5d20cccab90a07e7190b2b9f
2025-05-28T09:44:08.511+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub-service1]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] org.mongodb.driver.cluster               : Server selection started for operation with ID 5. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=REPLICA_SET, servers=[{address=localhost:27017, type=REPLICA_SET_PRIMARY, roundTripTime=155.1 ms, state=CONNECTED}]
2025-05-28T09:44:08.533+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub-service1]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] org.mongodb.driver.cluster               : Server selection succeeded for operation with ID 5. Selected server: localhost:27017. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=REPLICA_SET, servers=[{address=localhost:27017, type=REPLICA_SET_PRIMARY, roundTripTime=155.1 ms, state=CONNECTED}]
2025-05-28T09:44:08.550+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub-service1]" 16920 --- [otel-tracing] [ctor-http-nio-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] org.mongodb.driver.connection            : Checkout started for connection to localhost:27017
2025-05-28T09:44:08.884+05:30 "DEBUG[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub-service1]" 16920 --- [otel-tracing] [ntLoopGroup-3-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] a.w.r.e.AbstractErrorWebExceptionHandler : [6973f811-1] Resolved [IllegalArgumentException: Question with id 5d20cccab90a07e7190b2b9f not found] for HTTP GET /api/quizzes/5d20cccab90a07e7190b2b9f
2025-05-28T09:44:08.887+05:30 "ERROR[otel-tracing,68368d8fc5f4102aed3a583a71288585,ed3a583a71288585,sub-service1]" 16920 --- [otel-tracing] [ntLoopGroup-3-3] [68368d8fc5f4102aed3a583a71288585-ed3a583a71288585] a.w.r.e.AbstractErrorWebExceptionHandler : [6973f811-1]  500 Server Error for HTTP GET "/api/quizzes/5d20cccab90a07e7190b2b9f"

What did you see instead? A clear and concise description of what you saw instead.

What version and what artifacts are you using? Artifacts:

Image

Version: (e.g., v0.4.0, 1eb551b, etc) How did you reference these artifacts? (excerpt from your build.gradle, pom.xml, etc)

Environment Compiler: oracle openjdk 17.0.12" OS: Windows 11 Runtime (if different from JDK above): (e.g., "Oracle JRE 8u251") OS (if different from OS compiled on): (e.g., "Windows Server 2019")

Additional context In the logs we often see Trying to close scope which does not represent current context. Ignoring the call. Some modification done in the past to close the scope https://github.com/open-telemetry/opentelemetry-java/issues/5055

tadlakha9 avatar May 29 '25 05:05 tadlakha9

Have you investigated to figure out what's up with this log line? It indicates that a context isn't properly being closed....

2025-05-28T09:53:33.742+05:30 "DEBUG[otel-tracing,7c400339c6954d64fa1da33cc40da8ec,6c5c7f76e35841ea,sub-service1]" 10988 --- [otel-tracing] [ctor-http-nio-3] [7c400339c6954d64fa1da33cc40da8ec-6c5c7f76e35841ea] i.o.context.ThreadLocalContextStorage    :  Trying to close scope which does not represent current context. Ignoring the call.

jkwatson avatar May 30 '25 02:05 jkwatson

what we have analyzed is that initially we have created a baggage object using OtelBaggageManager as below

OtelBaggageManager(OtelCurrentTraceContext(), emptyList(), emptyList()).getBaggage(TENANT_ID_MDC_KEY)

and then using makeCurrent method we set the value in the current traceContext as below

 fun enhance(traceContext: TraceContext) {
            USERNAME_FIELD_MDC.makeCurrent(traceContext, "sub")
            TENANT_ID_FIELD_MDC.makeCurrent(traceContext, "tenantId")
        }

So as per the javadoc, this makeCurrent is responsible of setting the current baggage in scope with given value. Somehow the context is not preserved and it is trying to close the current context

tadlakha9 avatar Jun 02 '25 05:06 tadlakha9

Let us know if anything else being expected from us to provide for a better analysis on this subject.

tadlakha9 avatar Jun 03 '25 10:06 tadlakha9

Until you've got your instrumentation in a place where it's not leaking contexts, or trying to close contexts on the wrong thread, I'm not sure there's much we can do about it.

jkwatson avatar Jun 05 '25 16:06 jkwatson

As i mentioned above the same piece of code which is working with brave implementation library is not working with opentelemetry library. About the other point regarding closing the context, so we are not explicitly closing the scope, we are using makeCurrent() to set the value in the context and this method is itself responsible of closing the scope.

tadlakha9 avatar Jun 09 '25 05:06 tadlakha9

we are using makeCurrent() to set the value in the context and this method is itself responsible of closing the scope

makeCurrent() isn't responsible for closing the scope

maybe if you can provide a minimal repro we could better help you

trask avatar Jun 09 '25 20:06 trask

Here is the reproducer. If we just replace the otel dependency with brave one then its all works fine.

tadlakha9 avatar Jun 10 '25 05:06 tadlakha9

That is a very large "reproducer". OTel and Brave are not interchangeable. The context APIs are different, and behave differently. Are you expecting them to be interchangeable and work the same way?

jkwatson avatar Jun 10 '25 23:06 jkwatson

Used the same code with brave and otel implementation, currently it is compatible with otel to make it compatible with brave, below changes are done.

  • replaced implementation("io.micrometer:micrometer-tracing-bridge-otel") with implementation("io.micrometer:micrometer-tracing-bridge-brave")
  • Inside TracingConstants class enable the BraveBaggageManager and comment out OtelBaggageManager as below.
Image

and we got the logs difference as provided in the description

tadlakha9 avatar Jun 11 '25 03:06 tadlakha9

I think you buried the lede a bit. You're using micrometer tracing. You'll need to log an issue with that project. It's not something that we support here. I suspect there's a bug in their baggage implementation, but please log a bug with them for this if you think it's not working as expected.

jkwatson avatar Jun 11 '25 14:06 jkwatson