spring-boot icon indicating copy to clipboard operation
spring-boot copied to clipboard

Provide a setting for automatic context propagation with reactor-core

Open chemicL opened this issue 2 years ago • 28 comments

reactor-core 3.5.3 introduced a way to automatically propagate ThreadLocal values registered with the context-propagation library. Combined with the use of micrometer-tracing it can allow users to log anywhere in their reactive chains with tracing information present, without the need to use handle/tap operators.

https://github.com/reactor/reactor-core/pull/3335 adds a static method: Hooks.enableAutomaticContextPropagation() that can be triggered at the start of an application to enable this feature.

The call to this method can be handled by Spring Boot and users would just interact with a Spring Boot property to enable/disable this feature.

chemicL avatar Feb 15 '23 10:02 chemicL

Important note is that the above feature is dependant on updating the dependencies for micrometer and context-propagation as listed in Micrometer docs:

  • Micrometer Context-Propagation 1.0.2
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

chemicL avatar Feb 15 '23 12:02 chemicL

@chemicL If I understand correctly, without this setting context propagation is only managed with the tap/handle operators, or by using contextCapture. Using this hook provided by reactor turns it on for all operators.

I'm wondering if we should create a spring.reactor.context-propagation accepting enum values such as AUTO (default) and MANUAL. This would leave us some room for possible new modes in the future. What do you think?

bclozel avatar Feb 15 '23 13:02 bclozel

Currently, as long as context-propagation library is on the classpath, reactor-core does the following (and there's no way to turn it off unless context-propagation is excluded from the runtime):

  • for tap and handle it restores the ThreadLocals from Reactor Context
  • contextCapture writes all ThreadLocal values to Reactor Context under the same keys for which ThreadLocalAccessors are registered

When Hooks.enableAutomaticContextPropagation() is called, the following changes:

  • all operators upstream from a contextWrite() or contextCapture() will have ThreadLocal values associated with keys present in Reactor Context propagated
  • handle and tap won't do anything special for that as the values are already present

In both scenarios, contextCapture is required to store current ThreadLocals into Reactor Context if any were written to. That won't be necessary e.g. when returning a Flux or Mono to which Spring subscribes where it attaches the Observation before calling an endpoint in case Micrometer is used (and the Observation is not even in current ThreadLocal).

I agree the enum approach is flexible and can let us introduce different mechanisms if we come up with any. For the naming, I'd go with AUTO and LIMITED instead of MANUAL. Intuitively, I'd expect MANUAL to mean direct use of the context-propagation library API to restore ThreadLocals.

chemicL avatar Feb 15 '23 14:02 chemicL

@bclozel Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO? I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

kzander91 avatar Feb 20 '23 09:02 kzander91

Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO? I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

As far as I understand, this is a static call, so by definition this applies to the entire JVM. Removing this queue wrapper during context shutdown or when the mode is not set to "AUTO" is just likely to disable the feature for other contexts that are still running and require it.

This feature is additive, the performance cost has been measured and should not impact applications significantly. Unless tests are relying on the fact that ThreadLocal values are not restored within reactor operators, this configuration change should not influence other behaviors.

bclozel avatar Feb 20 '23 10:02 bclozel

Given the Micrometer requirement for the hook to have an effect, I wonder if this should be auto-configured in spring-boot-actuator-autoconfigure rather than spring-boot-autoconfigure. I also wonder if the property prefix should also reflect that it is related to tracing. spring.reactor.tracing.context-propagation perhaps?

wilkinsona avatar Feb 20 '23 18:02 wilkinsona

It looks like we're currently relying on context-propagation being in testCompileClasspath through spring-integration-core. When https://github.com/spring-projects/spring-integration/pull/8556 is merged, that'll no longer be the case and the tests won't compile.

wilkinsona avatar Feb 20 '23 18:02 wilkinsona

I've pushed cae8c14ac776e5538c73e55522fa70679737de4c to work around the compile failure following the merge of the PR in Spring Integration. We should see if we can remove the need for it as part of finishing off this issue.

wilkinsona avatar Feb 21 '23 19:02 wilkinsona

We discussed this today and decided that we need more time to make sure we've got this right. We're going to revert the changes for now and revisit this for a subsequent 3.0.x release. In the meantime, a manual call to Hooks.enableAutomaticContextPropagation() can be made in apps that are affected by the problem described in #33372.

wilkinsona avatar Feb 22 '23 21:02 wilkinsona

The changes have been reverted in 88de3cc.

wilkinsona avatar Feb 22 '23 21:02 wilkinsona

@wilkinsona I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

davidmelia avatar Feb 24 '23 11:02 davidmelia

@wilkinsona Darn, was waiting for this to be released to help fix some trace propagation issues I was seeing when using micrometer for internal tracing (I presume this will fix my internal @Observed traces not picking up the external trace id when using spring-reactive). I will keep waiting with bated breath!

rhanton avatar Feb 24 '23 21:02 rhanton

No need to wait. Just add a call to Hooks.enableAutomaticContextPropagation() somewhere in your app.

wilkinsona avatar Feb 24 '23 21:02 wilkinsona

Sweet - I'll try that - thanks @wilkinsona

rhanton avatar Feb 24 '23 21:02 rhanton

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

VinishKumar1 avatar Feb 27 '23 10:02 VinishKumar1

I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

@davidmelia this issue should get automatically fixed if we introduce this improvement in reactor-core: https://github.com/reactor/reactor-core/issues/3366. However, for the time being, you can follow any interaction with external sources with a .contextWrite(Function.identity()) immediately after the Flux/Mono obtained from such integrations. This boundary will restore ThreadLocals in your code. For the library code to log with trace-id when using a foreign Publisher (e.g. spring-data-mongodb's use of reactive mongoDB driver), they'd need to do the same. It's unfeasible to do everywhere, so we're considering adding it to reactor-core.

chemicL avatar Feb 27 '23 12:02 chemicL

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

@VinishKumar1 please provide a reproducer as an issue in reactor-core, we'll have a look.

chemicL avatar Feb 27 '23 12:02 chemicL

I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

@davidmelia this issue should get automatically fixed if we introduce this improvement in reactor-core: reactor/reactor-core#3366. However, for the time being, you can follow any interaction with external sources with a .contextWrite(Function.identity()) immediately after the Flux/Mono obtained from such integrations. This boundary will restore ThreadLocals in your code. For the library code to log with trace-id when using a foreign Publisher (e.g. spring-data-mongodb's use of reactive mongoDB driver), they'd need to do the same. It's unfeasible to do everywhere, so we're considering adding it to reactor-core.

Your suggestions with a .contextWrite(Function.identity()) works for me, so I wait for the fix reactor/reactor-core#3366. Thank you!

rafallezanko avatar Mar 04 '23 22:03 rafallezanko

After upgrading to 3.0.4 (which has the correct dependency versions), adding context-propagation, and adding the Hooks.enableAutomaticContextPropagation() call into my main method, I still was not seeing any tracing.

I was able to make it work by updating a Spring Security annotation @EnableReactiveMethodSecurity(useAuthorizationManager = false) by setting the useAuthorizationManager to false and then it started working. I guess the new Authorization API (see here) which works with the Reactor Context must be causing some issue. I am hopping that the fix in Reactor Core mentioned above fixes this as well.

Stillglade avatar Mar 16 '23 20:03 Stillglade

Given the Micrometer requirement for the hook to have an effect, I wonder if this should be auto-configured in spring-boot-actuator-autoconfigure rather than spring-boot-autoconfigure. I also wonder if the property prefix should also reflect that it is related to tracing. spring.reactor.tracing.context-propagation perhaps?

While this feature is highly requested by developers using tracing support, this option is really about enabling a context propagation feature in Reactor. The context-propagation project lives in the micrometer organization, but it can be used for its main purpose without involving tracing. For example, this is a mandatory dependency for Spring for GraphQL; it is required for propagating the GraphQL context between asynchronous and reactive processing. Many applications are using the MDC logging feature or ThreadLocals and could use this support outside of tracing.

For these reasons, I'm not in favor of moving this feature to the actuator module or changing its configuration property name. Even if the tracing support has still some limitations, I think that making this feature available (and the default) as soon as possible is required if we want our community to adopt Spring Boot 3.0.

bclozel avatar Mar 16 '23 21:03 bclozel

@EnableReactiveMethodSecurity(useAuthorizationManager = false)

Is there any way to work around this issue?

pgovindan-cue avatar Mar 22 '23 19:03 pgovindan-cue

@pgovindan-cue Please direct questions about Spring Security to the Spring Security team. If you believe you've found a bug, please open an issue.

wilkinsona avatar Mar 23 '23 18:03 wilkinsona

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

@VinishKumar1 we observed this as well in a construct like this:

return webClient.get()
    .uri("https://httpbin.org/status/400")
    .retrieve()
    .onStatus(HttpStatusCode::isError, clientResponse -> Mono.error(new RuntimeException("ERROR")))
    .bodyToMono(Object.class)
    .doOnError(throwable -> LOGGER.error("Error: {}", throwable.getMessage()) /* lacks tracing information in the MDC */)
    .onErrorResume(throwable -> Mono.just(throwable.getMessage()));

and could fix it by not creating a new Mono in the onStatus step:

return webClient.get()
    .uri("https://httpbin.org/status/400")
    .retrieve()
    .onStatus(HttpStatusCode::isError, ClientResponse::createException)
    .bodyToMono(Object.class)
    .doOnError(throwable -> LOGGER.error("Error: {}", throwable.getMessage()) /* includes tracing information in the MDC */)
    .onErrorResume(throwable -> Mono.just(throwable.getMessage()));

Your scenario might be different but I hope that this points you into the right direction.

janekbettinger avatar Apr 17 '23 11:04 janekbettinger

@janekbettinger Have you tried with latest reactor version? https://github.com/reactor/reactor-core/releases/tag/v3.5.5

It includes the following:

Propagate ThreadLocals for non-Reactor upstream sources by @chemicL in https://github.com/reactor/reactor-core/pull/3418

jonathannaguin avatar Apr 18 '23 08:04 jonathannaguin

@janekbettinger Have you tried with latest reactor version? https://github.com/reactor/reactor-core/releases/tag/v3.5.5

Yes, 3.5.5 worked well (i.e. fixed tracing) for the majority of log statements, but not for the scenario described in my previous comment. I have a minimum working example that I could share later today or tomorrow if you're interested.

janekbettinger avatar Apr 18 '23 10:04 janekbettinger

@jonathannaguin you can find the example that shows the behavior here:

  • https://github.com/janekbettinger/webflux-micrometer-tracing

janekbettinger avatar Apr 20 '23 06:04 janekbettinger

@janekbettinger yeah, I am also able to reproduce the problem. And actually, another fix would be to do the following

return clientResponse.bodyToMono(byte[].class).map(i->new RuntimeException("ERROR"));

The above seems to retain the context whereas creating a new Mono does not inside onStatus.

jonathannaguin avatar Apr 20 '23 14:04 jonathannaguin

@janekbettinger @jonathannaguin For me just adding .contextCapture() to the end of the chain worked to propagate the context in all instances of doOnNext, doOneError and implicit (such as in elapsed) & explicit creation of new Flux/Mono. (Using 3.5.5 via Spring Boot 3.0.6). However, I'm calling block on this chain, so can't say for sure if this will work in all cases.

So in conclusion the steps I took:

  • Upgraded to Spring Boot 3.0.6
  • Added spring-boot-starter-actuator dependency (Still not 100% sure why this is needed to trigger start of tracing) along with io.micrometer:micrometer-tracing & io.micrometer:micrometer-tracing-bridge-brave
  • Called Hooks.enableAutomaticContextPropagation();
  • Called .contextCapture() on the reactive chain just before calling .block()

aksh1618 avatar May 01 '23 07:05 aksh1618

@chemicL I've tried with reactor-core 3.5.5 and I see similar issue spring-data-r2dbc where the trace ID is no propagated back. As a result I am forced to add .contextWrite(Function.identity()) after the repository method like

repository.findAll().contextWrite(Function.identity());

I was hoping the issue https://github.com/reactor/reactor-core/issues/3366 would have resolved this but unfortunately not.

Appreciate any help. Thanks

mikhilsanghvi avatar Jun 08 '23 15:06 mikhilsanghvi

@mikhilsanghvi I'm always eager to look into an isolated reproducer. If you prepare one, please feel free to open an issue in reactor-core repository so I can analyse and see what can be done about the problem.

chemicL avatar Jun 20 '23 07:06 chemicL