Support micrometer context-propagation

Open chickenchickenlove opened this issue 10 months ago • 9 comments

Motivation:

  • Related Issue : https://github.com/line/armeria/issues/5145
  • Armeria already supports context propagation to maintain the RequestContext while executing Reactor code. However, it requires ongoing maintenance.
  • Reactor officially integrates micrometer:context-propagation to propagate context through Flux and Mono. Thus, it would be better to migrate from RequestContextHook to RequestContextPropagationHooks, because that reduces the maintenance cost.

Modifications:

  • Add a new hook for Reactor.
  • Add a new ThreadLocalAccessor for micrometer:context-propagation to maintain the RequestContext while executing Reactor code such as Mono and Flux.
  • Add a new config, enableContextPropagation, to integrate micrometer:context-propagation with spring-boot3.

Result:

  • Closes https://github.com/line/armeria/issues/5145
  • If users want to use micrometer:context-propagation to maintain the RequestContext while executing Reactor code such as Mono and Flux, they just need to call RequestContextPropagationHook.enable().

chickenchickenlove avatar Apr 08 '24 07:04 chickenchickenlove

FYI, here is the flow of micrometer:context-propagation.

  1. If we enable context propagation, a hook for context propagation is added to Reactor's hooks.
  2. When subscribe() is called, hooks such as captureThreadLocals() are invoked.
  3. Then a ContextSnapshot is created to propagate the context. At this point, the accessors are used to capture the state of the ThreadLocals.
  4. When a Mono or Flux is executed on a Scheduler's executor, a scope is obtained by calling setThreadLocals(). It returns a Scope instance, which implements AutoCloseable.
  5. context-propagation restores the previous ThreadLocal state at the end of the try-with-resources block.
  6. On the other hand, setThreadLocals() restores the ThreadLocal state from the ContextSnapshot just before runnable.run().

IMHO, it is very similar to RequestContextHook on armeria.
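
To make the flow above concrete, here is a minimal plain-Java sketch of capture, set, and restore around a task that runs on another thread. It is only a model under stated assumptions: SnapshotFlowSketch, Snapshot, Scope, wrap, and runOnWorker are hypothetical names, not the micrometer:context-propagation or Armeria API, and a String stands in for the RequestContext.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified model only: every name here is hypothetical and is not the
// micrometer:context-propagation or Armeria API.
final class SnapshotFlowSketch {

    // Stands in for the ThreadLocal that holds the current RequestContext.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Copy of the ThreadLocal state taken on the capturing thread.
    static final class Snapshot {
        private final String captured;

        private Snapshot(String captured) {
            this.captured = captured;
        }

        // Steps 2-3: read the ThreadLocal on the subscribing thread.
        static Snapshot capture() {
            return new Snapshot(CURRENT.get());
        }

        // Step 4: install the captured value and remember what was there before.
        Scope setThreadLocals() {
            String previous = CURRENT.get();
            CURRENT.set(captured);
            return new Scope(previous);
        }
    }

    // Step 5: closing the scope puts the previous value back.
    static final class Scope implements AutoCloseable {
        private final String previous;

        private Scope(String previous) {
            this.previous = previous;
        }

        @Override
        public void close() {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Step 6: wrap a task so the snapshot is installed just before run().
    static Runnable wrap(Snapshot snapshot, Runnable task) {
        return () -> {
            try (Scope ignored = snapshot.setThreadLocals()) {
                task.run();
            }
        };
    }

    // Returns what a worker thread observes inside the scope [0] and after it [1].
    static String[] runOnWorker(String value) {
        CURRENT.set(value);
        Snapshot snapshot = Snapshot.capture();
        CURRENT.remove();
        String[] seen = new String[2];
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(wrap(snapshot, () -> seen[0] = CURRENT.get())).get();
            Callable<String> readAfterScope = CURRENT::get;
            seen[1] = worker.submit(readAfterScope).get();
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            worker.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = runOnWorker("ctx-1");
        System.out.println("inside scope: " + seen[0] + ", after scope: " + seen[1]);
    }
}
```

In this model the worker thread sees the captured value only while the scope is open, which is the same guarantee RequestContextHook aims for.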

chickenchickenlove avatar Apr 08 '24 07:04 chickenchickenlove

🔍 Build Scan® (commit: 1e6cd9bb6e3c6d703b87a2ed592b8fd31cccd820)

Job name Status Build Scan®
build-windows-latest-jdk-21 https://ge.armeria.dev/s/gjvoef5prmmle
build-self-hosted-unsafe-jdk-8 https://ge.armeria.dev/s/s4emdzuw7owuy
build-self-hosted-unsafe-jdk-21-snapshot-blockhound https://ge.armeria.dev/s/f2soiblmtmso4
build-self-hosted-unsafe-jdk-17-min-java-17-coverage https://ge.armeria.dev/s/7yczrfhcqrtmk
build-self-hosted-unsafe-jdk-17-min-java-11 https://ge.armeria.dev/s/seg3ttdpzg5bg
build-self-hosted-unsafe-jdk-17-leak https://ge.armeria.dev/s/iagbkvjyvpos4
build-self-hosted-unsafe-jdk-11 https://ge.armeria.dev/s/xwsu3uwjxg6as
build-macos-12-jdk-21 https://ge.armeria.dev/s/64pzf6lau2udo

github-actions[bot] avatar Apr 08 '24 11:04 github-actions[bot]

I investigated the internals of context-propagation.

Case 1. publisher.subscribe(...). downstream -> upstream flow

  • When publisher.subscribe(...) is called and the publisher is an instance of ContextWriteRestoringThreadLocals, it creates a ContextSnapshot.Scope. In this sequence, it reads the key-value pairs stored in the Reactor Context, then uses the ThreadLocalAccessor instance to store each value from the Reactor Context into the ThreadLocal. It also keeps the values previously stored in the ThreadLocal as PreviousValues and uses them to revert the ThreadLocal to its previous state when the ContextSnapshot.Scope ends. (FYI, ContextSnapshot.Scope implements AutoCloseable.)

Case 2. subscriber.onSubscribe(...). upstream -> downstream flow

  • When subscriber.onSubscribe(...) is called and the subscriber is an instance of ContextWriteRestoringThreadLocals, it creates a ContextSnapshot.Scope as well. This means that the values in the Reactor Context are stored in the ThreadLocal at the start of subscriber.onSubscribe(), and the ThreadLocal is reverted to its previous state at the end of the function.

Case 3. subscription.request(...). downstream -> upstream flow

  • When subscription.request(...) is called and the subscription is an instance of ContextWriteRestoringThreadLocals, it creates a ContextSnapshot.Scope as well.

The ContextWriteRestoringThreadLocals operator is applied when both [Mono|Flux]#contextCapture() and [Mono|Flux]#contextWrite() are called. This means that the Reactor Context is essential for propagating the RequestContext to each ThreadLocal during Reactor operations.

In practice, it works like this:

  1. It reads all values in the Reactor Context that are accessible by the key of a ThreadLocalAccessor.
  2. It stores the values it read in the ThreadLocal by using the ThreadLocalAccessor. At this time, the values that previously existed in the ThreadLocal are kept in a Map called PreviousValues.
  3. When the Scope ends, it restores the PreviousValues back to the ThreadLocal.

In other words, the Reactor Context is used for writing, and the ThreadLocals are used for reading.
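
The three steps can be sketched in plain Java as follows. This is a simplified model, not the library code: ContextScopeSketch, Accessor, Scope, and demo are hypothetical names, a plain Map stands in for the Reactor Context, and the "request-context" key is made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model only: hypothetical names, not the real Reactor or
// micrometer:context-propagation types.
final class ContextScopeSketch {

    // Minimal stand-in for a ThreadLocalAccessor: a key plus the ThreadLocal it manages.
    static final class Accessor {
        final String key;
        final ThreadLocal<Object> threadLocal = new ThreadLocal<>();

        Accessor(String key) {
            this.key = key;
        }
    }

    static final class Scope implements AutoCloseable {
        // Models the PreviousValues map described above.
        private final Map<Accessor, Object> previousValues;

        private Scope(Map<Accessor, Object> previousValues) {
            this.previousValues = previousValues;
        }

        static Scope open(Map<String, Object> context, List<Accessor> accessors) {
            Map<Accessor, Object> previous = new HashMap<>();
            for (Accessor accessor : accessors) {
                // 1. Read the value stored under the accessor's key, if any.
                if (!context.containsKey(accessor.key)) {
                    continue;
                }
                // 2. Remember the old ThreadLocal value, then install the new one.
                previous.put(accessor, accessor.threadLocal.get());
                accessor.threadLocal.set(context.get(accessor.key));
            }
            return new Scope(previous);
        }

        // 3. Restore the previous values when the scope ends.
        @Override
        public void close() {
            previousValues.forEach((accessor, old) -> {
                if (old == null) {
                    accessor.threadLocal.remove();
                } else {
                    accessor.threadLocal.set(old);
                }
            });
        }
    }

    // Returns the ThreadLocal value seen inside the scope [0] and after it [1].
    static Object[] demo() {
        Accessor accessor = new Accessor("request-context");
        accessor.threadLocal.set("outer");
        Map<String, Object> context =
                Collections.<String, Object>singletonMap("request-context", "from-context");
        Object inside;
        try (Scope ignored = Scope.open(context, Collections.singletonList(accessor))) {
            inside = accessor.threadLocal.get();
        }
        Object after = accessor.threadLocal.get();
        accessor.threadLocal.remove();
        return new Object[] {inside, after};
    }

    public static void main(String[] args) {
        Object[] result = demo();
        System.out.println("inside: " + result[0] + ", after: " + result[1]);
    }
}
```

The value written to the context wins while the scope is open, and the value that was in the ThreadLocal before is back once the scope closes.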

chickenchickenlove avatar Apr 10 '24 08:04 chickenchickenlove

@trustin nim, there are also major differences in the tests. I would like to inform you about them.

The test utility function addCallbacks() should be changed.

private static <T> Mono<T> addCallbacks(Mono<T> mono, ClientRequestContext ctx) {
        return mono.doFirst(() -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnSubscribe(s -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnRequest(l -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnNext(foo -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnSuccess(t -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnEach(s -> assertThat(ctxExists(ctx)).isTrue())
                   .doOnError(t -> assertThat(ctxExists(ctx)).isTrue())
                   .doAfterTerminate(() -> assertThat(ctxExists(ctx)).isTrue())
                   // I added contextWrite(...)
                   .contextWrite(Context.of(RequestContextAccessor.getInstance().key(), ctx));
        // doOnCancel and doFinally do not have context because we cannot add a hook to the cancel.
    }

contextWrite(Context.of(RequestContextAccessor.getInstance().key(), ctx)) is added. As we know, micrometer:context-propagation requires the Reactor Context to propagate context during Reactor operations. Thus, I added this call.

// Before: StepVerifier.create(mono1)
// After: add the initial Reactor Context to StepVerifier.
StepVerifier.create(mono1, initialReactorContext(ctx))
                    .expectSubscriptionMatches(s -> ctxExists(ctx))
                    .expectNextMatches(s -> ctxExists(ctx) && "baz".equals(s))
                    .verifyComplete();

In the previous test code, StepVerifier.create(mono1) was enough. However, it is not enough for micrometer:context-propagation. StepVerifier creates a DefaultVerifySubscriber to subscribe to the Flux|Mono and verify that the result is correct. However, DefaultVerifySubscriber has only an empty Reactor Context, which means that .expectSubscriptionMatches(s -> ctxExists(ctx)) would fail.

micrometer:context-propagation reads the values from the Reactor Context and restores the state of the ThreadLocal from them. However, since the DefaultVerifySubscriber has an empty Reactor Context, the RequestContext stored in the ThreadLocal becomes null.

Thus, the initial Reactor Context should be passed to StepVerifier as well.
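
A small plain-Java model of why the empty context makes the check fail. It assumes, as described above, that a key missing from the subscriber's context leaves the ThreadLocal null while the signal is handled. VerifierContextSketch, KEY, ctxExistsDuringSignal, and demo are hypothetical names, and a Map stands in for the Reactor Context.

```java
import java.util.Collections;
import java.util.Map;

// Simplified model only: hypothetical names, not Reactor's StepVerifier or
// the micrometer:context-propagation API.
final class VerifierContextSketch {

    static final String KEY = "request-context";

    // Stands in for the ThreadLocal that holds the current RequestContext.
    static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    // Models a signal callback such as expectSubscriptionMatches(...): the
    // ThreadLocal is first overwritten from the subscriber's context
    // (assumption: a missing key results in null), then the check runs.
    static boolean ctxExistsDuringSignal(Map<String, Object> subscriberContext, Object expected) {
        Object previous = CURRENT.get();
        CURRENT.set(subscriberContext.get(KEY));
        try {
            return CURRENT.get() == expected;
        } finally {
            CURRENT.set(previous);
        }
    }

    // Returns the check result with an empty context [0] and with an initial context [1].
    static boolean[] demo() {
        Object ctx = new Object();
        CURRENT.set(ctx); // the calling thread already holds the context
        try {
            boolean withEmpty =
                    ctxExistsDuringSignal(Collections.<String, Object>emptyMap(), ctx);
            boolean withInitial =
                    ctxExistsDuringSignal(Collections.<String, Object>singletonMap(KEY, ctx), ctx);
            return new boolean[] {withEmpty, withInitial};
        } finally {
            CURRENT.remove();
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("empty context: " + result[0] + ", initial context: " + result[1]);
    }
}
```

Even though the calling thread holds the context, the check only passes when the subscriber's own context carries it, which is what passing the initial Reactor Context to StepVerifier achieves.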

chickenchickenlove avatar Apr 10 '24 08:04 chickenchickenlove

Could you fix the build failures before getting reviews?

trustin avatar Apr 12 '24 05:04 trustin

@minwoox IIRC you had some comments related to request context hooks that you wanted @chickenchickenlove to address. Could you leave a comment about that?

Thanks! I left my opinion here: https://github.com/line/armeria/pull/5577#discussion_r1593720102

minwoox avatar May 08 '24 09:05 minwoox

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.05%. Comparing base (14c5566) to head (e80117c). Report is 69 commits behind head on main.

❗ Current head e80117c differs from pull request most recent head 1e6cd9b

Please upload reports for the commit 1e6cd9b to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5577      +/-   ##
============================================
- Coverage     74.05%   74.05%   -0.01%     
+ Complexity    21253    21242      -11     
============================================
  Files          1850     1850              
  Lines         78600    78540      -60     
  Branches      10032    10020      -12     
============================================
- Hits          58209    58164      -45     
+ Misses        15686    15680       -6     
+ Partials       4705     4696       -9     


codecov[bot] avatar May 08 '24 10:05 codecov[bot]

Thanks trustin nim. Thank you for taking the time to review this. 🙇‍♂️

chickenchickenlove avatar Jun 08 '24 14:06 chickenchickenlove

Hi @ikhoon nim, sorry to bother you. When you have time, could you take a look at this PR? Thanks in advance 🙇‍♂️

chickenchickenlove avatar Aug 14 '24 01:08 chickenchickenlove