moleculer icon indicating copy to clipboard operation
moleculer copied to clipboard

fix: datadog tracer scope

Open ngraef opened this issue 4 years ago • 7 comments

:memo: Description

  • Support dd-trace's AsyncLocalStorage and AsyncResource scope manager implementations (introduced in v0.27.0 and v0.28.0) and clean up the AsyncHooks implementation.
  • Fix activation of moleculer spans in dd-trace, allowing for propagation to external modules. This also allows for log injection to connect logs and traces.
  • Allow moleculer to continue a trace that was started in an external module. For example, this will show the moleculer-web handler as a child span of the node http listener.

:dart: Relevant issues

Resolves #690 Resolves #889

:gem: Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

:vertical_traffic_light: How Has This Been Tested?

  • [x] Unit and integration tests added
  • [x] Tested an environment with 5 distributed services (each running in separate containers) sending traces to Datadog. The sample screenshots below show cross-service calls, custom spans in an action handler, and modules auto-instrumented by dd-trace (http-client, tcp, dns, fs, redis, postgres).
A simple call through moleculer-web to a remote action that makes some HTTP calls and interacts with a postgres database
Screen Shot 2021-04-18 at 11 39 09 AM
A call that includes some nested remote action calls, one of which calls redis and returns an error
Screen Shot 2021-04-18 at 11 40 06 AM
The same call as the previous example, but with a remote service subscribing to an event it emits
Screen Shot 2021-04-18 at 11 16 12 PM
The initial call to a lazy-loaded module with a bunch of fs reads needed to require the library
Screen Shot 2021-04-18 at 11 40 33 AM
An action handler with some custom spans via ctx.startSpan
Screen Shot 2021-04-18 at 11 41 49 AM

:checkered_flag: Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have commented my code, particularly in hard-to-understand areas

ngraef avatar Mar 24 '21 23:03 ngraef

@icebob This is now ready for review when you have some time.

ngraef avatar Apr 19 '21 14:04 ngraef

please merge changes from the master, some test is not up-to-date.

icebob avatar Apr 19 '21 15:04 icebob

please merge changes from the master, some test is not up-to-date.

This branch was already rebased onto master. I've rebased again to pull in the latest typing fixes.

I believe the failing test is either an incorrect implementation in src/async-storage.js or an incorrect assumption in the test.

ngraef avatar Apr 26 '21 05:04 ngraef

It's an old module, it is not used in the library anymore. But somehow this PR affects the async_hooks if the unit test failed, but in the master it doesn't.

icebob avatar Apr 27 '21 19:04 icebob

But somehow this PR affects the async_hooks if the unit test failed, but in the master it doesn't.

I looked into this a bit more. The failure might not occur on a development machine by running npm test because jest will spawn multiple workers by default, and the datadog and async-storage tests might not run in the same process. The failure can be reproduced more easily (depending on test execution order) by setting the jest option --maxWorkers=1. This is the value used by default in the 2-core GitHub Actions runners.

The tests on the master branch don't have promise execution tracking enabled because storage.enable() is never called, so the hook is never enabled. Therefore, the executionAsyncId is 0 in both then callbacks and the shared resource is retrieved.

https://github.com/moleculerjs/moleculer/blob/c25f6477049199c13f43228a482d72a77b24b104/test/integration/async-storage.spec.js#L15-L26

However, this PR adds tests for dd-trace's various scope implementations, which enable promise execution tracking for the process. In that case, the async-storage test fails because the then callbacks have different executionAsyncIds, and context propagation from parent to child async resources doesn't occur because the hook in the AsyncStorage class is not enabled. Enabling it in the test with storage.enable() will still fail because the context propagation implementation is broken.

Regardless, I've pushed a change that cleans up the hooks used in dd-trace and restores the previous passing behavior for the async-storage test.

ngraef avatar May 06 '21 00:05 ngraef

@ngraef I see, I don't forget it, but I'm hesitating because it contains a lot of changes affecting the tracer, context, and the communication protocol. I don't feel it can be merged into the 0.14, instead on the next 0.15. In the new version I will drop the schema-based serializer, and I plan to add headers feature in the Context, to store any middleware-based or user-based information which is transferred to remote nodes without causing breaking change in the protocol.

icebob avatar May 11 '21 17:05 icebob

what do you think about it? https://github.com/DataDog/dd-trace-js/pull/1598

icebob avatar Sep 05 '21 17:09 icebob