feathers icon indicating copy to clipboard operation
feathers copied to clipboard

OpenTelemetry Tracing support

Open deskoh opened this issue 4 years ago • 16 comments

OpenTelemetry is a cloud-native observability framework that supports tracing in a vendor-neutral manner.

Created an instrumentation proof-of-concept for tracing Feathers app (currently pinned to 5.0.0-pre.2). (See crow branch for v4).

Instrumentation works in a non-intrusive way by 'monkey-patching' methods in existing packages to inject tracing into the application.

For v5, it relies on patching collectLegacyHooks of @feathersjs/feathers which is not currently not officially exported.

(For v4, it patches processHooks of @feathersjs/commons.)

@daffl Possible to export collectLegacyHooks formally for v5 to allow some official / stable means to support OpenTelemetry Tracing?

It's an alternative to #2043 that offload the tracing visualization to tools like Zipkin or Jaeger.

image

deskoh avatar Apr 11 '21 06:04 deskoh

Hm, instead of patching the internals (e.g. with collectLegacyHooks it wouldn't capture any of the new hooks) I believe it would be possible to provide a function that configures a tracing hook as the first one:

export function openTelemetry () {
  return (app: Application) => {
    app.hooks([
      async (context: HookContext, next: NextFunction) => {
        startTrace();
        await next();
        endTrace();
      }
    ]);
  }
}

Now we can configure it with

app.configure(openTelemetry());

As long as this is registered as one of the first plugins (before any other hooks) it should capture everything (except the event emitting since that will always be the very first one).

daffl avatar Apr 15 '21 15:04 daffl

Yeap. Tested it could work.

Downside is I would need to add the hooks if I wanted trace events when app / service hooks before / after hooks completes.

deskoh avatar Apr 19 '21 10:04 deskoh

I believe this would still work. The hook execution order is

  • App hooks (app.hooks([]))
  • App method hooks (app.hooks({ methodName }))
  • Legacy app hooks (before/after) (app.hooks({ before, after }))
  • Service hooks (service.hooks([]))
  • Service method hooks (service.hooks({ methodName }))
  • Legacy service hooks (before/after) (service.hooks({ before, after }))

So as long as it is registered first, a global app hook should capture everything.

daffl avatar Apr 19 '21 16:04 daffl

Any guidance how this could be done in the example above using a single app.configure() call?

+ Hooks can be added here using `app.hooks([...])` to trace start and end of service call
App hooks (app.hooks([]))
App method hooks (app.hooks({ methodName }))
Legacy app hooks (before/after) (app.hooks({ before, after }))
+ Add Hook 1 here to emit trace event 'calling service before hooks` and `completed service after hooks`
Service hooks (service.hooks([]))
Service method hooks (service.hooks({ methodName }))
Legacy service hooks (before/after) (service.hooks({ before, after }))
+ Add Hook 2 here to emit trace event 'calling service` and `completed service`

deskoh avatar Apr 20 '21 12:04 deskoh

I have a feeling this may be related to https://github.com/feathersjs/feathers/issues/2074 unless you pass a tracer via params to every call.

daffl avatar Apr 20 '21 17:04 daffl

OpenTelemetry API already handle 'context propagation'. Specifically, wrapping calls with context.with will allow access to callee to have access parent trace span.

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L55

The appMiddleware and serviceMiddleware are the Hook 1 and Hook 2 required to be 'injected' in correct order

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L59-L64

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L65-L70

See the section Accessing the current span for more details.

deskoh avatar Apr 21 '21 12:04 deskoh

That context propagation won't work for nested service calls though right?

I wonder if the ideal way for tracing a single service call would be to add an instrumentation option directly to https://github.com/feathersjs/hooks

daffl avatar Apr 21 '21 16:04 daffl

Yes it does! There is a deliberate call to users.find service in messages.find to trigger child trace.

If you run up Zipkin, you would be able to see the correct the parent span ID of users.find (the parent-child relationship also depicted by the indented green bar on the left of the span).

The annotations on the right will show the emitted trace events.

I would think the crux is to insert correct hooks in the hooks array in the correct order, which is not possible upfront, as user might add hooks during runtime. The best location I could think of is to patch collectMiddleware of FeathersHookManager, which is not exported. 🤔

deskoh avatar Apr 22 '21 14:04 deskoh

@deskoh My org uses Datadog, but figured be worth sharing how we wrapped our services using a mixin as some what alternative to hooks.

const ddTrace = require('dd-trace');

module.exports = function (app) {
  if (process.env.NODE_ENV === 'test') return;
  const tracer = ddTrace.init({
    tags: ['app:fx-' + process.env.NODE_ENV],
    env: process.env.NODE_ENV,
    logInjection: true,
  });

  app.mixins.push((service, path) => {
    Object.keys(service.methods).forEach((method) => {
      if (service[method]) service[method] = tracer.wrap(`feathers`, tracer.wrap(`service.${path}`, service[method]));
    });
  });
};

Def not a perfect solution still but does get the flow stacked correctly for us in the stack traces.

jchamb avatar May 25 '21 13:05 jchamb

@jchamb thanks! I saw from Feathers v5 mixins would no longer be supported. Would this still work?

deskoh avatar May 27 '21 11:05 deskoh

@deskoh I haven't played with the pre v5's yet, however looking at the migration guide https://dove.docs.feathersjs.com/guides/migrating.html#uberproto I see an example still calling a mixin. @daffl would have to confirm any plans of that also being removed.

You could use that example to start and stop your spans around the method calls as a starter. Like I said previously this isn't a perfect solution as it would still only trace actual service method calls not including the hooks. Our apps flow is pretty linear so for now this is good enough for us.

Between the new hooks and wrapping service calls, def interested to see what you come up with here!

jchamb avatar May 27 '21 12:05 jchamb

@daffl I tried the service mixin documented here and got the following error when making a REST call.

Error: Can not create context for method find
    at Function.createContext (D:\feathers-opentelemetry-demo\node_modules\@feathersjs\feathers\src\hooks\index.ts:22:11)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:70:19
    at Generator.next (<anonymous>)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:8:71
    at new Promise (<anonymous>)
    at __awaiter (D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:4:12)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:58:52
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:44:29
    at Generator.next (<anonymous>)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:8:71 {
  code: 500
}

Turns out the app mixin runs last and would override the service methods, hence it wouldn't have a createContext function.

deskoh avatar Jun 20 '21 00:06 deskoh

@deskoh I haven't played with the pre v5's yet, however looking at the migration guide https://dove.docs.feathersjs.com/guides/migrating.html#uberproto I see an example still calling a mixin. @daffl would have to confirm any plans of that also being removed.

You could use that example to start and stop your spans around the method calls as a starter. Like I said previously this isn't a perfect solution as it would still only trace actual service method calls not including the hooks. Our apps flow is pretty linear so for now this is good enough for us.

Between the new hooks and wrapping service calls, def interested to see what you come up with here!

@jchamb Managed to get tracing using mixins working!

https://github.com/deskoh/feathers-opentelemetry-demo/blob/dove-mixins/src/opentelemetry.ts

deskoh avatar Jun 20 '21 02:06 deskoh

Awesome @deskoh thanks for sharing that!

jchamb avatar Jun 22 '21 14:06 jchamb

@deskoh Is there a way to get nested spans to work in feathersjs v4? I noticed you have nested spans working in the v5 example, but not in the v4 example.

dyllandry avatar Oct 28 '23 21:10 dyllandry

@deskoh Is there a way to get nested spans to work in feathersjs v4? I noticed you have nested spans working in the v5 example, but not in the v4 example.

how does your current trace look like?

deskoh avatar Nov 09 '23 01:11 deskoh