apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

Transiting traceparent over custom transports

Open dvisztempacct opened this issue 5 years ago • 22 comments

I'm implementing grpc-node support (specifically the grpc npm package) with the new addPatch agent API.

I had guessed (incorrectly!) that if GRPC wasn't supported for apm-agent-nodejs that it wasn't supported for any Elastic APM agent. I've found, however, that at least apm-agent-go has GRPC support, so I want to make sure that my implementation is fully compatible for distributed tracing.

I have three questions:

  1. What is the officially supported way to obtain the correct APM context in apm-agent-nodejs to transit between services for distributed tracing?

I was looking at some of the existing code (link below) and am weary of using the _context property of Span found therein. Does the leading underscore mean this is not a public interface? If it is a private interface, is there an alternative way to obtain this information?

https://github.com/elastic/apm-agent-nodejs/blob/853c9ddde1ce2d671e3d478e103edeeac5f29539/lib/instrumentation/http-shared.js#L122-L130

  1. Why are S3 hosts blacklisted in the http-shared module?

https://github.com/elastic/apm-agent-nodejs/blob/853c9ddde1ce2d671e3d478e103edeeac5f29539/lib/instrumentation/http-shared.js#L181-L183

  1. When I transit this APM context from GRPC client to GRPC server, what is the correct way to do so?

Before I learned that GRPC was supported by some Elastic APM agents, I was just going to stuff the string representation of the context into some of the call's metadata, but looking more closely at apm-agent-go (disclaimer: I do not know golang) it looks like the GRPC implementation depends on the header name from the HTTP implementation:

https://github.com/elastic/apm-agent-go/blob/master/module/apmgrpc/server.go#L97-L98

Which means that when this value changes in a future version of the agent pending W3C standard freeze, the GRPC metadata key will also be changed.

https://github.com/elastic/apm-agent-go/blob/master/module/apmhttp/traceheaders.go#L31-L36

So, how can I make sure that my implementation for node-grpc is similarly responsive to this change?

Any other advice or pointers would be greatly appreciated!

dvisztempacct avatar Apr 03 '19 21:04 dvisztempacct

There's a PR open right now to officially expose the header. https://github.com/elastic/apm-agent-nodejs/pull/969

As far as I know though, grpc just uses http2 internally, so it should be propagating that header already. I haven't tested the Node.js grpc module though, so it's possible there is context loss before it reaches that point.

Qard avatar Apr 04 '19 18:04 Qard

@Qard

There's a PR open right now to officially expose the header. #969

Awesome :D

As far as I know though, grpc just uses http2 internally, so it should be propagating that header already. I haven't tested the Node.js grpc module though, so it's possible there is context loss before it reaches that point.

Are HTTP/2 request headers the standard way for transiting traceparent id for GRPC? That may not be possible with the grpc npm package.

The grpc npm package provides a binding to a native grpc implementation, and in a year of working with it I haven't seen a leakage that reveals the underlying HTTP/2 guts, and I would be surprised to learn that the native GRPC implementation uses an HTTP/2 implementation that exposes a javascript interface.

Right now I'm using metadata to transit the traceparent ID, but it's worrying to me that this won't be compatible with other agents.

dvisztempacct avatar Apr 04 '19 19:04 dvisztempacct

Ah, yeah. I was only aware of the pure-js implementation, which uses the Node.js core http2 module. It looks like they have a native implementation too though. In that case, yes: you will need a way to inject the headers if you intend for this to be compatible with other instrumented services. I'm sad to say: it may be impossible if the library provides no method of manipulating the headers.

@watson Perhaps we should spend some time investigating this? We could maybe make a PR to the grpc module to expose the headers somehow.

Qard avatar Apr 04 '19 21:04 Qard

@dvisztempacct

Right now I'm using metadata to transit the traceparent ID, but it's worrying to me that this won't be compatible with other agents.

The Go agent is also using metadata. I just did a proof of concept for a Node.js client interceptor, and confirmed that it works with the Go agent's instrumentation: https://gist.github.com/axw/bc5025035fbd2cea6a4516349be8a7ce. (I'm by no means a Node.js developer, sorry if the code is horrible.)

Screenshot from 2019-04-05 11-37-37

The span name "grpc" is just a placeholder, it should be gRPC method name.

axw avatar Apr 05 '19 03:04 axw

@axw That's great news!

I'm afraid though that in your gist I don't understand why you're ending the span on line 14, effectively at the same time that you started it on line 9.

Am I misunderstanding something?

Thanks both :)

dvisztempacct avatar Apr 06 '19 00:04 dvisztempacct

@dvisztempacct no, you're not misunderstanding; I just don't know how to JavaScript properly :wink: Just wanted to show that what you had intended to do will work with the Go agent's instrumentation, so long as you use the metadata key "elastic-apm-traceparent", and the same string representation used in HTTP.

The grpc npm package provides a binding to a native grpc implementation, and in a year of working with it I haven't seen a leakage that reveals the underlying HTTP/2 guts, and I would be surprised to learn that the native GRPC implementation uses an HTTP/2 implementation that exposes a javascript interface.

This is also the case with grpc-go. Metadata is encoded as HTTP headers (https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md), but that's abstracted.

axw avatar Apr 07 '19 02:04 axw

@axw thanks :)

dvisztempacct avatar Apr 08 '19 19:04 dvisztempacct

FYI: The API for getting the traceparent is now public and has just landed in v2.9.0:

3 new getters have been added from which you can get the traceparent string:

  • agent.currentTraceparent
  • span.traceparent
  • transaction.traceparent

watson avatar Apr 10 '19 20:04 watson

@watson I don't see currentTraceparent in https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html is it just undocumented?

Do all three return the same traceparent string?

Thanks again :100:

dvisztempacct avatar Apr 10 '19 20:04 dvisztempacct

The new docs are building as we speak... they take about 30 minutes to 1 hour to get online

watson avatar Apr 10 '19 20:04 watson

You can see it in the docs from the master branch: https://www.elastic.co/guide/en/apm/agent/nodejs/master/agent-api.html#apm-current-traceparent

watson avatar Apr 10 '19 20:04 watson

@watson haha I guess you really meant it when you said it just landed ;)

dvisztempacct avatar Apr 10 '19 20:04 dvisztempacct

haha yep 😇

... and yes, they all return the same type of string - the same we automatically add to outgoing http headers

watson avatar Apr 10 '19 20:04 watson

I just noticed that one of the dependencies is probably also waiting on a release.. yarn said [email protected] couldn't be found and prompted me to choose a different version :joy_cat:

dvisztempacct avatar Apr 10 '19 20:04 dvisztempacct

That was released 10 hours ago, so it should be fine 🤔

If you look on npm it's there: https://www.npmjs.com/package/elastic-apm-http-client

watson avatar Apr 10 '19 20:04 watson

I have no experience with yarn, but maybe it has an internal cache that somehow needs to be cleared?

watson avatar Apr 10 '19 20:04 watson

Oh, I see. There's a couple of things that could have interfered, but I just retried and it worked :)

dvisztempacct avatar Apr 10 '19 21:04 dvisztempacct

Whoops, I think this needs to remain open. We talked a bit about getting the traceparent value but not about getting the traceparent key (specifically, the string "Elastic-Apm-Traceparent", which is slated to later become the W3's choice, "Traceparent")

dvisztempacct avatar Apr 10 '19 21:04 dvisztempacct

Related to https://github.com/elastic/apm-agent-nodejs/issues/1015

alvarolobato avatar Apr 23 '19 15:04 alvarolobato

Hi did you manage to get it to work @dvisztempacct ?

I managed to get it to work with the interceptor here is my code with type script

import apm from 'elastic-apm-node/start'

import {TodoServiceClient} from "../../client/service_grpc_pb";
import {credentials} from "@grpc/grpc-js";
import {InterceptingCall} from "grpc";

const grpcAddr = process.env.BACKEND_ADDR || "backend:5000";

const traceInterceptor = function(options:any, nextCall:any) {
    return new InterceptingCall(nextCall(options), {
        start: function(metadata, listener, next) {
            const span = apm.startSpan('grpc');
            const traceParent = apm.currentTraceparent;
            metadata.add('elastic-apm-traceparent', traceParent);
            console.log("metadata:", metadata);
            next(metadata, listener, next);
            span.end();
        }
    });
}

export const client = new TodoServiceClient(
    grpcAddr,
    credentials.createInsecure(),
    {
        interceptors: [traceInterceptor]
    }
);

But I wonder why the service span is outside of the parent tracing

Screenshot 2021-05-11 at 10 22 59

yulrizka avatar May 11 '21 07:05 yulrizka

But I wonder why the service span is outside of the parent tracing

Screenshot 2021-05-11 at 10 22 59

@yulrizka Maybe the span is outside the parent tracing because you ended the span just after calling next, and grpc messaging is not a sync operation, so, the span is closed before the call is really started. Maybe if you close the span in the halfClose event it can work as intended.

Farenheith avatar May 13 '23 10:05 Farenheith

I managed to implement the client side adding the following interceptor to the client:

const spanSymbol = Symbol('span');
export function traceInterceptor(options: any, nextCall: any) {
  return new grpc.InterceptingCall(nextCall(options), {
    start(metadata, listener, next) {
      (this as any)[spanSymbol] = apm.startSpan('grpc');
      const traceParent = apm.currentTraceparent;
      if (traceParent) {
        metadata.add('elastic-apm-traceparent', traceParent);
      }
      next(metadata, listener);
    },
    halfClose(next) {
      (this as any)[spanSymbol]?.end();
      next();
    },
  });
}

On the server side, I used the following code to start the transaction on each grpc method:

export function startApmTransaction(
  call: ServerWriteableStream<unknown, unknown> | ServerUnaryCall<unknown>,
  name: string,
) {
  console.log(call.metadata.getMap());
  const traceParent = (call.metadata.get('elastic-apm-traceparent')[0] ??
    call.metadata.get('traceparent')[0]) as string | undefined;
  const transaction = apm.startTransaction(name, {
    childOf: traceParent,
  });
  return transaction;
}

At the end of the methods excution, of course, I needed to end the transaction. I'm using only unary and server side stream calls

Farenheith avatar May 13 '23 14:05 Farenheith