next.js icon indicating copy to clipboard operation
next.js copied to clipboard

[NEXT-911] Issue tracking OpenTelemetry support

Open jankaifer opened this issue 2 years ago • 17 comments

A place to track current OTel support for easier cooperation with other projects.

From SyncLinear.com | NEXT-911

jankaifer avatar Mar 29 '23 14:03 jankaifer

~~PR where we added docs and simple examples: https://github.com/vercel/next.js/pull/47194#issuecomment-1488706353~~

EDIT: It's merged now and available in our docs.

jankaifer avatar Mar 29 '23 14:03 jankaifer

Node SDK for OTel has somewhat stabilized. We can move to it later, but there are a few issues with SDK and our Collector setup. https://github.com/vercel/next.js/pull/47194#issuecomment-1487493353

jankaifer avatar Mar 29 '23 14:03 jankaifer

Our instrumentation doesn't work well with BasicTraceProvider and you need to use NodeTraceProvider.

It won't be able to track active context and it won't be able to connect current spans to it's parent span.

jankaifer avatar Mar 30 '23 09:03 jankaifer

Our instrumentation doesn't work well with BasicTraceProvider and you need to use NodeTraceProvider.

It won't be able to track active context and it won't be able to connect current spans to it's parent span.

Is this saying basic tracer provider doesn't track context but node tracer provider does? The context manager is separate from the tracer provider and both should be useable if you manually specify the appropriate context manager

dyladan avatar Apr 06 '23 17:04 dyladan

I switched to NodeSDK in docs. It didn't work because it used BatchedSpanProcessor by default, and we were not flushing spans at the end of the request. After explicitly using SimpleSpanProcessor, it works :+1:

jankaifer avatar Apr 11 '23 14:04 jankaifer

Would this be the right place to track issues relating to propagation? I used the docs to write a custom the registerOtel() function like so;

import {
  AlwaysOffSampler,
  CompositePropagator,
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} from '@opentelemetry/core';
import { B3InjectEncoding, B3Propagator } from '@opentelemetry/propagator-b3';
import {
  BatchSpanProcessor,
  NodeTracerProvider,
} from '@opentelemetry/sdk-trace-node';
import { DiagConsoleLogger, DiagLogLevel, diag } from '@opentelemetry/api';

import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

export const registerOTel = (serviceName: string) => {
  // For troubleshooting, set the log level to DiagLogLevel.DEBUG
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
  const provider = new NodeTracerProvider({
    sampler: new ParentBasedSampler({
      root: new TraceIdRatioBasedSampler(0.1),
    }),
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
    }),
  });

  const exporter = new OTLPTraceExporter({
    url: 'http://opentelemetry-collector.monitoring:4318/v1/traces',
  });

  // XXX: This is the pain point so far.
  provider.register({
    propagator: new CompositePropagator({
      propagators: [
        new B3Propagator(),
        new B3Propagator({ injectEncoding: B3InjectEncoding.MULTI_HEADER }),
      ],
    }),
  });

  provider.addSpanProcessor(
    new BatchSpanProcessor(exporter, {
      // The maximum queue size. After the size is reached spans are dropped.
      maxQueueSize: 100,
      // The maximum batch size of every export. It must be smaller or equal to maxQueueSize.
      maxExportBatchSize: 10,
      // The interval between two consecutive exports
      scheduledDelayMillis: 500,
      // How long the export can run before it is cancelled
      exportTimeoutMillis: 30000,
    })
  );

  return provider;
};

So far traces are getting exported properly to opentelemetry-collector, but they aren't getting the parentTraceId that's being set by the load balancer.

jamesloosli avatar Apr 13 '23 18:04 jamesloosli

This issue was meant for chatting with OpenTelemetry people. For bug reports, please make a new issue.

I've made one for you: #48384 cc: @jamesloosli

jankaifer avatar Apr 14 '23 11:04 jankaifer

how stable is opentelemetry support?

please provide a way in the documentation where we can follow up. telemetry aspects are essential for commercial use applications.

devmanbr avatar Jul 11 '23 11:07 devmanbr

NextJS doesn't seem to extract incoming headers traceparent and tracestate.

Multiply avatar Aug 09 '23 11:08 Multiply

Hey, how do I pass in the traceparent header to the patched fetch within next.js?

Ankcorn avatar Oct 18 '23 14:10 Ankcorn

This might be fixed with https://github.com/vercel/next.js/pull/57084 but I haven't tested it yet.

Multiply avatar Oct 30 '23 20:10 Multiply

I am having some issues getting the spans to be associated with the parent trace (https://github.com/vercel/next.js/discussions/57911). Is this a known issue already?

tachang avatar Nov 03 '23 01:11 tachang

Hello everyone, I would like to help people who come to this issue searching for how we can propagate the trace in the patched fetch, like @Ankcorn.

After a while, I found a workaround until we have a definitive solution:

fetch('<URL>', {
  headers: {
    get traceparent() {
      const traceId = '';
      const spanId = '';
      return `00-${traceId}-${spanId}-01`;
    }
  },
})

Notice that the traceparent is a getter, so it will be read while the nextjs patched fetch clones the headers. Also, it's important to not retrieve the traceId and spanId outside the getter since NextJS creates a separate span for the fetch internally.

I created these functions to retrieve the traceId and spanId:

import type { SpanContext } from '@opentelemetry/api';
import { isSpanContextValid, trace, context } from '@opentelemetry/api';

const getSpanContext = (): SpanContext | undefined => {
  const span = trace.getSpan(context.active());
  if (!span) {
    return undefined;
  }
  const spanContext = span.spanContext();
  if (!isSpanContextValid(spanContext)) {
    return undefined;
  }
  return spanContext;
};

export const getTraceId = (): string | undefined => {
  const spanContext = getSpanContext();
  return spanContext?.traceId;
};

export const getSpanId = (): string | undefined => {
  const spanContext = getSpanContext();
  return spanContext?.spanId;
};

export const getTraceparentHeader = () => {
  const spanContext = getSpanContext();
  if (!spanContext) return '';

  return `00-${spanContext.traceId}-${spanContext.spanId}-01`;
};

I hope this helps you somehow 😄

gilmarsquinelato avatar Nov 06 '23 22:11 gilmarsquinelato

I don't know if it's on my setup or others are seeing the same, but in my development environment it is logging empty lines instead of page compilation.

  ▲ Next.js 14.0.1
  - Local:        http://localhost:3001
  - Environments: .env
  - Experiments (use at your own risk):
    · instrumentationHook

○ Compiling /instrumentation ...
⚠ 


✓ Ready in 5.1s

Update: After digging a bit I found the file that's causing this, if you simply import @opentelemetry/instrumentation you already have this issue happening. And then I tried to track which file exactly is, and it's @opentelemetry/instrumentation/build/src/platform/node/instrumentation, also if you only import it the bug happens, don't need to do anything else.

Since the file does a bunch of imports I tried to manually import them separately, and they aren't causing the issue. But two functions inside the InstrumentationBase class are, _warnOnPreloadedModules and _extractPackageVersion.

Another thing that I realized, it looks like a compilation issue, because I made a conditional require of this module, where even being a falsey condition it would be included in the transpiled files. After this, even the code not being in fact executed but only being required inside an if, the logs were buggy.

So to summarize: instrumentation.ts

export const register = () => {
  const isProduction = process.env.NODE_ENV === 'production'; // intentionally set as a variable instead of an inline check in the if

  if (process.env.NEXT_RUNTIME === 'nodejs' && isProduction) {
    require('@opentelemetry/instrumentation/build/src/platform/node/instrumentation');
  }
};

gilmarsquinelato avatar Nov 07 '23 13:11 gilmarsquinelato

Hello, any plan to improve OpenTelemetry integration in 2024? Seems like there still some issues when trace not fully propagated and part of this issues that you can't properly add header to the call's that being made to other microservices without breaking de-duplication. Strange that there no possibility to override globally generation of hash for de-duplication and for example exclude some headers from being a part of it.

Any possibility to resolve this fully without workaround? Something like out-of-the-box? Maybe in Next.js 15? I will be very happy if someone jump in and say: hey you are wrong it's already supported well in Next.js 14. Or at least it will be fixed in Next.js 15.

OpenTelemetry becomes a modern standard de-facto for observebility, strange that it not fully supported here. @jankaifer I would like to see developers of Next.js being proactive here and not fully rely on maintainers of OpenTelemetry.

Thanks!

Hronom avatar Aug 27 '24 01:08 Hronom

Hi, I've left Vercel in May 2023. The best person to ask about Open Telemetry support in Next.js is probably @feedthejim or @timneutkens.

jankaifer avatar Aug 27 '24 07:08 jankaifer

Thanks @jankaifer, sorry for pinging, but I think this is a legacy that you get=).

@feedthejim or @timneutkens tagging you to get attention of someone from Vercel to this topic and comment that I left here https://github.com/vercel/next.js/issues/47660#issuecomment-2311410618

Just in case if it out of your radar, because, as I mention - OpenTelemetry becomes a standard de-facto for observability and unfortunately there still issues when you try to integrate it. More on this, if you have front-end based on Next.js it screws-up tracing in general, so back-end services on other languages that have good support of OpenTelemetry(like java) struggling from Next.js based front-end=).

So please give it some more priority, thank you!

Hronom avatar Aug 27 '24 13:08 Hronom