
Issue with NestJs operator autoinstrumentation

Open · f3l1x98 opened this issue Jan 12 '24 · 9 comments

Component(s)

operator, instrumentation

Describe the issue you're reporting

I hope this is the correct place for this question and that somebody can help me.

I am currently trying to set up tracing for an application that consists of four NestJS applications. The traces are exported to a data-prepper instance, which then sends them to OpenSearch. I have gotten the OpenTelemetry operator and autoinstrumentation to work in the sense that everything deploys, starts, and traces are collected/sent.

Issue

My issue now is that, out of the four applications, only two report traces from instrumentations other than @opentelemetry/instrumentation-fs (for example @opentelemetry/instrumentation-http). The others only report fs traces (quite a lot of them), even though one of them is the main backend that calls the other applications.

I have also checked this with the logging exporter and got the same result, so the issue should not be with data-prepper/OpenSearch.
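
For reference, that kind of check can be done by defining the collector's built-in logging exporter (renamed to debug in newer collector releases) next to the OTLP exporter in the sidecar config; a minimal sketch:

exporters:
  logging:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper, logging]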

Used versions:

  • OpenTelemetry Operator: 0.90.0 (and later tried 0.91.0)

Configurations

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector-sidecar-agent
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp/data-prepper:
        endpoint: {{ .Values.dataprepper.host }}
        tls:
          insecure: true
          insecure_skip_verify: true
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/data-prepper]
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  nodejs:
---
apiVersion: v1
kind: Service
metadata:
  name: {{ .Chart.Name }}-service
spec:
  selector:
    tier: {{ .Chart.Name }}

  ports:
    - protocol: TCP
      port: 3001
      name: http
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Chart.Name }}-deployment
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      tier: {{ .Chart.Name }}
  template:
    metadata:
      labels:
        app: backend
        tier: {{ .Chart.Name }}
      annotations:
        sidecar.opentelemetry.io/inject: "{{ .Values.opentelemetry.injectSidecar }}"
        instrumentation.opentelemetry.io/inject-nodejs: "{{ .Values.opentelemetry.autoInstrumentNodejs }}"
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            requests:
              cpu: "{{ .Values.resources.requests.cpu }}"
              memory: "{{ .Values.resources.requests.memory }}"
            limits:
              cpu: "{{ .Values.resources.limits.cpu }}"
              memory: "{{ .Values.resources.limits.memory }}"

          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10

          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10

f3l1x98 · Jan 12 '24 11:01

Update

I have now tried the same thing using the SDK manually, and that works as expected (I get http, controller, sql, etc. traces).

However, I would really like to use the CRD if possible.

f3l1x98 · Jan 12 '24 11:01

Can this be related to #2497?

iblancasa · Jan 15 '24 15:01

Can this be related to #2497?

I don't think so; after all, I do get some traces from the autoinstrumentation, just nowhere near all of them. To me it looks like the instrumentation is not set up correctly and some parts are initialized after the application, so the instrumentation for, for example, http fails. I have tried the CRD again with the image that has the OpenTelemetry packages installed, but got the same result.
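
For background on the load-order point: the operator injects the agent by copying the autoinstrumentation image contents into a shared volume and prepending a --require flag via NODE_OPTIONS, so the SDK should be loaded before any application module. Roughly, the injected container env ends up looking like this (path taken from the logs below; exact values depend on the operator version):

env:
  - name: NODE_OPTIONS
    value: " --require /otel-auto-instrumentation-nodejs/autoinstrumentation.js"

One thing worth checking is whether anything in the application image or manifest overwrites NODE_OPTIONS at runtime.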

f3l1x98 · Jan 16 '24 10:01

Well, I have enabled the debug logs of the Instrumentation CRD, and by the looks of it the instrumentation is mostly initialized correctly (at least the http instrumentation succeeds); however, the resulting spans are never sent to the otc-collector. Part of the backend logs:

@opentelemetry/api: Registered a global for diag v1.7.0.
@opentelemetry/api: Registered a global for trace v1.7.0.
@opentelemetry/api: Registered a global for context v1.7.0.
@opentelemetry/api: Registered a global for propagation v1.7.0.
@opentelemetry/api: Registered a global for metrics v1.7.0.
@opentelemetry/instrumentation-fs Applying patch for fs
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/[email protected]
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/[email protected]
maxQueueSize reached, dropping spans
@opentelemetry/instrumentation-nestjs-core Patching @nestjs/[email protected]
@opentelemetry/instrumentation-winston Applying patch for [email protected]
@opentelemetry/instrumentation-http Applying patch for [email protected]
@opentelemetry/instrumentation-http Applying patch for [email protected]
@opentelemetry/instrumentation-fs Applying patch for fs/promises
Applying patch for net module
Applying patch for [email protected]
Accessing resource attributes before async attributes settled
{"context":"NestFactory","level":"info","message":"Starting Nest application...","span_id":"3d190c80a1fe565d","timestamp":"2024-01-16T10:45:09.812Z","trace_flags":"01","trace_id":"5c948d80e0d3a83bcd3bb2e2d7e079f4"}

[...]

@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
items to be sent [
  {
    resource: Resource {
      _attributes: [Object],
      asyncAttributesPending: false,
      _syncAttributes: [Object],
      _asyncAttributesPromise: [Promise]
    },
    scopeMetrics: [ [Object] ]
  }
]
Service request {
  resourceMetrics: [
    { resource: [Object], schemaUrl: undefined, scopeMetrics: [Array] }
  ]
}
{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)\n    at doExport (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async PeriodicExportingMetricReader._doRun (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:84:13)\n    at async PeriodicExportingMetricReader._runOnce (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:55:13)","message":"PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)","name":"Error"}

The otc-collector sidecar only received 512 different spans, all of them from instrumentation-fs.
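
The maxQueueSize reached, dropping spans line in the log above may be the real problem: with instrumentation-fs emitting thousands of spans at startup, the BatchSpanProcessor queue (2048 spans by default) fills up and spans arriving later from other instrumentations can be dropped. As a sketch, the standard SDK setting could be raised through the Instrumentation CRD; the env block below is an assumption about where it would go, not part of the original config:

spec:
  nodejs:
    env:
      - name: OTEL_BSP_MAX_QUEUE_SIZE
        value: "8192"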

The only errors I can find during instrumentation init seem to be AWS-related (after all, it is not running in AWS). If you want, I can upload the whole log (>5000 lines).
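
The PeriodicExportingMetricReader error is probably a separate issue: the sidecar collector config above only defines a traces pipeline, so its OTLP receiver answers UNIMPLEMENTED for the metrics service. If metrics are wanted at all, a metrics pipeline has to be added; a minimal sketch (the logging exporter here is just an illustration, any configured exporter works):

exporters:
  logging: {}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [logging]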

f3l1x98 · Jan 16 '24 11:01

instrumentation-fs emits a lot of spans (possibly when the Node.js service starts up and reads the node_modules dir), and the general recommendation is to turn it off. I am not sure how that can be done through the otel operator. See more here: https://github.com/open-telemetry/opentelemetry-operator/issues/1283#issuecomment-1909944887

mderazon · Jan 25 '24 11:01

@mderazon can it be done by setting an environment variable?

pavolloffay · Jan 26 '24 09:01

@mderazon I think the issue you're talking about is different from the original created by @f3l1x98. Can you create a new issue for better tracking?

iblancasa · Jan 26 '24 12:01

@iblancasa I wasn't sure whether I should create a new issue, because this is stated clearly in the docs:

At the moment, there is no way to opt-in to only specific packages or disable specific packages. If you don't want to use a package included by the default image you must either supply your own image that includes only the packages you want or use manual instrumentation.

However, it seems that the general recommendation from the node-instrumentation team is to disable instrumentation-fs.

I have also mentioned it in this issue: https://github.com/open-telemetry/opentelemetry-operator/issues/1283#issuecomment-1909944887. We can reopen that issue, or I can create a new one; what would you prefer?
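
Regarding the environment-variable question: newer releases of @opentelemetry/auto-instrumentations-node read OTEL_NODE_ENABLED_INSTRUMENTATIONS (and, in later versions, OTEL_NODE_DISABLED_INSTRUMENTATIONS). If the operator's nodejs image bundles a recent enough version, something like the following in the Instrumentation CRD might work; whether the image shipped with operator 0.90/0.91 already honors it would need to be verified:

spec:
  nodejs:
    env:
      - name: OTEL_NODE_DISABLED_INSTRUMENTATIONS
        value: fs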

mderazon · Jan 26 '24 13:01

Update

I have now switched to using a custom image for the NodeJS Instrumentation CRD. This custom image is based on the official one, but with autoinstrumentation.ts replaced by the code from my manual instrumentation:

const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const {
  getNodeAutoInstrumentations,
} = require('@opentelemetry/auto-instrumentations-node');
const {
  OTLPTraceExporter,
} = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const {
  SemanticResourceAttributes,
} = require('@opentelemetry/semantic-conventions');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');

const exporter = new OTLPTraceExporter();

const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: process.env.OTEL_SERVICE_NAME,
  }),
});
// Export spans to opentelemetry collector
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));

provider.register();
registerInstrumentations({
  instrumentations: [
    getNodeAutoInstrumentations({
      // disable fs
      '@opentelemetry/instrumentation-fs': {
        enabled: false,
      },
    }),
  ],
});

Using this custom image, everything works, so it looks like the issue is with the official NodeJS autoinstrumentation image.
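
For completeness, such a custom image can be referenced via the nodejs.image field of the Instrumentation CRD, along these lines (the image name is a placeholder):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  nodejs:
    image: registry.example.com/custom-autoinstrumentation-nodejs:latest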

f3l1x98 · Feb 18 '24 12:02