Issue with NestJS operator autoinstrumentation
Component(s)
operator, instrumentation
Describe the issue you're reporting
I hope this is the correct place for this question and that somebody can help me.
I am currently trying to set up tracing for an application that consists of four NestJS applications. The traces are exported to a data-prepper instance, which then sends them to OpenSearch. I have gotten the OpenTelemetry operator and autoinstrumentation to work in the sense that everything deploys, starts, and traces are collected/sent.
Issue
My issue is that, out of the four applications, only two report traces from instrumentations other than @opentelemetry/instrumentation-fs (for example @opentelemetry/instrumentation-http).
The other two only report fs traces (quite a lot of them), even though one of them is the main backend that calls the other applications.
I have also checked this with the logging exporter and got the same result, so the issue should not be in data-prepper/OpenSearch.
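A minimal sketch of what such a logging-exporter check can look like in the sidecar collector config (not the exact configuration used here; the exporter settings are assumptions for collector versions around 0.90):

exporters:
  logging:
    verbosity: detailed   # prints received spans to the collector's stdout
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]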
Used versions:
- OpenTelemetry Operator: 0.90.0 (and later tried 0.91.0)
Configurations
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector-sidecar-agent
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp/data-prepper:
        endpoint: {{ .Values.dataprepper.host }}
        tls:
          insecure: true
          insecure_skip_verify: true
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/data-prepper]
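Note that the config above only wires up a traces pipeline, which is consistent with the MetricsService UNIMPLEMENTED error that shows up in the debug logs further down. A hedged sketch of what an additional metrics pipeline could look like inside the same config block (not part of the original setup; whether data-prepper accepts OTLP metrics on this endpoint is an assumption):

service:
  pipelines:
    metrics:                             # added alongside the existing traces pipeline
      receivers: [otlp]
      exporters: [otlp/data-prepper]     # assumes data-prepper also ingests OTLP metrics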
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  nodejs:
apiVersion: v1
kind: Service
metadata:
  name: {{ .Chart.Name }}-service
spec:
  selector:
    tier: {{ .Chart.Name }}
  ports:
    - protocol: TCP
      port: 3001
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Chart.Name }}-deployment
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      tier: {{ .Chart.Name }}
  template:
    metadata:
      labels:
        app: backend
        tier: {{ .Chart.Name }}
      annotations:
        sidecar.opentelemetry.io/inject: "{{ .Values.opentelemetry.injectSidecar }}"
        instrumentation.opentelemetry.io/inject-nodejs: "{{ .Values.opentelemetry.autoInstrumentNodejs }}"
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            requests:
              cpu: "{{ .Values.resources.requests.cpu }}"
              memory: "{{ .Values.resources.requests.memory }}"
            limits:
              cpu: "{{ .Values.resources.limits.cpu }}"
              memory: "{{ .Values.resources.limits.memory }}"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10
Update
I have now tried the same thing using the SDK manually and that works as expected (I get http, controller, sql, etc. traces).
However, I would really like to use the CRD if possible.
Can this be related to #2497?
I don't think so; after all, I do get some traces from the autoinstrumentation, just nowhere near all of them. To me it looks like the instrumentation is not set up correctly and some parts are initialized after the application, so the instrumentation for, for example, http fails. I have tried the CRD again with the image that has the OpenTelemetry packages installed but got the same result.
Well, I have enabled the debug logs of the Instrumentation CRD, and by the looks of it the instrumentation is mostly initialized correctly (at least the http instrumentation succeeds); however, the resulting spans are never sent to the otc-collector. Part of the backend logs:
@opentelemetry/api: Registered a global for diag v1.7.0.
@opentelemetry/api: Registered a global for trace v1.7.0.
@opentelemetry/api: Registered a global for context v1.7.0.
@opentelemetry/api: Registered a global for propagation v1.7.0.
@opentelemetry/api: Registered a global for metrics v1.7.0.
@opentelemetry/instrumentation-fs Applying patch for fs
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/[email protected]
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/[email protected]
maxQueueSize reached, dropping spans
@opentelemetry/instrumentation-nestjs-core Patching @nestjs/[email protected]
@opentelemetry/instrumentation-winston Applying patch for [email protected]
@opentelemetry/instrumentation-http Applying patch for [email protected]
@opentelemetry/instrumentation-http Applying patch for [email protected]
@opentelemetry/instrumentation-fs Applying patch for fs/promises
Applying patch for net module
Applying patch for [email protected]
Accessing resource attributes before async attributes settled
{"context":"NestFactory","level":"info","message":"Starting Nest application...","span_id":"3d190c80a1fe565d","timestamp":"2024-01-16T10:45:09.812Z","trace_flags":"01","trace_id":"5c948d80e0d3a83bcd3bb2e2d7e079f4"}
[...]
@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
items to be sent [
  {
    resource: Resource {
      _attributes: [Object],
      asyncAttributesPending: false,
      _syncAttributes: [Object],
      _asyncAttributesPromise: [Promise]
    },
    scopeMetrics: [ [Object] ]
  }
]
Service request {
  resourceMetrics: [
    { resource: [Object], schemaUrl: undefined, scopeMetrics: [Array] }
  ]
}
{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)\n at doExport (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async PeriodicExportingMetricReader._doRun (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:84:13)\n at async PeriodicExportingMetricReader._runOnce (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:55:13)","message":"PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)","name":"Error"}
The otc-collector sidecar only received 512 distinct spans, all from instrumentation-fs.
The only errors I can find during instrumentation init seem to be AWS-related (after all, it is not running in AWS). If you want, I can upload the whole log (>5000 lines).
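The "maxQueueSize reached, dropping spans" line above suggests the BatchSpanProcessor queue is being saturated by fs spans, so spans from other instrumentations may be dropped before they are ever exported (512 also happens to be the SDK's default maximum export batch size). If that is the cause, the standard SDK environment variables can be raised through the Instrumentation CRD's env settings; the values below are illustrative assumptions, not tested settings:

spec:
  nodejs:
    env:
      # Defaults are 2048 and 512; these values are only illustrative.
      - name: OTEL_BSP_MAX_QUEUE_SIZE
        value: "8192"
      - name: OTEL_BSP_MAX_EXPORT_BATCH_SIZE
        value: "1024"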
instrumentation-fs emits a lot of spans (possibly when the Node.js service starts up and reads the node_modules directory), and the general recommendation is to turn it off. I am not sure how that can be done through the OTel operator.
See more here:
https://github.com/open-telemetry/opentelemetry-operator/issues/1283#issuecomment-1909944887
@mderazon can it be done by setting an environment variable?
@mderazon I think the issue you're talking about is different from the original one created by @f3l1x98. Can you create a new issue for better tracking?
@iblancasa I wasn't sure whether I should create a new issue, because this is clearly stated in the docs:
At the moment, there is no way to opt-in to only specific packages or disable specific packages. If you don't want to use a package included by the default image you must either supply your own image that includes only the packages you want or use manual instrumentation.
However, it seems that the general recommendation from the node instrumentation team is to disable instrumentation-fs.
I have also mentioned it in this issue: https://github.com/open-telemetry/opentelemetry-operator/issues/1283#issuecomment-1909944887. We can reopen that issue or I can create a new one; what would you prefer?
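For what it's worth, newer releases of @opentelemetry/auto-instrumentations-node read the OTEL_NODE_ENABLED_INSTRUMENTATIONS environment variable to load only the listed instrumentations. Whether the operator's default Node.js image bundles a version that honors it is an assumption, so treat the following CRD snippet as a sketch rather than a confirmed workaround:

spec:
  nodejs:
    env:
      # Assumption: only effective if the bundled auto-instrumentations-node
      # version supports OTEL_NODE_ENABLED_INSTRUMENTATIONS. Names are the
      # suffixes of the @opentelemetry/instrumentation-* packages.
      - name: OTEL_NODE_ENABLED_INSTRUMENTATIONS
        value: "http,nestjs-core,winston,pg"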
Update
I have now switched to using a custom image for the Node.js Instrumentation CRD.
This custom image is based on the official one, but with autoinstrumentation.ts replaced by the code from my manual instrumentation:
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const {
  getNodeAutoInstrumentations,
} = require('@opentelemetry/auto-instrumentations-node');
const {
  OTLPTraceExporter,
} = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const {
  SemanticResourceAttributes,
} = require('@opentelemetry/semantic-conventions');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');

const exporter = new OTLPTraceExporter();
const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: process.env.OTEL_SERVICE_NAME,
  }),
});

// Export spans to the OpenTelemetry collector
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();

registerInstrumentations({
  instrumentations: [
    getNodeAutoInstrumentations({
      // disable fs instrumentation
      '@opentelemetry/instrumentation-fs': {
        enabled: false,
      },
    }),
  ],
});
Using this custom image, everything works, so it looks like the issue is with the official Node.js autoinstrumentation image.
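For completeness, the custom image is referenced via the nodejs section of the Instrumentation CRD; the image name below is a placeholder, since the real value is not part of this issue:

spec:
  nodejs:
    image: my-registry/autoinstrumentation-nodejs-custom:latest   # placeholder image name

Worth noting: the replacement code uses a SimpleSpanProcessor, which exports each span immediately instead of batching, so it sidesteps the queue that was dropping spans earlier; that difference alone may explain part of the change in behavior.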