opentelemetry-operator
Node.js resource detectors can't be disabled with `OTEL_NODE_RESOURCE_DETECTORS`
Component(s)
instrumentation
What happened?
Description
The Node.js auto-instrumentation includes a series of cloud provider resource detectors which cannot be toggled off. I'm running on AKS, so the instrumentation keeps attempting to retrieve metadata related to other cloud providers (for example, calling the API server on https://kubernetes.default.svc/api/v1/namespaces/kube-system/configmaps/aws-auth), but this naturally fails because the metadata doesn't exist.
The auto-instrumentations-node package exposes OTEL_NODE_RESOURCE_DETECTORS to give users control over which resource detectors to use, but here in the Operator, since all resource detectors are hardcoded when instantiating the SDK, this variable has no effect.
https://github.com/open-telemetry/opentelemetry-operator/blob/c10fe8aff3017d41f82999e354e6a705a3b2dfe7/autoinstrumentation/nodejs/src/autoinstrumentation.ts#L49-L55
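For illustration, here is a minimal sketch of what honoring the variable could look like in place of a hardcoded list. It is not the operator's actual code: the `Detector` type, the `DETECTORS` table and `selectDetectors` are hypothetical stand-ins, and the real detectors from the `@opentelemetry/resource-detector-*` packages are deliberately not imported. The `all` and `none` handling is assumed from the auto-instrumentations-node README linked under Expected Result.

```typescript
// Hypothetical stand-in for a resource detector; the real detectors come from
// the @opentelemetry/resource-detector-* packages and are not imported here.
interface Detector {
  name: string;
}

// Illustrative table of detector keys (assumption: key names and grouping
// follow the values used in this issue, not the operator's source).
const DETECTORS: Record<string, Detector[]> = {
  env: [{ name: "envDetector" }],
  host: [{ name: "hostDetector" }],
  os: [{ name: "osDetector" }],
  process: [{ name: "processDetector" }],
  container: [{ name: "containerDetector" }],
  alibaba: [{ name: "alibabaCloudEcsDetector" }],
  aws: [{ name: "awsEc2Detector" }, { name: "awsEksDetector" }],
  gcp: [{ name: "gcpDetector" }],
};

// Returns every detector in the table, in key order.
function allDetectors(): Detector[] {
  let result: Detector[] = [];
  for (const key of Object.keys(DETECTORS)) {
    result = result.concat(DETECTORS[key]);
  }
  return result;
}

// Picks detectors based on OTEL_NODE_RESOURCE_DETECTORS.
// The environment is passed in so the function is easy to test.
function selectDetectors(env: Record<string, string | undefined>): Detector[] {
  const raw = env["OTEL_NODE_RESOURCE_DETECTORS"];
  // Unset or empty: keep today's behavior and enable everything.
  if (raw === undefined || raw.trim() === "") {
    return allDetectors();
  }
  const keys = raw
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (keys.indexOf("all") !== -1) {
    return allDetectors();
  }
  if (keys.indexOf("none") !== -1) {
    return [];
  }
  let selected: Detector[] = [];
  for (const key of keys) {
    // Unknown keys are skipped silently in this sketch.
    selected = selected.concat(DETECTORS[key] || []);
  }
  return selected;
}
```

With `OTEL_NODE_RESOURCE_DETECTORS="env,host,os,process,container"`, this sketch would return only the five non-cloud detectors, so the AWS, GCP and Alibaba lookups seen in the log output would not run.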
Steps to Reproduce
- Run the operator on AKS
- Auto-instrument a Node.js application using `instrumentation.opentelemetry.io/inject-nodejs: "my-instrument"`
- Enable debug mode on the application with `OTEL_LOG_LEVEL="debug"`
- Set `OTEL_NODE_RESOURCE_DETECTORS="env,host,os,process,container"` on the application to try and exclude cloud-specific resource detectors
- Resource detectors are still enabled
Expected Result
OTEL_NODE_RESOURCE_DETECTORS should work as per https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/metapackages/auto-instrumentations-node/README.md#usage-auto-instrumentation
Actual Result
OTEL_NODE_RESOURCE_DETECTORS has no effect
Kubernetes Version
1.26.6
Operator version
0.92.1
Collector version
0.92.0
Environment information
Environment
AKS
Log output
a resource's async attributes promise rejected: Error: ECS metadata api request timed out.
at Timeout._onTimeout (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-alibaba-cloud/build/src/detectors/AlibabaCloudEcsDetector.js:87:29)
at listOnTimeout (internal/timers.js:555:17)
at processTimers (internal/timers.js:498:7)
AlibabaCloudEcsDetector found resource. Resource {
_attributes: {},
asyncAttributesPending: false,
_syncAttributes: {},
_asyncAttributesPromise: Promise {
{},
[Symbol(async_id_symbol)]: 75029,
[Symbol(trigger_async_id_symbol)]: 0
}
}
a resource's async attributes promise rejected: Error: EC2 metadata api request timed out.
at Timeout._onTimeout (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-aws/build/src/detectors/AwsEc2Detector.js:113:24)
at listOnTimeout (internal/timers.js:555:17)
at processTimers (internal/timers.js:498:7)
AwsEc2Detector found resource. Resource {
_attributes: {},
asyncAttributesPending: false,
_syncAttributes: {},
_asyncAttributesPromise: Promise {
{},
[Symbol(async_id_symbol)]: 75034,
[Symbol(trigger_async_id_symbol)]: 0
}
}
error reading machine id: Error: ENOENT: no such file or directory, open '/etc/machine-id'
GcpDetector failed: GCP Metadata unavailable.
GcpDetector found resource. Resource {
_attributes: {},
asyncAttributesPending: false,
_syncAttributes: {},
_asyncAttributesPromise: Promise {
{},
[Symbol(async_id_symbol)]: 75063,
[Symbol(trigger_async_id_symbol)]: 0
}
}
error reading machine id: Error: ENOENT: no such file or directory, open '/var/lib/dbus/machine-id'
@opentelemetry/instrumentation-http outgoingRequest on response()
@opentelemetry/instrumentation-http outgoingRequest on end()
Process is not running on K8S Error: Failed to load page, status code: 403
at IncomingMessage.<anonymous> (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-aws/build/src/detectors/AwsEksDetector.js:192:32)
at /otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:55
at AsyncLocalStorage.run (async_hooks.js:305:14)
at AsyncLocalStorageContextManager.with (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AsyncLocalStorageContextManager.js:33:40)
at IncomingMessage.contextWrapper (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:32)
at IncomingMessage.emit (events.js:388:22)
at endReadableNT (internal/streams/readable.js:1336:12)
at processTicksAndRejections (internal/process/task_queues.js:82:21)
AwsEksDetector found resource. Resource {
_attributes: {},
asyncAttributesPending: false,
_syncAttributes: {},
_asyncAttributesPromise: Promise {
{},
[Symbol(async_id_symbol)]: 75589,
[Symbol(trigger_async_id_symbol)]: 0
}
}
Additional context
Originally posted in https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1780
You should set that variable on the Instrumentation (otelinst) resource, like this, because a value set directly on the pod/container will be overwritten:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  ...
  nodejs:
    env:
      - name: OTEL_NODE_RESOURCE_DETECTORS
        value: env,host,os,process,container
I've tried that too, and it still doesn't take effect. See https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1780#issuecomment-1943723681.
I did this in my cluster and it worked. Are you using the latest version of the operator?
Stumbled upon this one today and can confirm that this is a real issue.
Modifying OTEL_NODE_RESOURCE_DETECTORS may lead to errors like this (code):
Invalid resource detector "container" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS
Invalid resource detector "aws" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS
This is because the SDK checks OTEL_NODE_RESOURCE_DETECTORS, but the operator does not, as mentioned by the reporter.
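One possible reading of those messages, sketched below: a component that validates the variable against its own narrower list of names will reject values it does not know, even if another package accepts them. The `SUPPORTED` list and `validateDetectorNames` are hypothetical and only reproduce the message format quoted above; the actual list of accepted names lives in the SDK and is not confirmed here.

```typescript
// Hypothetical list of names accepted by a strict parser (assumption: the
// real SDK list may differ; "container" and "aws" are left out on purpose
// to reproduce the errors quoted in this comment).
const SUPPORTED: string[] = ["env", "host", "os", "process"];

// Returns one error message per unrecognized detector name.
function validateDetectorNames(value: string): string[] {
  const errors: string[] = [];
  for (const name of value.split(",")) {
    const key = name.trim();
    if (key.length > 0 && SUPPORTED.indexOf(key) === -1) {
      errors.push(
        `Invalid resource detector "${key}" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS`
      );
    }
  }
  return errors;
}
```

Under that assumption, `validateDetectorNames("env,container,aws")` would yield the two messages shown above, one for `container` and one for `aws`.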