opentelemetry-js
opentelemetry-js copied to clipboard
Instrumentation "enabled" configuration inconsistent with OpenTelemetry
The "@opentelemetry/instrumentation" defines a generic InstrumentationConfig interface:
export interface InstrumentationConfig {
/**
* Whether to enable the plugin.
* @default true
*/
enabled?: boolean;
}
This interface defines an optional configuration for all instrumentations, named "enabled", with the default value "true". The default value is enforced In InstrumentationAbstract:
this._config = {
enabled: true,
...config,
};
Which introduces the following issues:
Inconsistent With OpenTelemetry Specification
The OpenTelemetry specification for environment variables says:
All Boolean environment variables SHOULD be named and defined such that false is the expected safe default behavior
The concept is applied in the specification with the OTEL_SDK_DISABLED
env variable.
Although this configuration is not an environment variable in the exact context, having the default value as true
creates inconsistencies in the OpenTelemetry JS implementation which can be a source of confusions for end users and code maintainers, and increases cognitive load for consumers which should be aware of default values per use-case.
For node-sdk implementation, the property is used with compliance to spec:
export class NodeSDK {
...
private _disabled?: boolean;
...
}
Harder to Maintain
When a user chooses not to set the enabled
flag (which is very common) leaving it as undefined
, the implementation code needs to be aware of the default and make sure to use nullish-coalescing or some other fallback mechnisim everywhere it is accessed.
This practice requires awareness which is not straightforward to comply with, as the enabled
flag is a public API property and can be consumed in OpenTelemetry package but also by any user of the @opentelemetry/instrumentation
package.
Buggy
The current implementation of InstrumentationAbstract
assumes that the config object is sanitized in the constructor:
constructor(
public readonly instrumentationName: string,
public readonly instrumentationVersion: string,
config: ConfigType = {}
) {
this._config = {
enabled: true,
...config,
};
And then read the value freely without checking for undefined
and defaulting the true
. For example, the value is read as-is here.
Since the logic to change enabled undefined
to true
is not applied in the InstrumentationAbstract
class with setConfig
function:
/**
* Sets InstrumentationConfig to this plugin
* @param InstrumentationConfig
*/
public setConfig(config: InstrumentationConfig = {}): void {
this._config = Object.assign({}, config);
}
any call to setConfig
will invalidate the assumption which is a bug. Even if we fix it for the base class, It is still extremely easy for instrumentation authors that override setConfig
to miss this nuance and introduce bugs to the implementation.
Fix
I can think of 2 ways to fix this issue:
- align with opentelemetry specification and change the
enabled
property todisabled
, and change the default value tofalse
. This will be a potential breaking change for the package which we can introduce at the moment but not after the package is stable. - Keep the current property as
enabled
but fix the places where it is used, as well as add documentation for consumers on how to use it safely.
I personally think that since the package is still experimental and breaking changes are allowed, it is a good opportunity to introduce changes that will promote consistent and robust use of the package.
Would love to hear more opinions and to work on changing this property if there isn't any push-back.
By the way, the handling of enabled
flag is not done consistent as of now, see https://github.com/open-telemetry/opentelemetry-js/issues/2257
I think a single enable
/ disable
flag is not enough in general. I would expect 2 flags:
-
enable
(ordisable
) - only evaluated at construction time to decide patch yes/no -
capture
- runtime changeable, evaluated whenever some monitored/pached API is called
By the way, the handling of
enabled
flag is not done consistent as of now, see #2257I think a single
enable
/disable
flag is not enough in general. I would expect 2 flags:
enable
(ordisable
) - only evaluated at construction time to decide patch yes/nocapture
- runtime changeable, evaluated whenever some monitored/pached API is called
So the patch
will be decided at construction time and then capture
will decide if an invoked patch should record telemetry or not?
In this case, when will instrumentation unpatch
is expected to be called?
I would never call unpatch
in a production environment. There is no guarantee that it really does what one would expect.
There were quite some discussions in the regard in the past, see #3301 and linked issues for details but got stale without decission.
This didn't improve since that time, with ECMA script modules, loader threads and source code modification on load (see import in the middle for details).
I think unpatch
is fine in tests. I would expect it tries to undo what patch did. In tests you usually do not have other stuff potentially patching the the same APIs so there it should work fine.
It might be reasonable to expect that setting enabled
to false
at runtime also sets capture
to false
. But that's more a topic for documentation.
I can think of 2 ways to fix this issue:
- align with opentelemetry specification and change the
enabled
property todisabled
, and change the default value tofalse
. This will be a potential breaking change for the package which we can introduce at the moment but not after the package is stable.- Keep the current property as
enabled
but fix the places where it is used, as well as add documentation for consumers on how to use it safely.
🤔 If we change this to disabled
, that would potentially be very disruptive to people using enabled: false
- the first thing that comes to mind is the fs-instrumentation that produces many events and is not always needed for folks, so they add the auto-instrumentations-node metapackage, and explicitly set fs instrumentation to false. With this change, if they don't update their code but they upgrade their instrumentation package, they will need to change enabled: false
to disabled: true
to avoid getting an unexpected influx of events. I am concerned about how big that impact may be for folks who miss the changelog update, and it may be too much without declaring a major version. Experimental technically means we could do this... but I am still hesitant based on the potential impact to end users.
Because of the potentially large impact on end users, I'd be more in favor of Option 2 to keep things as consistent as possible. As we look to stabilize, we may consider reversing the enabled/disabled flag and evangelizing it (similar to how we will need to do it with the http attributes).
Is there a possible third option of implementing number 1, with a deprecated codepath that checks for enabled:false
and sets disabled:true
if that is found?
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stale for 14 days with no activity.