opentelemetry-js Instrumentation "enabled" configuration inconsistent with OpenTelemetry

The "@opentelemetry/instrumentation" defines a generic InstrumentationConfig interface:

export interface InstrumentationConfig {
  /**
   * Whether to enable the plugin.
   * @default true
   */
  enabled?: boolean;
}

This interface defines an optional configuration for all instrumentations, named "enabled", with the default value "true". The default value is enforced In InstrumentationAbstract:

    this._config = {
      enabled: true,
      ...config,
    };

Which introduces the following issues:

Inconsistent With OpenTelemetry Specification

The OpenTelemetry specification for environment variables says:

All Boolean environment variables SHOULD be named and defined such that false is the expected safe default behavior

The concept is applied in the specification with the OTEL_SDK_DISABLED env variable.

Although this configuration is not an environment variable in the exact context, having the default value as true creates inconsistencies in the OpenTelemetry JS implementation which can be a source of confusions for end users and code maintainers, and increases cognitive load for consumers which should be aware of default values per use-case.

For node-sdk implementation, the property is used with compliance to spec:

export class NodeSDK {
  ...
  private _disabled?: boolean;
  ...
}

Harder to Maintain

When a user chooses not to set the enabled flag (which is very common) leaving it as undefined, the implementation code needs to be aware of the default and make sure to use nullish-coalescing or some other fallback mechnisim everywhere it is accessed.

This practice requires awareness which is not straightforward to comply with, as the enabled flag is a public API property and can be consumed in OpenTelemetry package but also by any user of the @opentelemetry/instrumentation package.

Buggy

The current implementation of InstrumentationAbstract assumes that the config object is sanitized in the constructor:

  constructor(
    public readonly instrumentationName: string,
    public readonly instrumentationVersion: string,
    config: ConfigType = {}
  ) {
    this._config = {
      enabled: true,
      ...config,
    };

And then read the value freely without checking for undefined and defaulting the true. For example, the value is read as-is here.

Since the logic to change enabled undefined to true is not applied in the InstrumentationAbstract class with setConfig function:

  /**
   * Sets InstrumentationConfig to this plugin
   * @param InstrumentationConfig
   */
  public setConfig(config: InstrumentationConfig = {}): void {
    this._config = Object.assign({}, config);
  }

any call to setConfig will invalidate the assumption which is a bug. Even if we fix it for the base class, It is still extremely easy for instrumentation authors that override setConfig to miss this nuance and introduce bugs to the implementation.

Fix

I can think of 2 ways to fix this issue:

align with opentelemetry specification and change the enabled property to disabled, and change the default value to false. This will be a potential breaking change for the package which we can introduce at the moment but not after the package is stable.
Keep the current property as enabled but fix the places where it is used, as well as add documentation for consumers on how to use it safely.

I personally think that since the package is still experimental and breaking changes are allowed, it is a good opportunity to introduce changes that will promote consistent and robust use of the package.

Would love to hear more opinions and to work on changing this property if there isn't any push-back.

Apr 21 '24 09:04 blumamir

By the way, the handling of enabled flag is not done consistent as of now, see https://github.com/open-telemetry/opentelemetry-js/issues/2257

I think a single enable / disable flag is not enough in general. I would expect 2 flags:

enable (or disable) - only evaluated at construction time to decide patch yes/no
capture - runtime changeable, evaluated whenever some monitored/pached API is called

Apr 21 '24 16:04 Flarna

By the way, the handling of enabled flag is not done consistent as of now, see #2257

I think a single enable / disable flag is not enough in general. I would expect 2 flags:

enable (or disable) - only evaluated at construction time to decide patch yes/no

capture - runtime changeable, evaluated whenever some monitored/pached API is called

So the patch will be decided at construction time and then capture will decide if an invoked patch should record telemetry or not?

In this case, when will instrumentation unpatch is expected to be called?

Apr 24 '24 06:04 blumamir

I would never call unpatch in a production environment. There is no guarantee that it really does what one would expect. There were quite some discussions in the regard in the past, see #3301 and linked issues for details but got stale without decission. This didn't improve since that time, with ECMA script modules, loader threads and source code modification on load (see import in the middle for details).

I think unpatch is fine in tests. I would expect it tries to undo what patch did. In tests you usually do not have other stuff potentially patching the the same APIs so there it should work fine.

It might be reasonable to expect that setting enabled to false at runtime also sets capture to false. But that's more a topic for documentation.

Apr 24 '24 07:04 Flarna

I can think of 2 ways to fix this issue:

align with opentelemetry specification and change the enabled property to disabled, and change the default value to false. This will be a potential breaking change for the package which we can introduce at the moment but not after the package is stable.

Keep the current property as enabled but fix the places where it is used, as well as add documentation for consumers on how to use it safely.

🤔 If we change this to disabled, that would potentially be very disruptive to people using enabled: false - the first thing that comes to mind is the fs-instrumentation that produces many events and is not always needed for folks, so they add the auto-instrumentations-node metapackage, and explicitly set fs instrumentation to false. With this change, if they don't update their code but they upgrade their instrumentation package, they will need to change enabled: false to disabled: true to avoid getting an unexpected influx of events. I am concerned about how big that impact may be for folks who miss the changelog update, and it may be too much without declaring a major version. Experimental technically means we could do this... but I am still hesitant based on the potential impact to end users.

Because of the potentially large impact on end users, I'd be more in favor of Option 2 to keep things as consistent as possible. As we look to stabilize, we may consider reversing the enabled/disabled flag and evangelizing it (similar to how we will need to do it with the http attributes).

Jun 03 '24 19:06 JamieDanielson

Is there a possible third option of implementing number 1, with a deprecated codepath that checks for enabled:false and sets disabled:true if that is found?

Jun 03 '24 19:06 JamieDanielson

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

Aug 05 '24 06:08 github-actions[bot]

This issue was closed because it has been stale for 14 days with no activity.

Sep 16 '24 06:09 github-actions[bot]

opentelemetry-js opentelemetry-js copied to clipboard

Instrumentation "enabled" configuration inconsistent with OpenTelemetry

Inconsistent With OpenTelemetry Specification

Harder to Maintain

Buggy

Fix

opentelemetry-js
opentelemetry-js copied to clipboard