opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

Support debug mode in the SDKs

Open carlosalberto opened this issue 3 years ago • 7 comments
trafficstars

We have discussed in the past offering a strict/debug mode in the SDKs, by which SDKs (and auto-instrumentation) would fail at startup time if wrong configuration is found, rather than logging warnings here and there (e.g. completely fail if environment variables have wrong values).

The intent here is to help diagnose incorrect configuration that may not be obvious at production time.

Related to https://github.com/open-telemetry/opentelemetry-specification/pull/2766

carlosalberto avatar Sep 21 '22 17:09 carlosalberto

How about adding a OTEL_FAIL_FAST env var to control this behaviour (default for SDK: false, supported values: true, false)?

Side note: I assume that auto-instrumentation could have a different default value than the SDK.

This is something I was thinking about some time ago. See: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/issues/231#issuecomment-938013579.

pellared avatar Sep 22 '22 09:09 pellared

FYI: the javaagent has something called a debug mode, it mostly enabled additional (quite verbose) logging, adds logging exporters, and some context leak detection. Some of these could be set up at the SDK level actually, if it had a debug mode (assuming the SDK debug mode would do a couple more things than just config validation)

mateuszrzeszutek avatar Sep 22 '22 09:09 mateuszrzeszutek

I think that "debug mode" != "fail fast mode". Should we rename the issue to "fail fast mode"?

The "fail fast mode" may be also beneficial in production to get immediate feedback when something is wrong during application initialization.

pellared avatar Sep 22 '22 09:09 pellared

Does this need to be covered / enforced by the spec? I feel it is like an implementation choice (e.g. what's the best way to notify the developer varies a lot depending on the environment - Android, iOS, serverless cloud application, web apps which run in browsers). I'm concerned about the environment variable approach - going down this path we could end up with very complex environment variables which try to define the criteria (which component should fail fast, in which condition should it fail fast, etc.).

Different language SDKs already have existing mechanism and they might take different approaches. For example, the OpenTelemetry Go SDK offers an error callback, if the intention is to fail fast, the developers using the OTel Go SDK (including the developers who might be building an auto-instrumentation solution on top of the SDK) could implement the callback to fail fast, without requiring any changes to the existing SDK.

I can also see the debate in C++ where folks might say "if you release a DEBUG build, turn on the fast failing; if you have a RELEASE build, make it opt-in".

My suggestion is that we don't cover it in the spec, leave it to language implementations to find their best way.

reyang avatar Sep 23 '22 15:09 reyang

@reyang 👍

Side note. Right now the spec does not allow fail fast when parsing numeric and enum env vars...

pellared avatar Sep 23 '22 16:09 pellared

Side note. Right now the spec does not allow fail fast when parsing numeric and enum env vars...

Good point, maybe changing:

the SDK MUST generate a warning and gracefully ignore the setting

to

the SDK MUST generate a warning and gracefully ignore the setting by default

In this way it's not a breaking change in the spec?

reyang avatar Sep 24 '22 18:09 reyang

Right now the spec does not allow fail fast when parsing numeric and enum env vars...

@pellared why not? I think @reyang described (https://github.com/open-telemetry/opentelemetry-specification/issues/2817#issuecomment-1256371389) that it's possible with an error callback. The fail-fast behavior is implemented by the user in this case, so the SDK itself is compliant with the existing spec.

I would recommend promoting this callback mechanism to the spec as SHOULD.

yurishkuro avatar Sep 25 '22 01:09 yurishkuro

Does this need to be covered / enforced by the spec? Different language SDKs already have existing mechanism and they might take different approaches.

Agree

I don't think that using environment variables will work for everyone

MSNev avatar Dec 13 '22 16:12 MSNev

I don't think that using environment variables will work for everyone

This is not about forcing anyone to support it. I is about consistency if it is supported. Env vars are useful especially for auto-instrumentation,

pellared avatar Dec 29 '22 10:12 pellared