opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

GlobalOpenTelemetry.get and set Conflict Causing Runtime Exception in OpenTelemetry SDK

Open celikrecep opened this issue 1 year ago • 3 comments

Hello, I am using OpenTelemetry version 1.44.0 in my Android project. On Crashlytics, I see the following error: "GlobalOpenTelemetry.set". This error doesn’t happen all the time but occurs for some users. The root cause of the error is as follows:

io.opentelemetry.api.GlobalOpenTelemetry.set (GlobalOpenTelemetry.java:115)
io.opentelemetry.api.GlobalOpenTelemetry.get (GlobalOpenTelemetry.java:85)
io.opentelemetry.api.GlobalOpenTelemetry.getMeterProvider (GlobalOpenTelemetry.java:174)
io.opentelemetry.exporter.internal.ExporterMetrics.meter (ExporterMetrics.java:91)
io.opentelemetry.exporter.internal.ExporterMetrics.seen (ExporterMetrics.java:74)
io.opentelemetry.exporter.internal.ExporterMetrics.addSeen (ExporterMetrics.java:58)
io.opentelemetry.exporter.internal.http.HttpExporter.export (HttpExporter.java:58)
io.opentelemetry.exporter.otlp.http.logs.OtlpHttpLogRecordExporter.export (OtlpHttpLogRecordExporter.java:102)
io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor$Worker.exportCurrentBatch (BatchLogRecordProcessor.java:321)
io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor$Worker.run (BatchLogRecordProcessor.java:238)
java.lang.Thread.run (Thread.java:929)

Prior to this, the stack trace points to the Dagger module where I provide OpenTelemetry, as follows:

io.opentelemetry.sdk.OpenTelemetrySdkBuilder.buildAndRegisterGlobal (OpenTelemetrySdkBuilder.java:85)

I provide OpenTelemetry using Dagger Hilt with the following code:

OpenTelemetrySdk.builder()
    .apply { contextPropagators.ifPresent { setPropagators(it) } }
    .apply { tracerProvider.ifPresent { setTracerProvider(it) } }
    .apply { sdkLoggerProvider.ifPresent { setLoggerProvider(it) } }
    .apply { sdkMeterProvider.ifPresent { setMeterProvider(it) } }
    .buildAndRegisterGlobal()

Afterward, I perform all operations through this OpenTelemetry instance without calling the set method again.

When I follow the stack trace, the issue seems to occur because the get() method calls set() since GlobalOpenTelemetry is null. However, within the set() method, an exception is thrown because GlobalOpenTelemetry is not null.

Despite trying various approaches, I have been unable to reproduce this error. Could you please help me with this issue?

Artifacts: (opentelemetry-bom, opentelemetry-sdk, opentelemetry-api, opentelemetry-exporter-otlp, opentelemetry-exporter-logging, opentelemetry-exporter-otlp-common)

Version: (1.44.0)

How did you reference these artifacts? build.gradle

Note: I’m not sure if it matters, but when using version 1.40.0, this error occurred very rarely—maybe once a week at most for users.

Please also share if you have any suggestion regarding how to initialize OpenTelemetry before any of the GlobalOpenTelemetry APIs get invoked automatically by the SDK?

celikrecep avatar Dec 10 '24 14:12 celikrecep

@jkwatson I think there's a possible race condition.

  • OtlpHttpLogRecordExporter (and other exports) capture internal metrics with a lazily initialized meter from GlobalOpenTelemetry. See here.
  • BatchLogRecordProcessor's worker thread starts immediately when built, pulling logs off the queue and calling exporting them via OtlpHttpLogRecordExporter, and causing OtlpHttpLogRecordExporter's lazily initialized meter to initialize.
  • This means as soon as SdkLoggerProvider is plugged into instrumentation and has log records flowing, GlobalOpenTelemetry.get() is going to be called. And if you recall, GlobalOpenTelemetry.get() is guaranteed to always return the same result, so calling set after get produces an error.
  • So the race is:
    • A user (or autoconfiguration process) initializes a SdkLoggerProvider with BatchLoggerProvider and OtlpHttpLogRecordExporter
    • They plug the SdkLoggerProvider into instrumentation, which records logs
    • Moments later, they call GlobalOpenTelemetry.set with the SdkLoggerProvider they've just constructed
    • There is a race between when the BatchLogRecordProcessor's worker exports the first batch, causing OtlpHttpLogRecordExporter's meter to be lazily initialized, calling GlobalOpenTelemetry.get(), and when GlobalOpenTelemetry.set is called by the setup process

@breedx-splk do you know of any instances in the android setup process where the autoconfigured SdkLoggerProvider might be used to emit logs (or events) before GlobalOpenTelemetry.set is called?

jack-berg avatar Dec 10 '24 15:12 jack-berg

@breedx-splk do you know of any instances in the android setup process where the autoconfigured SdkLoggerProvider might be used to emit logs (or events) before GlobalOpenTelemetry.set is called?

I don't know of an instance like that right now, but it sounds like @celikrecep might not be using opentelemetry-android at all and is just leveraging the api+sdk directly? Can we confirm?

We do have a scenario where we need to emit events during library initialization, before the otel sdk has been created...and in those cases we have to go through some pains to buffer/store those in a separate model before we can actually emit them with otel.

breedx-splk avatar Dec 10 '24 22:12 breedx-splk

yes, i'm just using api+sdk, i'm not not using opentelemetry-android @breedx-splk. If I use build() instead of buildRegisterAndGlobal(), GlobalOpenTelemetry will remain as no-op. Would this break or affect metrics and baggagePropagator? Or do you have any suggestions to avoid a crash?

celikrecep avatar Dec 11 '24 06:12 celikrecep