opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

Fix race condition of `GlobalOpenTelemetry` initialization with `AutoConfiguredOpenTelemetrySdkBuilder`

Open fandreuz opened this issue 7 months ago • 6 comments

Fixes #7354.

~~Not sure if reflection is the best way to achieve this, I've seen it's quite used in the SDK to hide things that should not be directly exposed to the users. Let me know if a proper API exposed by GlobalOpenTelemetry would be more appropriate.~~

fandreuz avatar May 23 '25 21:05 fandreuz

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.79%. Comparing base (ada5af6) to head (9d00edd). Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...nfigure/AutoConfiguredOpenTelemetrySdkBuilder.java 92.85% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7365      +/-   ##
============================================
+ Coverage     89.75%   89.79%   +0.04%     
- Complexity     6980     6985       +5     
============================================
  Files           797      797              
  Lines         21165    21174       +9     
  Branches       2057     2057              
============================================
+ Hits          18996    19013      +17     
+ Misses         1505     1501       -4     
+ Partials        664      660       -4     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 23 '25 21:05 codecov[bot]

Won't this lead to some difficult situations where someone has a long-lived handle to an instance that they got from GlobalOpenTelemetry.get() which is no longer the "active" instance? Have we thoroughly thought through the implications of that? Will this behavior be surprising when we support dynamically changing the behavior of GlobalOpenTelemetry via (eg.) opamp?

jkwatson avatar Jun 10 '25 03:06 jkwatson

Won't this lead to some difficult situations where someone has a long-lived handle to an instance that they got from GlobalOpenTelemetry.get() which is no longer the "active" instance? Have we thoroughly thought through the implications of that? Will this behavior be surprising when we support dynamically changing the behavior of GlobalOpenTelemetry via (eg.) opamp?

never mind. I missed that set still doesn't let you change the global instance. No problem then! Looks good!

jkwatson avatar Jun 10 '25 03:06 jkwatson

Can we write a unit test that verifies that this fixes the race condition?

jkwatson avatar Jun 10 '25 03:06 jkwatson

Can we write a unit test that verifies that this fixes the race condition?

A fully deterministic reproducer would be relatively hard to write for this issue since it's a race condition. I could do that, but I'd probably have to resort to artificial sleeps to reproduce the conditions for the race. Would such a test be acceptable @jkwatson ?

fandreuz avatar Jun 10 '25 22:06 fandreuz

Can we write a unit test that verifies that this fixes the race condition?

A fully deterministic reproducer would be relatively hard to write for this issue since it's a race condition. I could do that, but I'd probably have to resort to artificial sleeps to reproduce the conditions for the race. Would such a test be acceptable @jkwatson ?

Yes, better than no test at all, so we don't accidentally re-introduce the issue in the future.

jkwatson avatar Jun 10 '25 23:06 jkwatson

Hi @jkwatson, please check the new test in AutoConfiguredOpenTelemetrySdkTest

fandreuz avatar Jun 17 '25 23:06 fandreuz

Looks good. @jkwatson there's a unit test now for this. Any additional resevations?

🚢

jkwatson avatar Jun 24 '25 18:06 jkwatson