opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

Fix ConcurrentModificationException #6732

Open neugartf opened this issue 1 year ago • 3 comments

Could be nicer with jdk > 8, but that's how it is :). I added a filter to make sure only string key/values from Properties are read into our logic.

neugartf avatar Sep 25 '24 01:09 neugartf

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: neugartf / name: Fabian Neugart (ab8447079fe9efda828cdef2fe810d4c0027ab7e, b115c5a6e2f469f437026fad2319ae4588b60cd0, af488636cc61f791704f39a47439fb6c4262cb24, 329c05d88da9c76a97443cd51afccf71a432957c, d7364062928d38afccbb3613484cf1789cf37594)

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.06%. Comparing base (697b4e0) to head (d736406). Report is 61 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6746      +/-   ##
============================================
- Coverage     90.08%   90.06%   -0.02%     
+ Complexity     6464     6462       -2     
============================================
  Files           718      718              
  Lines         19528    19530       +2     
  Branches       1923     1923              
============================================
- Hits          17591    17590       -1     
- Misses         1348     1350       +2     
- Partials        589      590       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 25 '24 01:09 codecov[bot]

I think we can add a hook method in stream pipeline to support unit test

hook method is like this:

  private <T> Function< T,  T> hook() {
    return new Function<T, T>() {
      @Override
      public T apply(T t) {
        System.setProperty("ConfigUtilTestKey" , "ConfigUtilTestKey");
        System.clearProperty("ConfigUtilTestKey" );
        return t;
      }
    };
  }

origin code can be like this:

    String systemProperty = Collections.unmodifiableSet(System.getProperties().stringPropertyNames()).stream()
        .map(hook())
        .filter(propertyName -> normalizedKey.equals(normalizePropertyKey(propertyName)))
        .map(System::getProperty)
        .findFirst()
        .orElse(null);

unit test may be like this:

  @Test
  void testMyFix() {
    assertThat(catchThrowable(() ->ConfigUtil.getString("myfix","default"))).isNull();
  }

shalk avatar Sep 30 '24 10:09 shalk

Anything missing here, @jack-berg ? Would be happy to see this in a release soon :).

neugartf avatar Oct 21 '24 13:10 neugartf

Sorry for not getting back on this @neugartf. We talked about this in the 10/10/2024 java SIG, just before the 1.43.0 release. Decided it needed more work:

  • Its not the only place we access system properties and iterate over them. DefaultConfigProperties also uses a similar pattern, and is likely more prone to the problem you report.
  • We don't like the overhead of calling System.getProperties().clone(), and especially in DefaultConfigProperties where we couldn't call it once and cache the result like I proposed.
  • And so we'd prefer to go with one of the other options, but since we can't reproduce the issue, we don't have confidence that the alternatives aren't subject to the same problem.
  • What we really need is a reproduction. Based on the issue and my attempts to reproduce in a variety of java environments, I think this is limited to android. We don't have android tooling setup in this repository, but maybe you we can demonstrate in the opentelemetry-android project.

So in summary, to get this back on track, we need:

  1. A reliable reproduction.
  2. A fix we can show resolves the reproduction, preferably without cloning system properties.
  3. A fix applied to both ConfigUtil and DefaultConfigProperties usage of system properties.

jack-berg avatar Oct 21 '24 15:10 jack-berg

Thanks for getting back (and with that detail)!

I'll invest some time to get this reproduced on the test app there (opentelemetry-android), though it won't be a proper junit test since we'll need to boot up the whole system + app. I'll keep you posted!

neugartf avatar Oct 21 '24 22:10 neugartf

I'll invest some time to get this reproduced on the test app there (opentelemetry-android), though it won't be a proper junit test since we'll need to boot up the whole system + app.

That would be great. If you can just show that you can read and write to System.getProperties() concurrently to recreate this, it would suffice. No need to jump through extra hoops to create the issue specifically with ConfigUtil.

jack-berg avatar Oct 21 '24 22:10 jack-berg

Hey @jack-berg, was able to reproduce the crash with this change. The fix with clone runs at ~10k ns. @laurit 's fix with synchronizing also works fine :1st_place_medal: . Let me know which one we should go forward with. :bow:

neugartf avatar Oct 28 '24 00:10 neugartf

Thanks for doing that research @neugartf. I have a slight preference to go with @laurit's synchronized approach, because I have a general aversion to implementing Cloneable or calling clone().

What do you think about adding a method for accessing a safe copy of System.getProperties() to ConfigUtil, and updating DefaultConfigProperties#create and ConfigUtil#getString() to use this method?

jack-berg avatar Oct 28 '24 21:10 jack-berg

Hey @neugartf - I picked this up in #6841. Take a look and let me know what you think.

jack-berg avatar Oct 30 '24 22:10 jack-berg

Merged #6841 so closing this.

jack-berg avatar Nov 01 '24 19:11 jack-berg

Didn't had much time recently, but looks good! Thanks for taking care of that :).

neugartf avatar Nov 03 '24 00:11 neugartf