tinkerpop icon indicating copy to clipboard operation
tinkerpop copied to clipboard

TINKERPOP-2837 Fix NPE caused by a thread safe issue

Open ministat opened this issue 2 years ago • 13 comments

The current annotations object created by Collections.synchronizedMap() will be shared across multiple threads, and some methods which access this object do not apply lock (synchronized), as a result, NPE occurs. This patch fixed this issue.

ministat avatar Dec 16 '22 14:12 ministat

Codecov Report

Merging #1908 (ea83329) into 3.5-dev (341793e) will decrease coverage by 5.17%. The diff coverage is n/a.

@@              Coverage Diff              @@
##             3.5-dev    #1908      +/-   ##
=============================================
- Coverage      69.32%   64.16%   -5.17%     
=============================================
  Files            865       25     -840     
  Lines          41037     3750   -37287     
  Branches        5407        0    -5407     
=============================================
- Hits           28450     2406   -26044     
+ Misses         10667     1166    -9501     
+ Partials        1920      178    -1742     

see 852 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Dec 16 '22 15:12 codecov-commenter

@ministat Change LGTM, however is it possible to create a test case that breaks before your change and works after?

lyndonbauto avatar Dec 19 '22 18:12 lyndonbauto

It is not easy to reproduce this thread safe issue in a UT, anyway let me try.

ministat avatar Dec 20 '22 10:12 ministat

The UT cannot reproduce this issue. Only the production env can see it. If the UT is a must, I suggest abondoning this PR.

ministat avatar Feb 04 '23 07:02 ministat

sorry - lost track of this one. i don't know that a unit test was a must. lyndon just asked for one. i'm curious about performance implications. anyway, i don' think you needed to close it if you were having a problem.

spmallette avatar Feb 04 '23 10:02 spmallette

@spmallette What performance implications do you mean here? Do you mean the performance comparison of Collections.synchronizedMap and ConcurrentHashMap? If yes, https://www.baeldung.com/java-synchronizedmap-vs-concurrenthashmap already gives the result.

ministat avatar Feb 05 '23 15:02 ministat

looking at this further reveals that annotations will lose their order. i assume providers inserting annotations would want to preserve the order they insert things as it may make a difference in presentation to the user. note that a test fails in 3.6-dev that isn't here on 3.5-dev as a result of this change. i think we'd want to find a fix that doesn't alter this behavior somehow.

spmallette avatar Feb 10 '23 14:02 spmallette

Hi @ministat, do you have any plans to come back to this one? Thanks!

xiazcy avatar Mar 24 '23 19:03 xiazcy

@xiazcy I cannot reproduce the issue through UT. This only occurs in my production environment. So, If a UT is a must, I do not want to continue invest on this.

ministat avatar Mar 28 '23 09:03 ministat

at this point i think the issue is less with unit tests and more with my comment here:

https://github.com/apache/tinkerpop/pull/1908#issuecomment-1425880289

spmallette avatar Mar 28 '23 10:03 spmallette

@spmallette What about applying concurrentlinkedhashmap? Maybe this one: https://github.com/ben-manes/concurrentlinkedhashmap which suggests using caffeine cache. Anyway, let us check whether the concurrentlinkedhashmap can pass the failed case in 3.6-dev.

ministat avatar Mar 29 '23 07:03 ministat

thanks for looking into this further. unfortunately, it's creating more questions. Not sure how others feel but I'm hesitant to include another dependency here to solve this, especially a deprecated one that is not maintained anymore. it's also changing gryo which would make server backward incompatible with drivers, e.g. 3.5.6 server would stop working with a 3.5.5 driver. gryo is deprecated i suppose but i'd say it remains a concern in my mind.

Given those issues, I'd like this approach to be a last resort and even then I'd consider relegating it to a newer version that doesn't break compatibility. Do you feel like we have reached a position of last resort at this point? If so, could you share other approaches you've tried that didn't really work well?

thanks for your patience on this - sometimes the smallest bugs can be a nuisance to fix and generate a lot of discussion.

spmallette avatar Mar 30 '23 11:03 spmallette

@spmallette Thanks for your attention on this. Well, for this issue, I did not find other approaches. I have temporarily applied ConcurrentHashMap in my local environment even though it sacrisfied the insertion order as you pointed. I agreed to archive this issue before we have a good solution.

ministat avatar Mar 31 '23 08:03 ministat

It looks like we haven't made progress on this issue for a while. I would suggest that we close it for now until someone has a suggestion that satisfies @spmallette concerns.

If there aren't any objections in the next 72 hours then I'll close this. Again, feel free to open it again once another solution has been proposed.

kenhuuu avatar Jun 03 '24 20:06 kenhuuu