BlackLab icon indicating copy to clipboard operation
BlackLab copied to clipboard

Race condition in HitsGroupsTokenFrequencies

Open KCMertens opened this issue 2 years ago • 3 comments

Intellij (rightfully) warns about non-atomic operation on volatile variable https://github.com/INL/BlackLab/blob/512719637ab986533ac9c9aaf7a575e7d3e0d586/engine/src/main/java/nl/inl/blacklab/search/results/HitGroupsTokenFrequencies.java#L299

Fix this by replacing the int and long with AtomicInteger and AtomicLong

It won't crash, but group sizes may too low because addition from one thread may be overwritten by subsequent addition in another thread.

KCMertens avatar Dec 01 '22 10:12 KCMertens

Doesn't the fact that globalOccurrences is a ConcurrentHashMap mean this is actually safe? Not 100% sure, so using Atomic* is probably a good idea. Hopefully this doesn't slow things down too much.

jan-niestadt avatar Dec 01 '22 11:12 jan-niestadt

Hmm, documentation of ConcurrentHashMap states "retrieval operations do not entail locking", so no, I don't think it prevents it sadly

KCMertens avatar Dec 01 '22 11:12 KCMertens

But the documentation for compute() states "The entire method invocation is performed atomically.", and no other operations can occur at this time, right? (docIds.parallellStream() only calls compute() on globalOccurrences).

Again, seeing how hard this is to figure out for sure, going with Atomic* is probably the safest, or at the very least documenting why we think the code is safe as-is (i.e. the bottom line of this discussion, whatever that may turn out to be :-).

jan-niestadt avatar Dec 01 '22 11:12 jan-niestadt