Concurrency problem in `NamespacedHierarchicalStore#computeIfAbsent`
Steps to reproduce
- Clone https://github.com/assertj/assertj locally
- Setup project according to
CONTRIBUTING.md - In line: https://github.com/assertj/assertj/commit/f90a09a25ba23ee077dbd1719acd643c7fea4da0#diff-5d92da3c93616cd468991f168636c72526bc929869ee121efa0a1478600c950cR45 remove the
@Disabledannotation - Execute
./mvnw clean verify
Expected
Tests run successfully, all tests pass.
Actual
When running all tests, the test in SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java fails sometimes, and on some machines. When running it individually using the maven wrapper and through the IDE, it passes consistently, which is why we believe it to be a concurrency issue.
The failure has the following output:
[ERROR] Failures:
[ERROR] SoftAssertionsExtension_PER_CLASS_Concurrency_Test.concurrent_tests_with_explicit_per_class_annotation_do_not_interfere:116 Test Event Statistics (2 failures)
org.opentest4j.AssertionFailedError: started ==> expected: <2> but was: <0>
org.opentest4j.AssertionFailedError: failed ==> expected: <2> but was: <0>
We identified that the change that introduced the flakiness is this line: https://github.com/assertj/assertj/commit/bcb4e14275da62fcb26e2d1e5608c3ddd28020b8#diff-1fd14641c6731a83101088806368089b7af173b3a7424a504654008cab1a0dddR411
Here it was changed from using the deprecated JUnit method getOrComputeIfAbsent to computeIfAbsent (according to the JUnit Javadoc). Reverting the change in this line only already resolves the flakiness.
The test in SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java executes soft assertions concurrently, thus also executing getOrComputeIfAbsent/computeIfAbsent concurrently, which suggests that there is a concurrency issue in computeIfAbsent that wasn't in getOrComputeWithAbsent.
Analysis
We noticed one of the substantial changes from the getOrComputeIfAbsent implementation to the computeIfAbsent implementation is that the call on the ConcurrentMap was changed from computeIfAbsent to compute, see:
https://github.com/junit-team/junit-framework/blob/19557e4d5d3a504b02eaee8c5562c31c2d491b10/junit-platform-engine/src/main/java/org/junit/platform/engine/support/store/NamespacedHierarchicalStore.java#L247
Our hypothesis is, that if two threads are executing NamespacedHierarchicalStore#computeIfAbsent in parallel, then if both threads are executing at the same speed where both end up executing this.storedValues.compute(... at the same time, the first thread will execute properly, while the second thread will overwrite the value in the Map. Previously in getOrComputeIfAbsent, since this.storedValues.computeIfAbsent was used, it would guarantee that the second thread would not overwrite the value, as the ConcurrentMap is enforcing synchronization properly.
We were able to verify that temporarily changing the offending line in NamespacedHierarchicalStore#computeIfAbsent to use computeIfAbsent instead of compute, and then publishing it to Maven local, using that build in AssertJ reliably resolves the flakiness even when running the tests in parallel and using getStore(context).computeIfAbsent in https://github.com/assertj/assertj/commit/bcb4e14275da62fcb26e2d1e5608c3ddd28020b8#diff-1fd14641c6731a83101088806368089b7af173b3a7424a504654008cab1a0dddR411
You can find this quick and dirty change we made where the issue no longer occurred here: https://github.com/danaebu/junit-framework/commit/9384b94e27ee1ee444b186bb5ff8185f57c71919
See related issue on the flakiness in AssertJ here: https://github.com/assertj/assertj/issues/1996
Context
- Used versions (Jupiter/Vintage/Platform): 6.0.1
- Build Tool/IDE: Maven/IntelliJ (but can be reproduced outside the IDE by using the maven commands on the terminal as well)
As a ConcurrentMap is used, we assume the implementation NamespacedHierarchicalStore#computeIfAbsent is supposed to be thread-safe.
Was found together with @martinfrancois at Hack Commit Push
Deliverables
- [ ]
NamespacedHierarchicalStore#computeIfAbsentis thread-safe - [ ] Unit test is added to JUnit as well that reproduces this issue reliably to avoid regressions.
Clone https://github.com/assertj/assertj locally
As I reverted the upgrade to JUnit 6, please use assertj/assertj@f90a09a25ba23ee077dbd1719acd643c7fea4da0 to verify the issue.
Thanks for the detailed report!
Our hypothesis is, that if two threads are executing
NamespacedHierarchicalStore#computeIfAbsentin parallel, then if both threads are executing at the same speed where both end up executingthis.storedValues.compute(...at the same time, the first thread will execute properly, while the second thread will overwrite the value in the Map.
According to the Javadoc for ConcurrentHashMap, they can't be in compute concurrently as each update is performed atomically. 🤔
@danaebu @martinfrancois Does one of you have time to investigate further?
@marcphilipp Thanks for the quick reply!
You're absolutely right that ConcurrentHashMap guarantees that each compute call for a given key is atomic and that the remapping function for that key is not executed concurrently in multiple threads.
Our point is slightly different and more about the functional difference between computeIfAbsent and compute when they run concurrently on multiple threads:
computeIfAbsent(key, f)only invokesfwhen there is no mapping forkey. Under contention, for a given key, only one thread will runf; other threads will see the value that was already computed and return it.compute(key, remapper)invokes the remapping function every time, even when there is already a value. Under contention, multiple threads will be serialized, but each will still run the remapper and the last one wins, potentially overwriting the value initialized by an earlier thread.
A tiny example to illustrate this difference:
ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
// Thread A
map.computeIfAbsent("k", k -> "A");
// Thread B
map.computeIfAbsent("k", k -> "B");
Possible outcome with computeIfAbsent:
- Thread A runs first: there is no mapping for
"k", so the mapping function is called and"A"is stored. - Thread B runs next:
"k"is now present, so the mapping function is not called and"A"is returned. - Final value for
"k"is"A"(or"B"if the threads are swapped), but crucially the mapping function is only executed once for that key.
Now the same idea with compute:
ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
// Thread A
map.compute("k", (k, old) -> {
// old == null
return "A";
});
// Thread B
map.compute("k", (k, old) -> {
// old is now "A"
return "B";
});
Under contention, calls for "k" are serialized, but both remappers still run:
- Thread A runs first:
oldisnull, it returns"A", map now contains"k" -> "A". - Thread B runs next:
oldis"A", it returns"B", map now contains"k" -> "B".
Everything is atomic and thread safe, but the effect is that the second call overwrites the value created by the first. If the remapping function ignores old or reinitializes some state, you effectively reset or replace whatever the first thread did.
In NamespacedHierarchicalStore#computeIfAbsent, the implementation previously relied on ConcurrentMap.computeIfAbsent, which provides the "one logical initialization per key" behavior. After the change to storedValues.compute(…), every call to NamespacedHierarchicalStore.computeIfAbsent for the same key can rerun the initialization logic and replace the existing StoredValue.
That means that even though each compute call is atomic, two threads calling NamespacedHierarchicalStore.computeIfAbsent for the same key can:
- Have Thread A initialize the stored value and start tracking statistics.
- Then have Thread B rerun the initialization and replace that value, effectively resetting the statistics.
That matches what we see in the failing AssertJ test: the per-class statistics end up as 0 instead of 2, consistent with an initialization that was overwritten rather than a lack of atomicity.
I hope that makes it clear. Otherwise, please let me know!
FYI @marcphilipp I fixed this issue in https://github.com/junit-team/junit-framework/pull/5209
Our hypothesis is, that if two threads are executing NamespacedHierarchicalStore#computeIfAbsent in parallel, then if both threads are executing at the same speed where both end up executing this.storedValues.compute(... at the same time, the first thread will execute properly, while the second thread will overwrite the value in the Map.
In NamespacedHierarchicalStore#computeIfAbsent, the implementation previously relied on ConcurrentMap.computeIfAbsent, which provides the "one logical initialization per key" behavior. After the change to storedValues.compute(…), every call to NamespacedHierarchicalStore.computeIfAbsent for the same key can rerun the initialization logic and replace the existing StoredValue.
@danaebu @martinfrancois I think we can rule this one out because of
https://github.com/junit-team/junit-framework/blob/c9454e3cff2b9d9423ffc309c097a3a55c5b6788/junit-platform-engine/src/main/java/org/junit/platform/engine/support/store/NamespacedHierarchicalStore.java#L248-L251
The defaultCreator is not applied until after the oldStoredValue has been checked. So when defaultCreator is applied for a given key a value was either not set at all or that value was set and set not null. So on the face of it the defaultCreator should be applied at most once.
There is however a more likely cause for flakiness. SoftAssertionsExtension calls ExtensionContext.Store.computeIfAbsent with a method that also updates the store. This is not permitted by the underlying ConcurrentHashMap.
Specifically SoftAssertionsExtension.getSoftAssertionsProvider uses instantiateProvider to create the stored value. And in turn instantiateProvider calls either getThreadLocalCollector or getAssertionErrorCollector and getSoftAssertionsProviders which all update the context store. This occasionally results in a test failure containing a IllegalStateException("Recursive update") which is recorded by the test kit, but goes otherwise unnoticed.
The failures can be printed by applying this patch to https://github.com/assertj/assertj/commit/f90a09a25ba23ee077dbd1719acd643c7fea4da0
diff --git a/assertj-tests/assertj-integration-tests/assertj-core-tests/src/test/java/org/assertj/tests/core/api/junit/jupiter/SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java b/assertj-tests/assertj-integration-tests/assertj-core-tests/src/test/java/org/assertj/tests/core/api/junit/jupiter/SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java
index 3e944bb40..79501b6d1 100644
--- a/assertj-tests/assertj-integration-tests/assertj-core-tests/src/test/java/org/assertj/tests/core/api/junit/jupiter/SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java
+++ b/assertj-tests/assertj-integration-tests/assertj-core-tests/src/test/java/org/assertj/tests/core/api/junit/jupiter/SoftAssertionsExtension_PER_CLASS_Concurrency_Test.java
@@ -22,6 +22,7 @@ import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
+import java.util.Optional;
import java.util.concurrent.CountDownLatch;
import org.assertj.core.api.AssertionErrorCollector;
@@ -40,9 +41,11 @@ import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;
+import org.junit.platform.engine.TestExecutionResult;
+import org.junit.platform.testkit.engine.EngineExecutionResults;
import org.junit.platform.testkit.engine.EngineTestKit;
+import org.junit.platform.testkit.engine.EventType;
-@Disabled("gh-1996")
@DisplayName("SoftAssertionsExtension PER_CLASS concurrent injection test")
class SoftAssertionsExtension_PER_CLASS_Concurrency_Test {
// Use CountDownLatches to synchronize between the two parallel running tests to make sure that they overlap in time.
@@ -107,15 +110,25 @@ class SoftAssertionsExtension_PER_CLASS_Concurrency_Test {
@Test
void concurrent_tests_with_explicit_per_class_annotation_do_not_interfere() {
- EngineTestKit.engine("junit-jupiter")
- .selectors(selectClass(ConcurrencyTest.class))
- .configurationParameter("junit.jupiter.conditions.deactivate", "*")
- .configurationParameter("junit.jupiter.execution.parallel.enabled", "true")
- .execute()
- .testEvents()
- .debug(System.err)
- .assertStatistics(stats -> stats.started(2).succeeded(0).failed(2))
- .failed();
+ EngineExecutionResults result = EngineTestKit.engine("junit-jupiter")
+ .selectors(selectClass(ConcurrencyTest.class))
+ .configurationParameter("junit.jupiter.conditions.deactivate", "*")
+ .configurationParameter("junit.jupiter.execution.parallel.enabled", "true")
+ .execute();
+
+ result.allEvents().stream()
+ .filter(event -> event.getType() == EventType.FINISHED)
+ .map(event -> event.getRequiredPayload(TestExecutionResult.class))
+ .map(TestExecutionResult::getThrowable)
+ .filter(Optional::isPresent)
+ .map(Optional::get)
+ .forEach(System.out::println);
+
+ result
+ .testEvents()
+ .debug(System.err)
+ .assertStatistics(stats -> stats.started(2).succeeded(0).failed(2))
+ .failed();
try (AutoCloseableSoftAssertions softly = new AutoCloseableSoftAssertions()) {
AssertionErrorCollector collector = ConcurrencyTest.map.get("test1");
What is interesting is that both ConcurrentHashMap.computeIfAbsent and compute both do not permit modification, but that this is only a problem with NamespacedHierarchicalStore.compute and not computeIfAbsent . The use of the MemoizingSupplier in computeIfAbsent prevents the recursive update of the map.
And this poses a problem. As observed in https://github.com/junit-team/junit-framework/pull/5209#discussion_r2611043629, the MemoizingSupplier can not be used with compute without leaking state.
@marcphilipp current options:
- Document
NamespacedHierarchicalStore.computeIfAbsentshould not update the store. - Use a
MemoizingSupplierto avoid recursive updates, and synchronize access toNamespacedHierarchicalStore.storedValues. - Undeprecated
NamespacedHierarchicalStore.getOrComputeIfAbsent - Check if the
MemoizingSupplierfailed before returning the value fromgetgetOrComputeIfAbsent, ect. - 🤷
@mpkorstanje Thanks, I only saw this issue comment now; I didn’t see it before I started iterating on the PR.
I agree we can likely rule out the "same-key overwrite" hypothesis as the primary cause (as you pointed out defaultCreator is invoked only after checking oldStoredValue).
The "Recursive update" angle matches what I observed while digging into this, and it also explains why "quickly switching to ConcurrentHashMap.computeIfAbsent" appeared to help in AssertJ: computeIfAbsent + MemoizingSupplier keep the mapping function side-effect-free (no store updates happen inside the CHM remapping function), so you don't hit the CHM recursive-update guard.
The updated PR takes the same fundamental direction: ensure defaultCreator is never executed inside a ConcurrentHashMap.compute*(...) remapping function. Instead we install a deferred placeholder first and run defaultCreator afterwards. This should avoid both:
- bucket-lock blocking/deadlock scenarios, and
- the
IllegalStateException("Recursive update")failures caused by store updates occurring from within the remapping function.
I also added regression tests that fail reliably if defaultCreator is executed within the map remapping function (including a colliding-keys race test and tests ensuring get() cannot observe transient failures from a failed computeIfAbsent).
@mpkorstanje As a follow-up, I tried to reproduce exactly what you described in your comment in a local AssertJ checkout.
What I did
- Cloned
assertj/assertjand checked out the commit you referenced (assertj/assertj@f90a09a). - Applied your suggested patch to
SoftAssertionsExtension_PER_CLASS_Concurrency_Test(removed@Disabled, printedFINISHEDevent throwables). - Ran the test with AssertJ’s baseline JUnit dependency (
junit-jupiter.version=6.0.1).
Baseline result (JUnit 6.0.1)
- The test reliably prints
java.lang.IllegalStateException: Recursive update. - And the EngineTestKit statistics show the same symptom as in the issue report (
started(0)/failed(0)), i.e. the internalRecursive updateis recorded by the testkit but otherwise hidden.
Result with my PR
- I published my JUnit build to Maven Local (
6.1.0-SNAPSHOT) and re-ran the same patched AssertJ test using-Djunit-jupiter.version=6.1.0-SNAPSHOT. - With both implementation variants I currently have on the PR (Approach 1 and Approach 2), the test no longer prints
Recursive update, and the EngineTestKit statistics are stable (started(2)), with only the expected assertion failures from the two tests.
So this PR appears to address the failure mode you identified (store updates from within a CHM remapping function leading to Recursive update), in addition to preventing bucket-level blocking/deadlock scenarios.