Fix race condition when initializing HealthCheckedEndpointGroup
Motivation
Fixes #6018
A race condition can occur when adding a listener immediately after creating a HealthCheckedEndpointGroup. Because the endpoint group initialization is asynchronous, the listener may receive endpoints in an incomplete or intermediate state.
Modification
Explicitly await the completion of HealthCheckedEndpointGroup initialization using whenReady().join() before adding a listener. This ensures the listener will always receive a fully initialized endpoint list, eliminating any race conditions.
Result • Race condition is fixed. • Listener always receives a fully initialized EndpointGroup, ensuring reliable and predictable behavior.
🔍 Build Scan® (commit: e3d98c86c4e66c9d1d495d4841ce84249bb9288f)
| Job name | Status | Build Scan® |
|---|---|---|
| build-ubicloud-standard-16-jdk-8 | ✅ | https://ge.armeria.dev/s/r4aegzhoauwj4 |
| build-ubicloud-standard-16-jdk-21-snapshot-blockhound | ✅ | https://ge.armeria.dev/s/dua4ygg57ekls |
| build-ubicloud-standard-16-jdk-17-min-java-17-coverage | ✅ | https://ge.armeria.dev/s/p6s2a36prpbt4 |
| build-ubicloud-standard-16-jdk-17-min-java-11 | ✅ | https://ge.armeria.dev/s/f5nwmv6my2lpy |
| build-ubicloud-standard-16-jdk-17-leak | ✅ | https://ge.armeria.dev/s/ldo3jgj7f4wjc |
| build-ubicloud-standard-16-jdk-11 | ✅ | https://ge.armeria.dev/s/qjt6e2ci5nvke |
| build-macos-latest-jdk-21 | ✅ | https://ge.armeria.dev/s/fxyetlbsohuwc |
I noticed that the test failure in HealthCheckedEndpointGroupTest#cacheReflectsAttributeChanges seemed to stem from an unexpected ordering issue during initialization.
In this test, registering the endpoint is essentially part of the setup phase. However, the internal initialization of HealthCheckedEndpointGroup is asynchronous, and addListener(..., true) can synchronously invoke listener.accept(latest) before the system is ready, which introduces a race condition.
Initially, I thought calling whenReady().join() would fully synchronize the group’s initialization. However, after digging into the code, I realized that although join() ensures the group is initialized, it doesn’t control the timing of the listener callback.
Internally, addListener() acquires a lock to update the listener list, but the actual call to listener.accept(latest) (triggered by notifyLatestValue = true) occurs outside the lock. That means a race condition can still occur between listener.accept(...) and the external setup logic that expects a stable state.
To address this, I changed the approach:
1. I registered the listener using addListener(endpointListener, false) to avoid immediate invocation.
2. Then, after the group was fully initialized via whenReady().join(), I manually called listener.accept(...) to reflect the initial state in a safe and deterministic manner.
This ensures that the state propagation is explicitly controlled and prevents any premature callback from breaking the test setup.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 74.08%. Comparing base (8150425) to head (e3d98c8).
:warning: Report is 232 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #6188 +/- ##
============================================
- Coverage 74.46% 74.08% -0.38%
- Complexity 22234 22992 +758
============================================
Files 1963 2061 +98
Lines 82437 86135 +3698
Branches 10764 11312 +548
============================================
+ Hits 61385 63815 +2430
- Misses 15918 16896 +978
- Partials 5134 5424 +290
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.