armeria icon indicating copy to clipboard operation
armeria copied to clipboard

Fix race condition when initializing HealthCheckedEndpointGroup

Open hyunw9 opened this issue 9 months ago • 3 comments

Motivation

Fixes #6018

A race condition can occur when adding a listener immediately after creating a HealthCheckedEndpointGroup. Because the endpoint group initialization is asynchronous, the listener may receive endpoints in an incomplete or intermediate state.

Modification

Explicitly await the completion of HealthCheckedEndpointGroup initialization using whenReady().join() before adding a listener. This ensures the listener will always receive a fully initialized endpoint list, eliminating any race conditions.

Result • Race condition is fixed. • Listener always receives a fully initialized EndpointGroup, ensuring reliable and predictable behavior.

hyunw9 avatar Mar 31 '25 14:03 hyunw9

🔍 Build Scan® (commit: e3d98c86c4e66c9d1d495d4841ce84249bb9288f)

Job name Status Build Scan®
build-ubicloud-standard-16-jdk-8 https://ge.armeria.dev/s/r4aegzhoauwj4
build-ubicloud-standard-16-jdk-21-snapshot-blockhound https://ge.armeria.dev/s/dua4ygg57ekls
build-ubicloud-standard-16-jdk-17-min-java-17-coverage https://ge.armeria.dev/s/p6s2a36prpbt4
build-ubicloud-standard-16-jdk-17-min-java-11 https://ge.armeria.dev/s/f5nwmv6my2lpy
build-ubicloud-standard-16-jdk-17-leak https://ge.armeria.dev/s/ldo3jgj7f4wjc
build-ubicloud-standard-16-jdk-11 https://ge.armeria.dev/s/qjt6e2ci5nvke
build-macos-latest-jdk-21 https://ge.armeria.dev/s/fxyetlbsohuwc

github-actions[bot] avatar Mar 31 '25 14:03 github-actions[bot]

I noticed that the test failure in HealthCheckedEndpointGroupTest#cacheReflectsAttributeChanges seemed to stem from an unexpected ordering issue during initialization.

In this test, registering the endpoint is essentially part of the setup phase. However, the internal initialization of HealthCheckedEndpointGroup is asynchronous, and addListener(..., true) can synchronously invoke listener.accept(latest) before the system is ready, which introduces a race condition.

Initially, I thought calling whenReady().join() would fully synchronize the group’s initialization. However, after digging into the code, I realized that although join() ensures the group is initialized, it doesn’t control the timing of the listener callback.

Internally, addListener() acquires a lock to update the listener list, but the actual call to listener.accept(latest) (triggered by notifyLatestValue = true) occurs outside the lock. That means a race condition can still occur between listener.accept(...) and the external setup logic that expects a stable state.

To address this, I changed the approach: 1. I registered the listener using addListener(endpointListener, false) to avoid immediate invocation. 2. Then, after the group was fully initialized via whenReady().join(), I manually called listener.accept(...) to reflect the initial state in a safe and deterministic manner.

This ensures that the state propagation is explicitly controlled and prevents any premature callback from breaking the test setup.

hyunw9 avatar Apr 01 '25 14:04 hyunw9

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 74.08%. Comparing base (8150425) to head (e3d98c8). :warning: Report is 232 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6188      +/-   ##
============================================
- Coverage     74.46%   74.08%   -0.38%     
- Complexity    22234    22992     +758     
============================================
  Files          1963     2061      +98     
  Lines         82437    86135    +3698     
  Branches      10764    11312     +548     
============================================
+ Hits          61385    63815    +2430     
- Misses        15918    16896     +978     
- Partials       5134     5424     +290     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Apr 09 '25 13:04 codecov[bot]