fix bug of warm index : FullFileCachedIndexInput was repeatedly closed
Description
check closed value in function FullFileCachedIndexInput.IndexInputHolder.run(),prevent FileCachedIndexInput closed repeatedly and duplicate deref the reference count of cache entry
Related Issues
Resolves #20054 https://github.com/opensearch-project/OpenSearch/issues/20054
Check List
- [ ] Functionality includes testing.
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
-
Bug Fixes
- Fixed an issue in warm index caching where cached index inputs could be incorrectly marked closed, improving lifecycle handling and ensuring proper resource cleanup for cached index clones.
-
Tests
- Added end-to-end lifecycle/GC test coverage to validate clone/slice closing and cleanup behavior.
✏️ Tip: You can customize this high-level summary in your review settings.
:x: Gradle check result for dacd3a5a30eeb733a8a41173c0a93607199c756c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:grey_exclamation: Gradle check result for c3fc0f2d6e7d32f2f9ab563111125366e7a3097f: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
:x: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 73.26%. Comparing base (1aed472) to head (033d8f2).
:warning: Report is 6 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #20055 +/- ##
============================================
- Coverage 73.29% 73.26% -0.03%
+ Complexity 71780 71743 -37
============================================
Files 5795 5795
Lines 328297 328302 +5
Branches 47282 47283 +1
============================================
- Hits 240612 240523 -89
- Misses 68368 68431 +63
- Partials 19317 19348 +31
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:x: Gradle check result for ef574256d5cd6562ba7b0607c8b26f92bf6641fc: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 844e7beabe5ef4646a7423184435d57e0f8a8fea: SUCCESS
Walkthrough
Converted the closed flag from a volatile boolean to an AtomicBoolean across FileCachedIndexInput and FullFileCachedIndexInput, updated close/cleaner logic and IndexInputHolder to use the AtomicBoolean, added indexInputHolderRun() for explicit cleaner invocation, and added a test exercising clone/close/cleaner interactions. (50 words)
Changes
| Cohort / File(s) | Summary |
|---|---|
Changelog CHANGELOG.md |
Added a "Fixed" entry: "Fix bug of warm index: FullFileCachedIndexInput was closed error". |
Closed flag atomicity server/src/main/java/org/opensearch/index/store/remote/filecache/FileCachedIndexInput.java |
Replaced protected volatile boolean closed with protected final AtomicBoolean closed = new AtomicBoolean(false) and updated reads/writes to atomic operations; added import for AtomicBoolean. |
Lifecycle & cleaner coordination server/src/main/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInput.java |
Propagated AtomicBoolean into IndexInputHolder constructor, switched close() and cleaner Runnable to use closed.get()/closed.set(...), added guards to avoid duplicate closures, and added public void indexInputHolderRun() to trigger the cleaner-held runnable. |
Tests server/src/test/java/org/opensearch/index/store/remote/filecache/FullFileCachedIndexInputTests.java |
Added public void testClose() throws IOException to validate clone lifecycle, reference counting progression, explicit close behavior, and GC-like cleaner invocations via indexInputHolderRun(). |
Sequence Diagram
sequenceDiagram
participant Test
participant FullInput as FullFileCachedIndexInput
participant Holder as IndexInputHolder
participant Cleaner as CleanerRunnable
participant Atomic as closed (AtomicBoolean)
Note over Test,FullInput: Clone/close lifecycle with cleaner coordination
Test->>FullInput: clone() x3
FullInput->>Atomic: closed.get() -> false
FullInput->>FullInput: increment refCount for clones
Test->>FullInput: clone1.close()
FullInput->>Atomic: closed.get() -> false
FullInput->>Atomic: closed.set(true)
FullInput->>Holder: schedule/hold cleanup
Holder->>Cleaner: run()
Cleaner->>Atomic: closed.get()
alt not already cleaned
Cleaner->>FullInput: deRef() (decrement refCount)
Cleaner->>Atomic: closed.set(true)
end
Test->>FullInput: indexInputHolderRun() on clone1
FullInput->>Holder: run() (explicit)
Holder->>Atomic: closed.get() -> true
Holder->>Cleaner: no-op if already cleaned
Test->>FullInput: close remaining clones -> refCount -> 0
Cleaner->>FullInput: final cleanup when refs reach 0
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
- Verify AtomicBoolean is used consistently for visibility and idempotence across all close paths.
- Inspect Cleaner Runnable guard logic and interactions with
indexInputHolderRun(). - Check reference counting correctness when clones are closed vs cleaned by the cleaner.
- Confirm the new
indexInputHolderRun()method is intended and safe to expose (test usage).
Poem
🐇 A flag once fickle, now atomic and true,
Clones hop in, cleaners tidy the queue.
No double-close tumble, no racing fright,
Threads settle down in the soft moonlight,
Rabbit hums — everything’s snug tonight.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main fix: preventing repeated closure of FullFileCachedIndexInput by using AtomicBoolean for thread-safe state management. |
| Description check | ✅ Passed | The description covers the core fix and references the related issue (#20054), but lacks detail about implementation approach, test coverage, and reasoning. |
✨ Finishing touches
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 0c0f95585441297e09e72542c77f348f2b6ff436 and 033d8f2d7f674e379c964e9b97202d424e20d6e2.
📒 Files selected for processing (1)
CHANGELOG.md(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: Mend Security Check
Comment @coderabbitai help to get the list of available commands and usage tips.
:white_check_mark: Gradle check result for 7014a56112cabf88d2fd18b753b5b14b90ab49e0: SUCCESS
@gbbafna can you please review and merge this if it looks good
:x: Gradle check result for 0c0f95585441297e09e72542c77f348f2b6ff436: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Changes LGTM . Thanks @liuyonghengheng for this and @rayshrey for the review. Will merge once the build passes .
:white_check_mark: Gradle check result for 0c0f95585441297e09e72542c77f348f2b6ff436: SUCCESS
:x: Gradle check result for 033d8f2d7f674e379c964e9b97202d424e20d6e2: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 033d8f2d7f674e379c964e9b97202d424e20d6e2: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 033d8f2d7f674e379c964e9b97202d424e20d6e2: SUCCESS