[merged segment warmer] Introduce merged segment checkpoint retention to avoid listAll in computeReferencedSegmentsCheckpoint
Description
This PR is based on [18720]'s follow-up work. Based on the discussion with @ashking94, I made some optimizations. The main purpose is to avoid the listAll operation in IndexShard#computeReferencedSegmentsCheckpoint.
The main changes in this PR are as follows:
- I Introduced
IndexShard#primaryMergedSegmentCheckpointsfor primary shard to track merged segment checkpoints that have been published for pre-warm but not yet refreshed. We will add merged segment checkpoint toIndexShard#primaryMergedSegmentCheckpointsbeforepublish_merged_segment. - Rename
IndexShard#pendingMergedSegmentCheckpointstoIndexShard#replicaMergedSegmentCheckpoints, which is used for replica shard record the pre-copied merged segment checkpoints, which are not yet refreshed. - When
ReplicationCheckpointUpdater#afterRefreshis invoked, remove refreshed segment fromIndexShard#primaryMergedSegmentCheckpoints. The reason why it cannot be executed inPublishCheckpointActioncan refer to the discussion in issue #18845. - To avoid memory leaks in
IndexShard#primaryMergedSegmentCheckpoints, I introduced a configurationIndexSettings#INDEX_MERGED_SEGMENT_CHECKPOINT_RETENTION_TIME(default15minutes) to clean up expired checkpoints afterpublish_referenced_segments. - I introduced
MergedSegmentWarmerIT#testPrimaryMergedSegmentCheckpointRetentionTimeout. Construct a case with redundant merged segment checkpoint in the primary shard and delete it based on the expiration time.
Related Issues
Resolves #[18845]
Check List
- [x] Functionality includes testing.
- [x] API changes companion pull request created, if applicable.
- [x] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
-
New Features
- Added a configurable merged-segment checkpoint retention time (default 15 minutes).
-
Improvements
- Separate tracking for primary vs. replica merged-segment checkpoints.
- Enhanced cleanup, expiration, publishing and synchronization flows to better keep primary and replica files in sync.
-
Tests
- New and updated tests covering checkpoint retention/expiration and primary–replica file synchronization.
✏️ Tip: You can customize this high-level summary in your review settings.
:x: Gradle check result for 0c063022e3fe4872a09051da1f91901c59664ad0: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 253c15f178beb622067d6f49b077080f08ae82e5: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 1c90ffca51f130efbeb4b62fa9678eeaf1986e34: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 9ba9cb68b479db8a2b4ba649c1c505b877184673: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 742b545d96739107f2f9bbaea9560c99503d1bca: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 762f707ff6dd236289c1d7da4a9adcb57bd4429d: SUCCESS
Hi, @ashking94 If you have time, please help review this PR.
Hi, @ashking94
I have merged the latest code from the main branch and resolved the conflicts. I noticed that the merged segment warmer of the remote store has been merged into the main branch. After this PR, I will adapt this part of the code to the remote store.
Looking forward to your reply:)
:x: Gradle check result for 467872ff5a49363938ef454fb226beadb7b33ca9: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 137195f81c7c1e3ce8c77f7bfa8bf53c8a320834: SUCCESS
Codecov Report
:x: Patch coverage is 86.11111% with 10 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 73.32%. Comparing base (66ed5cb) to head (a11f367).
:warning: Report is 3 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #18890 +/- ##
============================================
+ Coverage 72.82% 73.32% +0.49%
- Complexity 71315 71841 +526
============================================
Files 5795 5795
Lines 328297 328350 +53
Branches 47282 47282
============================================
+ Hits 239089 240751 +1662
+ Misses 69893 68302 -1591
+ Partials 19315 19297 -18
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Hi, @ashking94 Could you please help review this PR? :)
@guojialiang92 I was out of office for couple of weeks. I will prioritise this over coming days.
@guojialiang92 Bumping this up..
Hey @guojialiang92 , bumping this up again.
Thank you very much for helping to review the code. @ashking94 :)
Hey @guojialiang92 , bumping this up again.
I'm sorry for the late reply. I will start updating this PR soon.
Walkthrough
Split merged-segment checkpoint handling into distinct primary and replica sets; add shard APIs to record, query, expire, and clear those checkpoints; wire a retention setting into cluster settings; propagate segment-replication checkpoints into shard state; clear replica state on close/promotion; and extend tests for retention and synchronization.
Changes
| Cohort / File(s) | Summary |
|---|---|
Index shard checkpoint APIs & core logic server/src/main/java/org/opensearch/index/shard/IndexShard.java |
Add primaryMergedSegmentCheckpoints, replicaMergedSegmentCheckpoints, currentSegmentReplicationCheckpoint and a cleanup mutex; new public APIs to add/get/expire/clear checkpoints and set current replication checkpoint; integrate checkpoint lifecycle into publish/merge/replication and cleanup paths; add expiration pruning. |
Replication target bookkeeping server/src/main/java/org/opensearch/indices/replication/MergedSegmentReplicationTarget.java |
In finalizeReplication, record replica merged-segment checkpoints via indexShard.addReplicaMergedSegmentCheckpoint(...) (replaces pending API call). |
Segment replication checkpoint propagation server/src/main/java/org/opensearch/indices/replication/SegmentReplicationTarget.java |
After applying checkpoint updates, call indexShard.setCurrentSegmentReplicationCheckpoint(checkpoint) to synchronize shard state. |
Replica lifecycle cleanup server/src/main/java/org/opensearch/indices/replication/SegmentReplicationTargetService.java |
Call clearReplicaMergedSegmentState() before shard close and on replica→primary promotion to clear replica merged-segment state. |
Retention setting & cluster wiring server/src/main/java/org/opensearch/indices/recovery/RecoverySettings.java, server/src/main/java/org/opensearch/common/settings/ClusterSettings.java |
Add dynamic INDICES_MERGED_SEGMENT_CHECKPOINT_RETENTION_TIME (default 15m) with update consumer and getter; register it in built-in cluster settings. |
Publish/replication integration server/src/main/java/org/opensearch/index/shard/IndexShard.java (publish paths) |
publishMergedSegment(...) now registers a primary merged-segment checkpoint before publishing and rolls back on failure; computeReferencedSegmentsCheckpoint() and publishReferencedSegments() include primary checkpoints and prune expired ones. |
Source handler variable rename server/src/main/java/org/opensearch/indices/replication/SegmentReplicationSourceService.java |
Rename local variable in CheckpointInfoRequestHandler (segmentSegmentFileChunkWriter → remoteSegmentFileChunkWriter) and pass it to prepareForReplication; no behavior change. |
Tests — integration & unit updates server/src/internalClusterTest/java/org/opensearch/indices/replication/MergedSegmentWarmerIT.java, server/src/test/java/org/opensearch/index/shard/SegmentReplicationIndexShardTests.java, server/src/test/java/org/opensearch/index/shard/RemoteIndexShardTests.java |
Rename tests (testCleanupRedundantPendingMerge* → testCleanupReplicaRedundantMergedSegment); add testPrimaryMergedSegmentCheckpointRetentionTimeout() and helper waitForSameFilesInPrimaryAndReplica(); replace pending-checkpoint accessors with primary/replica checkpoint accessors; adjust warmer thresholds and assertions. |
Test framework constructor change test/framework/src/main/java/org/opensearch/index/shard/IndexShardTestCase.java |
Pass clusterService (was null) to MergedSegmentWarmerFactory during shard creation to provide cluster context in tests. |
sequenceDiagram
autonumber
participant Primary as Primary IndexShard
participant Publisher as Publish flow
participant Replicator as MergedSegmentReplicationTarget
participant Replica as Replica IndexShard
participant Settings as RecoverySettings
Publisher->>Primary: publishMergedSegment(segmentCommitInfo)
Primary->>Primary: add primaryMergedSegmentCheckpoint(...)
Primary->>Replicator: send files + merged checkpoint
Replicator->>Replica: finalizeReplication(..., mergedCheckpoint)
Replica->>Replica: addReplicaMergedSegmentCheckpoint(mergedCheckpoint)
Replicator->>Primary: update checkpoint ack
Primary->>Primary: setCurrentSegmentReplicationCheckpoint(checkpoint)
Settings->>Primary: retention time update / time elapses
Primary->>Primary: removeExpiredPrimaryMergedSegmentCheckpoints()
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
- Areas needing extra attention:
- Concurrency and mutex usage around checkpoint collections and cleanup in
IndexShard.java. - Correctness and timing of
removeExpiredPrimaryMergedSegmentCheckpoints()relative to publish and replication flows. - Lifecycle hooks invoking
clearReplicaMergedSegmentState()inSegmentReplicationTargetService.java. - Tests that rely on timing/retention (
MergedSegmentWarmerIT,SegmentReplicationIndexShardTests) and the new cluster setting updates.
- Concurrency and mutex usage around checkpoint collections and cleanup in
Suggested labels
enhancement, Indexing, Indexing:Replication
Suggested reviewers
- ashking94
- andrross
- dbwiddis
- msfroh
- shwetathareja
- reta
- cwperks
- Bukhtawar
- owaiskazi19
- sachinpkale
- kotwanikunal
Poem
🐇 I marked each merge with a tiny hop,
Replicas learned where files should stop,
When time runs out old marks drift away,
I clear the crumbs at the start of day,
🥕 Hooray — checkpoints tidy, ready to play!
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.11% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing merged segment checkpoint retention to avoid listAll operations. |
| Description check | ✅ Passed | The description provides comprehensive details on changes made, main objectives, concurrent control design, scenario analysis, and includes testing and documentation checklist. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Comment @coderabbitai help to get the list of available commands and usage tips.
:x: Gradle check result for dc0f22bff55ac0a98ba739647932d940fedf3797: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 77cdc1fe7343434c76202668e90f94f8a7474b2e: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 7ecb2e0a03ca6e1a8a7226dcc335e08933f61072: SUCCESS
:white_check_mark: Gradle check result for bd276111b346f5afff35827e2fb8fe1c654557df: SUCCESS
:white_check_mark: Gradle check result for 89ff9acd84c1b3dbb589bb241eee455b003197ba: SUCCESS
:x: Gradle check result for c404414095511ee8178f4e823554bb214fd6479b: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 633246a1b75d40bdbd1d89bdd408c35c74f8600a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 10b01d002cf81578e0d7a551e2e52ea9151baef8: SUCCESS
:white_check_mark: Gradle check result for 85f9385589ac37528489e67d9f218e9a60a2e154: SUCCESS
:x: Gradle check result for bba568547351f6c6271f8a1aeee8c2d10f0aeb37: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 076413329aaabff935ddc72b56ff1055a86f22a0: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for f2e0d8a352763373d2f471c52361fefd8fd11d00: SUCCESS
:x: Gradle check result for e78533c3edc46abba1f6165287885c74312bce8a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?