OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

[merged segment warmer] Introduce merged segment checkpoint retention to avoid listAll in computeReferencedSegmentsCheckpoint

Open guojialiang92 opened this issue 4 months ago • 22 comments

Description

This PR is based on [18720]'s follow-up work. Based on the discussion with @ashking94, I made some optimizations. The main purpose is to avoid the listAll operation in IndexShard#computeReferencedSegmentsCheckpoint.

The main changes in this PR are as follows:

  • I Introduced IndexShard#primaryMergedSegmentCheckpoints for primary shard to track merged segment checkpoints that have been published for pre-warm but not yet refreshed. We will add merged segment checkpoint to IndexShard#primaryMergedSegmentCheckpoints before publish_merged_segment.
  • Rename IndexShard#pendingMergedSegmentCheckpoints to IndexShard#replicaMergedSegmentCheckpoints, which is used for replica shard record the pre-copied merged segment checkpoints, which are not yet refreshed.
  • When ReplicationCheckpointUpdater#afterRefresh is invoked, remove refreshed segment from IndexShard#primaryMergedSegmentCheckpoints. The reason why it cannot be executed in PublishCheckpointAction can refer to the discussion in issue #18845.
  • To avoid memory leaks in IndexShard#primaryMergedSegmentCheckpoints, I introduced a configuration IndexSettings#INDEX_MERGED_SEGMENT_CHECKPOINT_RETENTION_TIME (default 15 minutes) to clean up expired checkpoints after publish_referenced_segments.
  • I introduced MergedSegmentWarmerIT#testPrimaryMergedSegmentCheckpointRetentionTimeout. Construct a case with redundant merged segment checkpoint in the primary shard and delete it based on the expiration time.

Related Issues

Resolves #[18845]

Check List

  • [x] Functionality includes testing.
  • [x] API changes companion pull request created, if applicable.
  • [x] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Added a configurable merged-segment checkpoint retention time (default 15 minutes).
  • Improvements

    • Separate tracking for primary vs. replica merged-segment checkpoints.
    • Enhanced cleanup, expiration, publishing and synchronization flows to better keep primary and replica files in sync.
  • Tests

    • New and updated tests covering checkpoint retention/expiration and primary–replica file synchronization.

✏️ Tip: You can customize this high-level summary in your review settings.

guojialiang92 avatar Aug 01 '25 07:08 guojialiang92

:x: Gradle check result for 0c063022e3fe4872a09051da1f91901c59664ad0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 01 '25 08:08 github-actions[bot]

:x: Gradle check result for 253c15f178beb622067d6f49b077080f08ae82e5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 01 '25 10:08 github-actions[bot]

:x: Gradle check result for 1c90ffca51f130efbeb4b62fa9678eeaf1986e34: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 02 '25 15:08 github-actions[bot]

:x: Gradle check result for 9ba9cb68b479db8a2b4ba649c1c505b877184673: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 02 '25 17:08 github-actions[bot]

:x: Gradle check result for 742b545d96739107f2f9bbaea9560c99503d1bca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 04 '25 03:08 github-actions[bot]

:white_check_mark: Gradle check result for 762f707ff6dd236289c1d7da4a9adcb57bd4429d: SUCCESS

github-actions[bot] avatar Aug 04 '25 05:08 github-actions[bot]

Hi, @ashking94 If you have time, please help review this PR.

guojialiang92 avatar Aug 04 '25 05:08 guojialiang92

Hi, @ashking94

I have merged the latest code from the main branch and resolved the conflicts. I noticed that the merged segment warmer of the remote store has been merged into the main branch. After this PR, I will adapt this part of the code to the remote store.

Looking forward to your reply:)

guojialiang92 avatar Aug 11 '25 15:08 guojialiang92

:x: Gradle check result for 467872ff5a49363938ef454fb226beadb7b33ca9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 11 '25 16:08 github-actions[bot]

:white_check_mark: Gradle check result for 137195f81c7c1e3ce8c77f7bfa8bf53c8a320834: SUCCESS

github-actions[bot] avatar Aug 11 '25 18:08 github-actions[bot]

Codecov Report

:x: Patch coverage is 86.11111% with 10 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 73.32%. Comparing base (66ed5cb) to head (a11f367). :warning: Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...in/java/org/opensearch/index/shard/IndexShard.java 86.44% 6 Missing and 2 partials :warning:
.../opensearch/indices/recovery/RecoverySettings.java 75.00% 2 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18890      +/-   ##
============================================
+ Coverage     72.82%   73.32%   +0.49%     
- Complexity    71315    71841     +526     
============================================
  Files          5795     5795              
  Lines        328297   328350      +53     
  Branches      47282    47282              
============================================
+ Hits         239089   240751    +1662     
+ Misses        69893    68302    -1591     
+ Partials      19315    19297      -18     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Aug 11 '25 18:08 codecov[bot]

Hi, @ashking94 Could you please help review this PR? :)

guojialiang92 avatar Sep 08 '25 13:09 guojialiang92

@guojialiang92 I was out of office for couple of weeks. I will prioritise this over coming days.

ashking94 avatar Sep 29 '25 16:09 ashking94

@guojialiang92 Bumping this up..

ashking94 avatar Oct 13 '25 11:10 ashking94

Hey @guojialiang92 , bumping this up again.

ashking94 avatar Oct 22 '25 06:10 ashking94

Thank you very much for helping to review the code. @ashking94 :)

Hey @guojialiang92 , bumping this up again.

I'm sorry for the late reply. I will start updating this PR soon.

guojialiang92 avatar Dec 04 '25 03:12 guojialiang92

Walkthrough

Split merged-segment checkpoint handling into distinct primary and replica sets; add shard APIs to record, query, expire, and clear those checkpoints; wire a retention setting into cluster settings; propagate segment-replication checkpoints into shard state; clear replica state on close/promotion; and extend tests for retention and synchronization.

Changes

Cohort / File(s) Summary
Index shard checkpoint APIs & core logic
server/src/main/java/org/opensearch/index/shard/IndexShard.java
Add primaryMergedSegmentCheckpoints, replicaMergedSegmentCheckpoints, currentSegmentReplicationCheckpoint and a cleanup mutex; new public APIs to add/get/expire/clear checkpoints and set current replication checkpoint; integrate checkpoint lifecycle into publish/merge/replication and cleanup paths; add expiration pruning.
Replication target bookkeeping
server/src/main/java/org/opensearch/indices/replication/MergedSegmentReplicationTarget.java
In finalizeReplication, record replica merged-segment checkpoints via indexShard.addReplicaMergedSegmentCheckpoint(...) (replaces pending API call).
Segment replication checkpoint propagation
server/src/main/java/org/opensearch/indices/replication/SegmentReplicationTarget.java
After applying checkpoint updates, call indexShard.setCurrentSegmentReplicationCheckpoint(checkpoint) to synchronize shard state.
Replica lifecycle cleanup
server/src/main/java/org/opensearch/indices/replication/SegmentReplicationTargetService.java
Call clearReplicaMergedSegmentState() before shard close and on replica→primary promotion to clear replica merged-segment state.
Retention setting & cluster wiring
server/src/main/java/org/opensearch/indices/recovery/RecoverySettings.java, server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
Add dynamic INDICES_MERGED_SEGMENT_CHECKPOINT_RETENTION_TIME (default 15m) with update consumer and getter; register it in built-in cluster settings.
Publish/replication integration
server/src/main/java/org/opensearch/index/shard/IndexShard.java (publish paths)
publishMergedSegment(...) now registers a primary merged-segment checkpoint before publishing and rolls back on failure; computeReferencedSegmentsCheckpoint() and publishReferencedSegments() include primary checkpoints and prune expired ones.
Source handler variable rename
server/src/main/java/org/opensearch/indices/replication/SegmentReplicationSourceService.java
Rename local variable in CheckpointInfoRequestHandler (segmentSegmentFileChunkWriterremoteSegmentFileChunkWriter) and pass it to prepareForReplication; no behavior change.
Tests — integration & unit updates
server/src/internalClusterTest/java/org/opensearch/indices/replication/MergedSegmentWarmerIT.java, server/src/test/java/org/opensearch/index/shard/SegmentReplicationIndexShardTests.java, server/src/test/java/org/opensearch/index/shard/RemoteIndexShardTests.java
Rename tests (testCleanupRedundantPendingMerge*testCleanupReplicaRedundantMergedSegment); add testPrimaryMergedSegmentCheckpointRetentionTimeout() and helper waitForSameFilesInPrimaryAndReplica(); replace pending-checkpoint accessors with primary/replica checkpoint accessors; adjust warmer thresholds and assertions.
Test framework constructor change
test/framework/src/main/java/org/opensearch/index/shard/IndexShardTestCase.java
Pass clusterService (was null) to MergedSegmentWarmerFactory during shard creation to provide cluster context in tests.
sequenceDiagram
    autonumber
    participant Primary as Primary IndexShard
    participant Publisher as Publish flow
    participant Replicator as MergedSegmentReplicationTarget
    participant Replica as Replica IndexShard
    participant Settings as RecoverySettings

    Publisher->>Primary: publishMergedSegment(segmentCommitInfo)
    Primary->>Primary: add primaryMergedSegmentCheckpoint(...)
    Primary->>Replicator: send files + merged checkpoint
    Replicator->>Replica: finalizeReplication(..., mergedCheckpoint)
    Replica->>Replica: addReplicaMergedSegmentCheckpoint(mergedCheckpoint)
    Replicator->>Primary: update checkpoint ack
    Primary->>Primary: setCurrentSegmentReplicationCheckpoint(checkpoint)
    Settings->>Primary: retention time update / time elapses
    Primary->>Primary: removeExpiredPrimaryMergedSegmentCheckpoints()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Areas needing extra attention:
    • Concurrency and mutex usage around checkpoint collections and cleanup in IndexShard.java.
    • Correctness and timing of removeExpiredPrimaryMergedSegmentCheckpoints() relative to publish and replication flows.
    • Lifecycle hooks invoking clearReplicaMergedSegmentState() in SegmentReplicationTargetService.java.
    • Tests that rely on timing/retention (MergedSegmentWarmerIT, SegmentReplicationIndexShardTests) and the new cluster setting updates.

Suggested labels

enhancement, Indexing, Indexing:Replication

Suggested reviewers

  • ashking94
  • andrross
  • dbwiddis
  • msfroh
  • shwetathareja
  • reta
  • cwperks
  • Bukhtawar
  • owaiskazi19
  • sachinpkale
  • kotwanikunal

Poem

🐇 I marked each merge with a tiny hop,

Replicas learned where files should stop,
When time runs out old marks drift away,
I clear the crumbs at the start of day,
🥕 Hooray — checkpoints tidy, ready to play!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: introducing merged segment checkpoint retention to avoid listAll operations.
Description check ✅ Passed The description provides comprehensive details on changes made, main objectives, concurrent control design, scenario analysis, and includes testing and documentation checklist.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Dec 08 '25 03:12 coderabbitai[bot]

:x: Gradle check result for dc0f22bff55ac0a98ba739647932d940fedf3797: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 08 '25 03:12 github-actions[bot]

:x: Gradle check result for 77cdc1fe7343434c76202668e90f94f8a7474b2e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 08 '25 14:12 github-actions[bot]

:white_check_mark: Gradle check result for 7ecb2e0a03ca6e1a8a7226dcc335e08933f61072: SUCCESS

github-actions[bot] avatar Dec 09 '25 06:12 github-actions[bot]

:white_check_mark: Gradle check result for bd276111b346f5afff35827e2fb8fe1c654557df: SUCCESS

github-actions[bot] avatar Dec 09 '25 18:12 github-actions[bot]

:white_check_mark: Gradle check result for 89ff9acd84c1b3dbb589bb241eee455b003197ba: SUCCESS

github-actions[bot] avatar Dec 10 '25 08:12 github-actions[bot]

:x: Gradle check result for c404414095511ee8178f4e823554bb214fd6479b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 10 '25 14:12 github-actions[bot]

:x: Gradle check result for 633246a1b75d40bdbd1d89bdd408c35c74f8600a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 11 '25 17:12 github-actions[bot]

:white_check_mark: Gradle check result for 10b01d002cf81578e0d7a551e2e52ea9151baef8: SUCCESS

github-actions[bot] avatar Dec 12 '25 04:12 github-actions[bot]

:white_check_mark: Gradle check result for 85f9385589ac37528489e67d9f218e9a60a2e154: SUCCESS

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

:x: Gradle check result for bba568547351f6c6271f8a1aeee8c2d10f0aeb37: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 12 '25 11:12 github-actions[bot]

:x: Gradle check result for 076413329aaabff935ddc72b56ff1055a86f22a0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 12 '25 14:12 github-actions[bot]

:white_check_mark: Gradle check result for f2e0d8a352763373d2f471c52361fefd8fd11d00: SUCCESS

github-actions[bot] avatar Dec 12 '25 17:12 github-actions[bot]

:x: Gradle check result for e78533c3edc46abba1f6165287885c74312bce8a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 13 '25 04:12 github-actions[bot]