OpenSearch
Support shard promotion with Segment Replication.
Signed-off-by: Marc Handalian [email protected]
Description
This change adds basic failover support with segment replication. Once a replica is selected for promotion, it commits its current SegmentInfos and reopens a writeable engine. The replica also removes all other commits so that this commit is the one selected when the writeable engine is opened. This commit may not be considered 'safe' by the primary, meaning its max seqNo is higher than the global checkpoint. Although this is an edge case, we never want replicas to reindex when segment replication is enabled, so if the global checkpoint has not been updated yet we do not want to revert to a safe commit. This change also updates how SegmentReplicationCheckpointPublisher is wired up within IndexShard so that, once promoted, the new primary can publish checkpoints.
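A minimal, self-contained sketch of the commit selection described above, assuming a simplified commit model. `CommitPoint` and `PromotionCommitSelector` are hypothetical classes for illustration only, not OpenSearch or Lucene APIs. The model contrasts a conventional safe-commit choice (the newest commit whose max seqNo is at or below the global checkpoint) with the promotion path, where the replica writes one new commit from its current SegmentInfos and drops all others, so the engine opens that commit even when its max seqNo is above the global checkpoint.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for a Lucene commit point: only the commit
// generation and the max sequence number recorded with it.
final class CommitPoint {
    final long generation;
    final long maxSeqNo;

    CommitPoint(long generation, long maxSeqNo) {
        this.generation = generation;
        this.maxSeqNo = maxSeqNo;
    }
}

// Hypothetical helper illustrating the two policies; not an OpenSearch class.
final class PromotionCommitSelector {

    // "Safe commit" in this model: the newest commit whose maxSeqNo does not
    // exceed the global checkpoint. Opening an older commit like this would
    // require replaying (reindexing) every operation above it.
    static CommitPoint safeCommit(List<CommitPoint> commits, long globalCheckpoint) {
        return commits.stream()
            .filter(c -> c.maxSeqNo <= globalCheckpoint)
            .max(Comparator.comparingLong(c -> c.generation))
            .orElseThrow(() -> new IllegalStateException("no safe commit in this model"));
    }

    // Promotion in this model: write the replica's current SegmentInfos as a new
    // commit (next generation) and keep only that commit, so it is the one the
    // writeable engine opens, even if its maxSeqNo is above the global checkpoint.
    static List<CommitPoint> commitAndDropOthers(List<CommitPoint> commits, long replicaMaxSeqNo) {
        long nextGeneration = commits.stream().mapToLong(c -> c.generation).max().orElse(0L) + 1;
        List<CommitPoint> remaining = new ArrayList<>();
        remaining.add(new CommitPoint(nextGeneration, replicaMaxSeqNo));
        return remaining;
    }
}
```

With commits at generation 1 (maxSeqNo 5) and generation 2 (maxSeqNo 10) and a global checkpoint of 8, `safeCommit` picks generation 1 and would force a replay, while `commitAndDropOthers(commits, 12)` leaves a single commit at generation 3 with maxSeqNo 12, which is what the promoted shard opens.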
This PR does not handle edge cases of promotion while replication events are ongoing; those will be covered in a separate issue.
Issues Resolved
closes #3989
Check List
- [x] New functionality includes testing.
- [x] All tests pass
- [ ] New functionality has been documented.
- [ ] New functionality has javadoc added
- [x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check here.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1464/
- CommitID: 58a62608c8a484d274599197e1bd0471a735c60a
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1468/
- CommitID: c28905a10ab8377a8277793429d85069ca94c2b1
Codecov Report
Merging #4135 (28fea5f) into main (237f1a5) will decrease coverage by 0.00%. The diff coverage is 76.92%.
```diff
@@             Coverage Diff              @@
##               main    #4135      +/-   ##
============================================
- Coverage     70.65%   70.64%   -0.01%
- Complexity    57075    57145      +70
============================================
  Files          4606     4606
  Lines        274706   274737      +31
  Branches      40228    40231       +3
============================================
- Hits         194103   194099       -4
- Misses        64280    64374      +94
+ Partials      16323    16264      -59
```
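The coverage totals are internally consistent, which a few lines of arithmetic can confirm: Lines = Hits + Misses + Partials (194103 + 64280 + 16323 = 274706), and coverage is Hits / Lines, with partially covered lines counted as not covered. The sketch below assumes Codecov's default two-decimal precision with rounding down; under that assumption 194103 / 274706 ≈ 70.658% is reported as 70.65% and 194099 / 274737 ≈ 70.649% as 70.64%, which accounts for the -0.01% delta. `CoverageMath` is a hypothetical helper, not part of any Codecov API.

```java
import java.util.Locale;

// Hypothetical helper reproducing the report's coverage figures under the
// assumption of two-decimal precision rounded down (truncated).
final class CoverageMath {
    static String percent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;      // Lines = Hits + Misses + Partials
        long basisPoints = hits * 10_000L / lines;  // integer division truncates
        return String.format(Locale.ROOT, "%d.%02d", basisPoints / 100, basisPoints % 100);
    }
}
```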
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...c/main/java/org/opensearch/index/IndexService.java | 73.86% <0.00%> (-0.23%) | :arrow_down: |
| ...nsearch/index/shard/CheckpointRefreshListener.java | 88.88% <0.00%> (-11.12%) | :arrow_down: |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 69.07% <73.33%> (-0.58%) | :arrow_down: |
| ...rc/main/java/org/opensearch/index/store/Store.java | 81.30% <81.25%> (-0.60%) | :arrow_down: |
| .../opensearch/index/engine/NRTReplicationEngine.java | 76.92% <100.00%> (+1.52%) | :arrow_up: |
| ...ation/OpenSearchIndexLevelReplicationTestCase.java | 89.81% <100.00%> (+0.02%) | :arrow_up: |
| ...java/org/opensearch/client/indices/DataStream.java | 0.00% <0.00%> (-76.09%) | :arrow_down: |
| ...n/indices/forcemerge/ForceMergeRequestBuilder.java | 0.00% <0.00%> (-75.00%) | :arrow_down: |
| ...adonly/AddIndexBlockClusterStateUpdateRequest.java | 0.00% <0.00%> (-75.00%) | :arrow_down: |
| .../opensearch/client/indices/CloseIndexResponse.java | 17.50% <0.00%> (-60.00%) | :arrow_down: |
| ... and 499 more | | |
I've added a commit to this PR ensuring that cancelling the primary allocation succeeds and that the replica is promoted and the former primary is recreated as a replica. While testing that, I found we were failing to publish a replication checkpoint if the primary flushed during close. That is now fixed: the shard must be open for us to publish the replication checkpoint.
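The publishing guard described above can be sketched as a small model. This is a minimal illustration under assumptions: `ShardModel`, `ShardState`, and the `LongConsumer` publisher are hypothetical stand-ins, not the actual `IndexShard` or `SegmentReplicationCheckpointPublisher` APIs. Every shard is constructed with the publisher, so a replica that is later promoted can publish without being re-wired, and a refresh only publishes while the shard is a primary and still open; a refresh caused by a flush during close is skipped instead of failing.

```java
import java.util.function.LongConsumer;

// Hypothetical lifecycle states for this sketch only; OpenSearch's own
// IndexShardState has more states than this.
enum ShardState { RECOVERING, STARTED, CLOSED }

// Hypothetical, simplified shard; not the real IndexShard.
final class ShardModel {
    private final LongConsumer checkpointPublisher; // stand-in for the real publisher
    private ShardState state = ShardState.RECOVERING;
    private boolean primary = false;

    // The publisher is handed to every shard, replica or primary, so a replica
    // that is promoted later can start publishing without being re-wired.
    ShardModel(LongConsumer checkpointPublisher) {
        this.checkpointPublisher = checkpointPublisher;
    }

    void markStarted() { state = ShardState.STARTED; }

    void promoteToPrimary() { primary = true; }

    void close() { state = ShardState.CLOSED; }

    // Invoked after a refresh; returns true if a checkpoint was published.
    // Replicas never publish, and a closed shard (for example one that
    // flushes during close) skips publishing instead of failing.
    boolean afterRefresh(long checkpointVersion) {
        if (!primary || state != ShardState.STARTED) {
            return false;
        }
        checkpointPublisher.accept(checkpointVersion);
        return true;
    }
}
```

In this model a started replica ignores refreshes, publishes as soon as it is promoted, and stops publishing once closed.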
Gradle Check (Jenkins) Run Completed with:
- RESULT: UNSTABLE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1597/
- CommitID: 8eedb4a7f1c2f9cf86f2d475a2266715aeaf2470
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1612/
- CommitID: 8eedb4a7f1c2f9cf86f2d475a2266715aeaf2470
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1755/
- CommitID: de626ff10fc1ad4780b5bae593cf45b55b3112f2
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1811/
- CommitID: 49c483851d8c785bb000924f44ebe99d95187738
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1812/
- CommitID: d7b32410bd585d0399bd8a0a0bd4e8820f9ffc50
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1813/
- CommitID: d7b32410bd585d0399bd8a0a0bd4e8820f9ffc50
Gradle Check (Jenkins) Run Completed with:
- RESULT: UNSTABLE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1816/
- CommitID: 28fea5fbadaea0d8c026d0bec9edf9b181cc3a61
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1818/
- CommitID: 28fea5fbadaea0d8c026d0bec9edf9b181cc3a61
The backport to 2.x failed:
```
The process '/usr/bin/git' failed with exit code 1
```
To backport manually, run these commands in your terminal:
```bash
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-4135-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f65e02d1b910bd0a1990868bfa5d12ba829bbbd5
# Push it to GitHub
git push --set-upstream origin backport/backport-4135-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
```
Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-4135-to-2.x.