OpenSearch
No-op replication for primary term validation with NRTSegRep
Signed-off-by: Ashish Singh [email protected]
Description
As part of implementing #3706, this is the initial commit that does the following -
- Introduces an abstraction for developing No Op replication (for primary term validation) on top of NRT segment replication.
- Implements an Engine (NRTReplicationNoOpEngine) for the no-op replication use case, where calls to the replica do not persist any operation on the replica. The last seq no seen is, however, kept in memory to handle recovery. The translog manager being used is NoOpTranslogManager, which does not perform any operation.
The following things are working (a simplified sketch of the replica-side write path follows this list) -
- Primary term validation during the indexing/delete/update/bulk calls.
- Peer recovery of replicas is working fine. Currently, the replica is brought up to speed up to the last successful commit on the primary.
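As a minimal, self-contained sketch of the replica-side write path described above (hypothetical class and method names; this is not the actual OpenSearch Engine API): on each replicated operation the primary term is validated and only the highest seq no seen is remembered, with no translog or Lucene writes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for the replica-side no-op write path.
class NoOpReplicaWritePath {
    private final long shardPrimaryTerm;                     // primary term the replica currently knows
    private final AtomicLong maxSeqNoSeen = new AtomicLong(-1);

    NoOpReplicaWritePath(long shardPrimaryTerm) {
        this.shardPrimaryTerm = shardPrimaryTerm;
    }

    // Called for index/delete/update/bulk operations replicated from the primary.
    void onReplicaOperation(long operationPrimaryTerm, long seqNo) {
        // Primary term validation: reject operations stamped with a stale primary term.
        if (operationPrimaryTerm < shardPrimaryTerm) {
            throw new IllegalArgumentException("operation term [" + operationPrimaryTerm
                + "] is older than the shard's current term [" + shardPrimaryTerm + "]");
        }
        // No translog write, no Lucene write: only the highest seq no seen is kept, for recovery.
        maxSeqNoSeen.accumulateAndGet(seqNo, Math::max);
    }

    long maxSeqNoSeen() {
        return maxSeqNoSeen.get();
    }
}
```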
The following items still have to be handled and will be addressed in follow-up PRs -
- The request payload no longer needs to be sent across the wire in the in-sync replication call. We have to fix this.
- The internal:index/shard/recovery/translog_ops action, which brings the translog from the primary and performs indexing on the replica, is not required for replicas where segment replication and remote store are enabled. The reason is that the translog on the remote store would be the source of truth, with only the acting primary publishing translogs.
- Also, the translog replay can probably be replaced with segment replication so that all operations until the last refresh on the primary are made available to the replica.
- During recovery or during Engine bootstrap, the translog generation and checkpoint files are still created on local disk. This needs to be fixed so that there is no reliance on local disk.
- Recovery is a multi-step workflow (or step function), one of whose steps is translog replay. Currently, translog replay is a no-op on the replica. For the recovery to complete successfully (and the code to work), the getPersistedLocalCheckpoint method has been overridden to return the seqNo of the last translog operation that was indexed (see the sketch after this list). Essentially this is dummy code which we should remove by skipping the translog replay step and going directly to the finalize step.
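A hedged sketch of the temporary override described in the last bullet (simplified, hypothetical names; the real override lives in the new engine class): nothing is actually persisted, but reporting the last seq no seen is what lets the recovery workflow's post-replay check pass.

```java
// Hypothetical, simplified sketch of the temporary checkpoint override.
class NoOpCheckpointTracker {
    private volatile long lastIndexedSeqNo = -1;   // updated as replicated operations are observed

    void markSeqNoSeen(long seqNo) {
        if (seqNo > lastIndexedSeqNo) {
            lastIndexedSeqNo = seqNo;
        }
    }

    // Stand-in for the overridden getPersistedLocalCheckpoint(): returning the last seq no seen
    // satisfies the post-replay check in recovery. This is the "dummy code" that goes away once
    // the translog replay step is skipped entirely.
    long getPersistedLocalCheckpoint() {
        return lastIndexedSeqNo;
    }
}
```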
Issues Resolved
[List any issues this PR will resolve]
Check List
- [ ] New functionality includes testing.
- [ ] All tests pass
- [ ] New functionality has been documented.
- [ ] New functionality has javadoc added
- [ ] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1442/
- CommitID: 2356e2dc4a50e4248eecbe3caff99e02cc5cc4d9
Gradle Check (Jenkins) Run Completed with:
- RESULT: UNSTABLE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1443/
- CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1445/
- CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1458/
- CommitID: 396f13b6cb755294a728ac94ffd1afeee69bc7dd
Do you think we could add tests to exercise the new code path? Will review in detail once we add tests and resolve conflicts. At this point the concern I have is that this change is already hitting the Engine; can we not return pre-emptively, shortly after hitting the replica, once we have validated the primary term invariant? Do you think the below change in TransportShardBulkAction might work?

```java
@Override
protected void dispatchedShardOperationOnReplica(BulkShardRequest request, IndexShard replica, ActionListener<ReplicaResult> listener) {
    ActionListener.completeWith(listener, () -> {
        Translog.Location location = new Translog.Location(0, 0, 0);
        if (replica.indexSettings().isRemoteStoreEnabled() && replica.indexSettings().isSegRepEnabled()) {
            replica.ensureWriteAllowed(Engine.Operation.Origin.REPLICA);
        } else {
            location = performOnReplica(request, replica);
        }
        return new WriteReplicaResult<>(request, location, null, replica, logger);
    });
}
```
This is something that I have explored. Currently, when a shard recovery happens, one of the steps involves replaying the translog, and for recovery to complete, the replay translog operation must return the expected value, which is the highest sequence number seen during that step. When we refactor the recovery code to skip translog replay and jump directly to the finalize step, we can probably avoid the performOnReplica method entirely. That is the plan as well (mentioned in the PR description). We also need to see later what changes would be required for primary-primary recovery, and hence we can make this change in the recovery code at that point.
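To make the intended end state concrete, here is a hedged outline (hypothetical step names and class; the real recovery flow is an asynchronous, multi-step listener chain): skip the translog replay step for segrep plus remote-store replicas and go straight to finalize, so performOnReplica never needs to run on the replica.

```java
// Hypothetical outline of the proposed recovery refactor; step names are illustrative only.
final class ReplicaRecoverySketch {
    interface Step { void run(); }

    private final boolean segRepAndRemoteStoreEnabled;  // assumed to come from the index settings

    ReplicaRecoverySketch(boolean segRepAndRemoteStoreEnabled) {
        this.segRepAndRemoteStoreEnabled = segRepAndRemoteStoreEnabled;
    }

    void recover(Step copySegments, Step replayTranslog, Step finalizeRecovery) {
        copySegments.run();                  // bring the replica up to the primary's last commit
        if (!segRepAndRemoteStoreEnabled) {
            replayTranslog.run();            // legacy path: replay operations shipped from the primary
        }
        // With the replay step skipped, recovery no longer depends on a faked persisted checkpoint.
        finalizeRecovery.run();
    }
}
```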
UTs and ITs would follow soon.
cc @mch2 @dreamer-89 @sachinpkale
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1485/
- CommitID: 7344aaa0ea325fcdf0c0fad70a0b5418d80736bd
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1486/
- CommitID: b420b3de1255daf292866206740ba7d5d3cd5f64
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1554/
- CommitID: 830a9380212cff35a8aeecc00707a9754f7a3e63
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1555/
- CommitID: ebc57ca71a57ecac72845259b1f50dc2ef61f1a0
Codecov Report
Merging #4127 (5f93b80) into main (5f2e66b) will decrease coverage by 0.13%. The diff coverage is 75.42%.
@@ Coverage Diff @@
## main #4127 +/- ##
============================================
- Coverage 70.78% 70.65% -0.14%
+ Complexity 57218 57104 -114
============================================
Files 4605 4607 +2
Lines 274695 274730 +35
Branches 40228 40228
============================================
- Hits 194441 194098 -343
- Misses 63955 64381 +426
+ Partials 16299 16251 -48
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...main/java/org/opensearch/common/lucene/Lucene.java | 66.02% <ø> (-1.28%) | :arrow_down: |
| ...index/codec/PerFieldMappingPostingFormatCodec.java | 64.28% <ø> (ø) | |
| ...arch/index/engine/NRTReplicationReaderManager.java | 86.95% <ø> (ø) | |
| ...s/replication/SegmentReplicationSourceHandler.java | 87.71% <ø> (-0.81%) | :arrow_down: |
| ...va/org/opensearch/index/engine/EngineTestCase.java | 86.46% <66.66%> (+0.52%) | :arrow_up: |
| ...arch/index/engine/NRTReplicationEngineFactory.java | 77.77% <71.42%> (-22.23%) | :arrow_down: |
| ...nsearch/index/engine/NRTReplicationNoOpEngine.java | 71.42% <71.42%> (ø) | |
| ...rch/index/engine/AbstractNRTReplicationEngine.java | 73.50% <73.50%> (ø) | |
| .../opensearch/index/engine/NRTReplicationEngine.java | 80.95% <75.00%> (+5.75%) | :arrow_up: |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 68.92% <80.00%> (-0.40%) | :arrow_down: |
| ... and 516 more | | |
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1763/
- CommitID: e9f6d56670e50e2cd8ccf85fe979c6a102e939d3
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1764/
- CommitID: 65fe29cb38d682b5b2e90cc5259ede894ae5a422
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1765/
- CommitID: 5f93b80484bbc7871ea7d7ffea82bd5697a40f57
So if I understand this correctly, if we can totally avoid the performOnReplica method, then we don't really need an NRTReplicationNoOpEngine, as the call to any engine will be short-circuited and the new engine changes would effectively be dead code.
I would prefer avoiding any engine changes and instead using assertions in the existing engine to ensure no calls are made to the engine when the shard is a replica and remote translog is enabled.
"Best code is no code" :)
The way I would approach this is to start by refactoring the recovery code to see if the eventual state can get rid of performOnReplica, and work backwards from there. If that turns out not to be possible, I would consider a gating mechanism like a feature flag, or even a feature branch, to avoid breaking existing feature sets and to develop no-op replication in isolation until we can integrate the eventual solution incrementally into mainline, rather than building an abstraction that would eventually be dead code.
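A hedged sketch of the assertion-style alternative being suggested here (a hypothetical helper class, not an actual change to the Engine): the short-circuit happens before the engine, and the existing engine only asserts that it is never reached on such replicas.

```java
// Hypothetical illustration of the "assert instead of a new engine" approach.
final class ReplicaWriteGuard {

    // Invoked at the transport layer (e.g. before any engine call on the replica write path).
    static boolean shouldShortCircuit(boolean isSegRepEnabled, boolean isRemoteStoreEnabled) {
        return isSegRepEnabled && isRemoteStoreEnabled;
    }

    // Guard the existing engine could assert on, so a stray replica-origin write is caught in tests.
    static void assertNoReplicaWrite(boolean isReplicaOrigin, boolean isSegRepEnabled, boolean isRemoteStoreEnabled) {
        assert !(isReplicaOrigin && isSegRepEnabled && isRemoteStoreEnabled)
            : "replica-origin write must not reach the engine when segrep + remote store are enabled";
    }
}
```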
Revisiting the recovery code to allow the in-sync replication call to return before reaching the engine. cc @Bukhtawar @mch2
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1879/
- CommitID: 3e325bec37aa3e1cc6a4053fa11af8ece472e24b
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1943/
- CommitID: 5f93b80484bbc7871ea7d7ffea82bd5697a40f57