OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

No-op replication for primary term validation with NRTSegRep

Open ashking94 opened this issue 2 years ago • 12 comments

Signed-off-by: Ashish Singh [email protected]

Description

As part of implementing #3706, this is the initial commit that does the following -

  • Introduces an abstraction for developing No Op replication (for primary term validation) on top of NRT segment replication.
  • Implements Engine (NRTReplicationNoOpEngine) for No-op replication use case where the calls to replica does not persist any operation onto the replicas. There is in-memory storage, however, of the last seq no seen. This is to handle recovery. The translog manager being used is NoOpTranslogManager that does not perform any operation.
  • Follow things are working -
    • Primary term validation during the indexing/delete/update/bulk calls.
    • Peer recovery of replicas are working fine. Currently, the replica is brought to speed upto the last successful commit on Primary.

Following items have to be handled, should be followed with PRs -

  • Request payload is not required to be sent across the wire now in the in-sync replication call. We have to fix this.
  • internal:index/shard/recovery/translog_ops action which brings the translog from primary and performs indexing on replica is not required for replica where segrep and remote store are enabled. The reason is that translog on remote store would be the source of truth where only the assuming primary would be publishing the translogs.
  • Also, the translog replay can probably be replaced with Segment replication so that all operations until the last refresh on primary is made available to the replica.
  • During the recovery or during Engine bootstrap, the translog generation and checkpoint files are getting created on local disk. This needs to be fixed so that there is no reliance on local disk.
  • Recovery is a multistep workflow (or step function) of which one of the steps/flow is where translog replay is carried out. Currently, replay translog is a no-op on the replica. For the recovery to complete successfully (and code to work), have overridden the getPersistedLocalCheckpoint method to return the seqNo of the last translog which was indexed. Essentially this is dummy code which we should remove by skipping Replay translog step and directly going with the finalize step.

Issues Resolved

[List any issues this PR will resolve]

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ashking94 avatar Aug 04 '22 14:08 ashking94

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1442/
  • CommitID: 2356e2dc4a50e4248eecbe3caff99e02cc5cc4d9

github-actions[bot] avatar Aug 04 '22 14:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1443/
  • CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f

github-actions[bot] avatar Aug 04 '22 15:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1445/
  • CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f

github-actions[bot] avatar Aug 04 '22 15:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1458/
  • CommitID: 396f13b6cb755294a728ac94ffd1afeee69bc7dd

github-actions[bot] avatar Aug 04 '22 20:08 github-actions[bot]

Do you think we could add tests to exercise the new code path? Will review in details once we add tests and resolve conflicts. At this point the concern I have is this change is already hitting the Engine, can we not return pre-emptively shortly after hitting the replica once we have validated the primary term invariant? Do you think the below change in TransportShardBulkAction might work

@Override
    protected void dispatchedShardOperationOnReplica(BulkShardRequest request, IndexShard replica, ActionListener<ReplicaResult> listener) {
        ActionListener.completeWith(listener, () -> {
            Translog.Location location = new Translog.Location(0,0,0);
            if (replica.indexSettings().isRemoteStoreEnabled() && replica.indexSettings().isSegRepEnabled()) {
                replica.ensureWriteAllowed(Engine.Operation.Origin.REPLICA);
            } else {
                location = performOnReplica(request, replica);
            }
            return new WriteReplicaResult<>(request, location, null, replica, logger);
        });
    }

This is something that I have explored. Currently when a shard recovery happens, one of the step involves replaying translog. And for recovery to complete, at the end of replay translog operation, it should return the expected value which is the highest sequence number seen during the replay translog step. When we refactor the recovery code and skip translog replay and directly jump to finalize step, we can probably totally avoid the performOnReplica method. And this is the plan as well (have mentioned in the PR description). We also need to see later what changes would be required for primary-primary recovery, and hence we can make this change in Recovery finally then.

ashking94 avatar Aug 05 '22 14:08 ashking94

UTs and ITs would follow soon.

ashking94 avatar Aug 05 '22 14:08 ashking94

cc @mch2 @dreamer-89 @sachinpkale

ashking94 avatar Aug 05 '22 14:08 ashking94

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1485/
  • CommitID: 7344aaa0ea325fcdf0c0fad70a0b5418d80736bd

github-actions[bot] avatar Aug 05 '22 15:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1486/
  • CommitID: b420b3de1255daf292866206740ba7d5d3cd5f64

github-actions[bot] avatar Aug 05 '22 16:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1554/
  • CommitID: 830a9380212cff35a8aeecc00707a9754f7a3e63

github-actions[bot] avatar Aug 08 '22 07:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1555/
  • CommitID: ebc57ca71a57ecac72845259b1f50dc2ef61f1a0

github-actions[bot] avatar Aug 08 '22 08:08 github-actions[bot]

Codecov Report

Merging #4127 (5f93b80) into main (5f2e66b) will decrease coverage by 0.13%. The diff coverage is 75.42%.

@@             Coverage Diff              @@
##               main    #4127      +/-   ##
============================================
- Coverage     70.78%   70.65%   -0.14%     
+ Complexity    57218    57104     -114     
============================================
  Files          4605     4607       +2     
  Lines        274695   274730      +35     
  Branches      40228    40228              
============================================
- Hits         194441   194098     -343     
- Misses        63955    64381     +426     
+ Partials      16299    16251      -48     
Impacted Files Coverage Δ
...main/java/org/opensearch/common/lucene/Lucene.java 66.02% <ø> (-1.28%) :arrow_down:
...index/codec/PerFieldMappingPostingFormatCodec.java 64.28% <ø> (ø)
...arch/index/engine/NRTReplicationReaderManager.java 86.95% <ø> (ø)
...s/replication/SegmentReplicationSourceHandler.java 87.71% <ø> (-0.81%) :arrow_down:
...va/org/opensearch/index/engine/EngineTestCase.java 86.46% <66.66%> (+0.52%) :arrow_up:
...arch/index/engine/NRTReplicationEngineFactory.java 77.77% <71.42%> (-22.23%) :arrow_down:
...nsearch/index/engine/NRTReplicationNoOpEngine.java 71.42% <71.42%> (ø)
...rch/index/engine/AbstractNRTReplicationEngine.java 73.50% <73.50%> (ø)
.../opensearch/index/engine/NRTReplicationEngine.java 80.95% <75.00%> (+5.75%) :arrow_up:
...in/java/org/opensearch/index/shard/IndexShard.java 68.92% <80.00%> (-0.40%) :arrow_down:
... and 516 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Aug 08 '22 08:08 codecov-commenter

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1763/
  • CommitID: e9f6d56670e50e2cd8ccf85fe979c6a102e939d3

github-actions[bot] avatar Aug 16 '22 08:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1764/
  • CommitID: 65fe29cb38d682b5b2e90cc5259ede894ae5a422

github-actions[bot] avatar Aug 16 '22 09:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1765/
  • CommitID: 5f93b80484bbc7871ea7d7ffea82bd5697a40f57

github-actions[bot] avatar Aug 16 '22 09:08 github-actions[bot]

Do you think we could add tests to exercise the new code path? Will review in details once we add tests and resolve conflicts. At this point the concern I have is this change is already hitting the Engine, can we not return pre-emptively shortly after hitting the replica once we have validated the primary term invariant? Do you think the below change in TransportShardBulkAction might work

@Override
    protected void dispatchedShardOperationOnReplica(BulkShardRequest request, IndexShard replica, ActionListener<ReplicaResult> listener) {
        ActionListener.completeWith(listener, () -> {
            Translog.Location location = new Translog.Location(0,0,0);
            if (replica.indexSettings().isRemoteStoreEnabled() && replica.indexSettings().isSegRepEnabled()) {
                replica.ensureWriteAllowed(Engine.Operation.Origin.REPLICA);
            } else {
                location = performOnReplica(request, replica);
            }
            return new WriteReplicaResult<>(request, location, null, replica, logger);
        });
    }

This is something that I have explored. Currently when a shard recovery happens, one of the step involves replaying translog. And for recovery to complete, at the end of replay translog operation, it should return the expected value which is the highest sequence number seen during the replay translog step. When we refactor the recovery code and skip translog replay and directly jump to finalize step, we can probably totally avoid the performOnReplica method. And this is the plan as well (have mentioned in the PR description). We also need to see later what changes would be required for primary-primary recovery, and hence we can make this change in Recovery finally then.

So if I understand this correct we can totally avoid performOnReplica method then we don't really need a NRTReplicationNoOpEngine as call to any engine will be short-circuited and new engine changes would be effectively dead-code. I would prefer avoiding any engine changes and rather use assertions in the existing engine to ensure there are no calls made to the engine if the mode if replica and remote translogs are enabled. "Best code is no code" :)

The way I would approach this is starting from refactoring the recovery code to see if the eventual state can get rid of performOnReplica and work backwards. If that's something that is not possible I would consider using a gating mechanism like a feature flag or even a feature branch to avoid breaking existing feature sets and develop NoOp replication in isolation till we can integrate the eventual solution incrementally into mainline rather than building abstraction that would eventually be dead code

Bukhtawar avatar Aug 16 '22 17:08 Bukhtawar

Revisiting the recovery code to allow for the sync replication call to return before reaching engine. cc @Bukhtawar @mch2

ashking94 avatar Aug 17 '22 10:08 ashking94

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1879/
  • CommitID: 3e325bec37aa3e1cc6a4053fa11af8ece472e24b

github-actions[bot] avatar Aug 18 '22 09:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1943/
  • CommitID: 5f93b80484bbc7871ea7d7ffea82bd5697a40f57

github-actions[bot] avatar Aug 19 '22 18:08 github-actions[bot]