Add getHistoryOperationsFromTranslog method to fetch the history snapshot from translogs

Open ankitkala opened this issue 3 years ago • 12 comments

Description

Add getHistoryOperationsFromTranslog method to fetch the history snapshot from translogs

Original issue for reference: https://github.com/opensearch-project/OpenSearch/issues/2482

Issues Resolved

375

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ankitkala avatar Jul 19 '22 08:07 ankitkala

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/785/
  • CommitID: c10b56174119769809d8df5934472207800e28af

github-actions[bot] avatar Jul 19 '22 08:07 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/788/
  • CommitID: 092e54515fac5ffd62a6b6450d421cd3663a1473

github-actions[bot] avatar Jul 19 '22 10:07 github-actions[bot]

@Bukhtawar Another thing to call out is that I haven't added an override of the method newChangesSnapshot for WriteOnlyTranslogManager, so it will inherit that from InternalTranslogManager. Ideally WriteOnlyTranslogManager shouldn't be used for reading operations, but I wasn't completely sure of the context where this would be used. Let me know if we should disallow the snapshot read for WriteOnlyTranslogManager as well.
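To make the two options concrete, here is a minimal sketch of inheriting the read path versus overriding it to reject reads. The classes are hypothetical stand-ins with simplified signatures (a snapshot is just a list of sequence numbers); they are not the OpenSearch classes and only illustrate the shape of the choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a translog manager that supports reads.
// Names and signatures are illustrative assumptions, not the OpenSearch API.
class SketchInternalManager {
    protected final List<Long> translogSeqNos = new ArrayList<>();

    void add(long seqNo) {
        translogSeqNos.add(seqNo);
    }

    // Returns the recorded sequence numbers that fall in [from, to].
    List<Long> newChangesSnapshot(long from, long to) {
        List<Long> ops = new ArrayList<>();
        for (long seqNo : translogSeqNos) {
            if (seqNo >= from && seqNo <= to) {
                ops.add(seqNo);
            }
        }
        return ops;
    }
}

// Option A: no override, so snapshot reads are inherited from the parent.
class SketchWriteOnlyInheriting extends SketchInternalManager {
}

// Option B: override the method to disallow snapshot reads.
class SketchWriteOnlyRejecting extends SketchInternalManager {
    @Override
    List<Long> newChangesSnapshot(long from, long to) {
        throw new UnsupportedOperationException("snapshot reads are disabled on this manager");
    }
}

class InheritVsOverrideDemo {
    public static void main(String[] args) {
        SketchInternalManager inheriting = new SketchWriteOnlyInheriting();
        inheriting.add(0);
        inheriting.add(1);
        inheriting.add(2);
        System.out.println("inherited read: " + inheriting.newChangesSnapshot(1, 2));

        SketchInternalManager rejecting = new SketchWriteOnlyRejecting();
        rejecting.add(0);
        try {
            rejecting.newChangesSnapshot(0, 0);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected read: " + e.getMessage());
        }
    }
}
```

With option A, callers cannot tell a write-only manager from a readable one; with option B, a misuse fails loudly instead of returning data.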

ankitkala avatar Jul 19 '22 11:07 ankitkala

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/789/
  • CommitID: 80d9d5aa3444541c508913c54d8d2b7448514b23

github-actions[bot] avatar Jul 19 '22 11:07 github-actions[bot]

Codecov Report

Merging #3948 (56ac46f) into main (3ef0046) will increase coverage by 0.04%. The diff coverage is 61.53%.

@@             Coverage Diff              @@
##               main    opensearch-project/OpenSearch#3948      +/-   ##
============================================
+ Coverage     70.64%   70.68%   +0.04%     
- Complexity    57294    57305      +11     
============================================
  Files          4617     4617              
  Lines        275479   275490      +11     
  Branches      40328    40328              
============================================
+ Hits         194602   194734     +132     
+ Misses        64581    64437     -144     
- Partials      16296    16319      +23     
Impacted Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 69.43% <0.00%> (-0.29%) :arrow_down:
...search/index/translog/InternalTranslogManager.java 65.94% <0.00%> (-5.60%) :arrow_down:
...opensearch/index/translog/NoOpTranslogManager.java 66.66% <0.00%> (-2.30%) :arrow_down:
...earch/index/translog/WriteOnlyTranslogManager.java 66.66% <0.00%> (-13.34%) :arrow_down:
.../main/java/org/opensearch/gradle/Architecture.java 100.00% <100.00%> (+37.50%) :arrow_up:
.../opensearch/gradle/DistributionDownloadPlugin.java 87.50% <100.00%> (+0.22%) :arrow_up:
...ldSrc/src/main/java/org/opensearch/gradle/Jdk.java 61.84% <100.00%> (ø)
server/src/main/java/org/opensearch/Version.java 79.54% <100.00%> (+0.09%) :arrow_up:
...ava/org/opensearch/bootstrap/SystemCallFilter.java 35.24% <100.00%> (+0.28%) :arrow_up:
...ava/org/opensearch/action/NoSuchNodeException.java 0.00% <0.00%> (-50.00%) :arrow_down:
... and 469 more

codecov-commenter avatar Jul 19 '22 11:07 codecov-commenter

We should override this for WriteOnlyTranslogManager to return empty snapshot

It might be helpful for CCR if WriteOnlyTranslogManager (replica shards) returns the snapshot as well. This would allow us to serve operations from the leader's replica shards too, so we can load balance the requests coming from the follower shard.

ankitkala avatar Jul 20 '22 17:07 ankitkala

@Bukhtawar Do you see any issues with WriteOnlyTranslogManager supporting the snapshots? Technically it is feasible, right? I understand that the TranslogManager for NRT replicas is configured as write-only, but I wanted to understand a bit more about the reasoning for this design choice. If it was discussed anywhere, sharing that link would also help.

ankitkala avatar Jul 21 '22 05:07 ankitkala

NRTReplicationEngine will unfortunately not be indexing into Lucene so LuceneChangesSnapshot will be empty

https://github.com/opensearch-project/OpenSearch/blob/5444aac1684d26bfabbc17901b0a8584c700fd87/server/src/main/java/org/opensearch/index/engine/NRTReplicationEngine.java#L151-L161
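A rough model of the point above, assuming a replica engine that records incoming operations in the translog only and never writes them to Lucene. The class and method names are hypothetical and the two stores are plain lists; this is not the NRTReplicationEngine code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a replica engine; not the OpenSearch implementation.
class SketchNrtReplicaEngine {
    // Stand-in for the Lucene index: index() never writes to it in this model.
    private final List<String> luceneOps = new ArrayList<>();
    // Stand-in for the translog: every operation is appended here.
    private final List<String> translogOps = new ArrayList<>();

    void index(String op) {
        translogOps.add(op);
    }

    // A changes snapshot sourced from Lucene sees nothing on this engine.
    List<String> snapshotFromLucene() {
        return new ArrayList<>(luceneOps);
    }

    // A snapshot sourced from the translog sees every recorded operation.
    List<String> snapshotFromTranslog() {
        return new ArrayList<>(translogOps);
    }
}

class NrtSnapshotDemo {
    public static void main(String[] args) {
        SketchNrtReplicaEngine engine = new SketchNrtReplicaEngine();
        engine.index("index doc-1");
        engine.index("delete doc-1");
        System.out.println("from Lucene:   " + engine.snapshotFromLucene());
        System.out.println("from translog: " + engine.snapshotFromTranslog());
    }
}
```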

Bukhtawar avatar Jul 21 '22 05:07 Bukhtawar

cc: @nknize @mch2

ankitkala avatar Jul 21 '22 17:07 ankitkala

Note that although the current implementation decouples the translog from the engine, it isn't a pluggable module yet. This change would just mean exposing a public method, not necessarily a pluggable one, since the work to move the translog into a module hasn't been completed (TranslogManager implementations should ideally sit in a module).

Also, this change forces the default implementations to implement this even for Engines that don't have to handle CCR. How do we ensure it is extensible enough so that it can be extended by plugin implementors as needed?
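One possible shape for this, sketched with hypothetical types rather than the actual TranslogManager contract: a Java default method lets implementations that don't care about CCR inherit a no-op, while a plugin-provided implementation opts in by overriding it. This is an assumption about how the concern could be addressed, not what the PR does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical interface; not the OpenSearch TranslogManager contract.
interface SketchTranslogManager {
    void add(long seqNo);

    // Default behaviour: managers that do not serve history inherit an empty result.
    default List<Long> historySnapshot(long from, long to) {
        return Collections.emptyList();
    }
}

// An implementation with no interest in CCR: it only implements add().
class SketchPlainManager implements SketchTranslogManager {
    private long count = 0;

    @Override
    public void add(long seqNo) {
        count++;
    }
}

// A plugin-style implementation that opts in by overriding the default.
class SketchHistoryManager implements SketchTranslogManager {
    private final List<Long> seqNos = new ArrayList<>();

    @Override
    public void add(long seqNo) {
        seqNos.add(seqNo);
    }

    @Override
    public List<Long> historySnapshot(long from, long to) {
        List<Long> ops = new ArrayList<>();
        for (long seqNo : seqNos) {
            if (seqNo >= from && seqNo <= to) {
                ops.add(seqNo);
            }
        }
        return ops;
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        SketchTranslogManager plain = new SketchPlainManager();
        plain.add(5);
        System.out.println("plain:   " + plain.historySnapshot(0, 10));

        SketchTranslogManager history = new SketchHistoryManager();
        history.add(5);
        history.add(20);
        System.out.println("history: " + history.historySnapshot(0, 10));
    }
}
```

The trade-off is that a silent empty default can hide a missing override; throwing UnsupportedOperationException from the default is the stricter alternative.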

Bukhtawar avatar Jul 25 '22 08:07 Bukhtawar

Also, this change forces the default implementations to implement this even for Engines that don't have to handle CCR. How do we ensure it is extensible enough so that it can be extended by plugin implementors as needed?

Just to clarify, we're talking about the engine for NRT replicas, right?

Also, do we plan to keep separate engines for different use-cases? My understanding was that we eventually want to have a single Engine implementation with support for extension.

ankitkala avatar Jul 25 '22 17:07 ankitkala

I gave this a quick look. I'm not strongly opposed to including this so we can cleanly support the current CCR implementation, so long as we have deprecation markings to signal that we're not sticking w/ this approach for too terribly long as we move to segrep, remote store, and streaming index API. I included a couple questions and would like @mch2 to have a look.

@nknize Yes, we want to move to Segment Replication as the default choice for CCR, but we might not want to deprecate the logical replication just yet (more details here), mostly because it's a one-way door. We can always deprecate the logical replication later, if required.

Let's assume that we want to keep this as a long term solution, what would you recommend?

I think we'll still want to rely on TranslogManager for fetching the operations. Even with translogs on remote storage, we can continue doing so. The only caveat here might be that we can fetch the operations only from the leader's primary shard and not from replicas (which should be fine for us) (more here).

ankitkala avatar Jul 27 '22 13:07 ankitkala

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1657/
  • CommitID: f3a18b855cf061e838413e37da5362992dc4ad70

github-actions[bot] avatar Aug 10 '22 11:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2882/
  • CommitID: 6984436051f1cbd292ab0032d832f91e58d88925

github-actions[bot] avatar Sep 08 '22 07:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2883/
  • CommitID: e36d3adafaf59d05011230ef37e9b3254a0e4a18

github-actions[bot] avatar Sep 08 '22 11:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2884/
  • CommitID: 8fedee6fae2cd915b24772868445d52b28a821c3

github-actions[bot] avatar Sep 08 '22 11:09 github-actions[bot]

Gradle check failing due to the error below. Will rebase after #4455 is merged

Execution failed for task ':distribution:bwc:minor:buildBwcLinuxTar'.
> Building 2.3.0 didn't generate expected file /Volumes/ws/OpenSearch/distribution/bwc/minor/build/bwc/checkout-2.x/distribution/archives/linux-tar/build/distributions/opensearch-min-2.3.0-SNAPSHOT-linux-x64.tar.gz

ankitkala avatar Sep 08 '22 12:09 ankitkala

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2905/
  • CommitID: eb33e0865eedf5ddc9ad4f2f5235a9452160b9f4

github-actions[bot] avatar Sep 09 '22 04:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2906/
  • CommitID: 2bac7c623f8ecca4a93036b5c5de378be25c3883

github-actions[bot] avatar Sep 09 '22 06:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2907/
  • CommitID: 0e949433731609c6b1eff7f7c91f5d6b17141189

github-actions[bot] avatar Sep 09 '22 06:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2908/
  • CommitID: efbd59a718f8cfd289ead6c355fa95aebe610d1c

github-actions[bot] avatar Sep 09 '22 07:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2954/
  • CommitID: 589e9e974900af29971fcac385079bff5858d6a7

github-actions[bot] avatar Sep 12 '22 04:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2955/
  • CommitID: 9abcaccd3f941e89dee10bf98b78a8de1e42189d

github-actions[bot] avatar Sep 12 '22 05:09 github-actions[bot]

Gradle check has been failing for the last 2 days on all PRs. It's failing because the Jenkins workflow fails to install JDK 14. Not related to any OpenSearch changes.

java.io.IOException: Unable to locate release: jdk-14.0.2+12
	at io.jenkins.plugins.adoptopenjdk.AdoptOpenJDKInstaller.performInstallation(AdoptOpenJDKInstaller.java:109)
	at hudson.tools.InstallerTranslator.getToolHome(InstallerTranslator.java:70)
	at hudson.tools.ToolLocationNodeProperty.getToolHome(ToolLocationNodeProperty.java:108)
	at hudson.tools.ToolInstallation.translateFor(ToolInstallation.java:221)
	at hudson.model.JDK.forNode(JDK.java:149)
	at hudson.model.JDK.forNode(JDK.java:59)
	at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:157)
	at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:138)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
[Checks API] No suitable checks publisher found.

ankitkala avatar Sep 12 '22 06:09 ankitkala

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2998/
  • CommitID: 5b841c765c2165034d8cbcd68dde48f023726470

github-actions[bot] avatar Sep 13 '22 03:09 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/2999/
  • CommitID: 56ac46fc72564ebb1c45e3ffa4814dbbb33fdcd9

github-actions[bot] avatar Sep 13 '22 03:09 github-actions[bot]