[Remote Store] Add RemoteSegmentStoreDirectory to interact with remote segment store

Open sachinpkale opened this issue 2 years ago • 16 comments

Signed-off-by: Sachin Kale [email protected]

Description

  • To avoid concurrency issues where two primaries exist for a given shard at the same time and upload segment files that can overwrite each other, we decided to add a unique suffix (UUID) to the name of each segment file that is uploaded to the remote segment store.
  • But when we restore these segment files, we need to restore them under their original names. We also need a mechanism to determine which segment files are part of a particular commit checkpoint. For this, we also upload a metadata file per checkpoint (refresh/commit). This metadata file contains a map of original segment name to uploaded segment name.
  • The above logic is implemented in RemoteSegmentStoreDirectory, which composes two instances of remote directories (one for segments and another for metadata) and still provides a directory interface to the caller. This way, the caller invokes directory methods of RemoteSegmentStoreDirectory in the same way as those of FSDirectory.
  • Two instances of RemoteDirectory that are part of RemoteSegmentStoreDirectory:
    • remoteDataDirectory: <Cluster UUID>/<Index UUID>/<Shard ID>/segments/data
    • remoteMetadataDirectory: <Cluster UUID>/<Index UUID>/<Shard ID>/segments/metadata
  • Sample Files under each directory path:
    • remoteMetadataDirectory
      • refresh_mapping__1__z__lKDiNIIBrs0AUNsRcOa3
      • commit_mapping__1__z__lKDiNIIBrs0AUNsRcOa3
      • refresh_mapping__1__y__g6DeNIIBrs0AUNsRQubk
      • commit_mapping__1__y__g6DeNIIBrs0AUNsRQubk
    • remoteDataDirectory
      • _10v.cfe__yXvPNIIBrs0AUNsRUXfQ
      • _10v.cfs__uHvPNIIBrs0AUNsRUVA2
      • _10v.si__h3vPNIIBrs0AUNsRSgJ-
      • _10w.cfe__UnrPNIIBrs0AUNsRR43c
      • _10w.cfs__ZXrPNIIBrs0AUNsRSbQU
      • _10w.si__KnrPNIIBrs0AUNsRRD-K
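
The naming scheme described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `SegmentNameMappingSketch`, its method names, and the use of `java.util.UUID` for the suffix are assumptions made for illustration. Only the `__` separator and the idea of a map from original name to uploaded name come from the description and the sample file names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of OpenSearch: shows how a unique suffix
// and a per-checkpoint name mapping could fit together.
class SegmentNameMappingSketch {
    // Separator seen in the sample file names, e.g. _10v.si__h3vPNIIBrs0AUNsRSgJ-
    static final String SEPARATOR = "__";

    // Name under which a segment file would be stored remotely.
    static String uploadedName(String originalName, String uniqueSuffix) {
        return originalName + SEPARATOR + uniqueSuffix;
    }

    // Content of a metadata file: original segment name -> uploaded segment name.
    static Map<String, String> buildMapping(List<String> originalNames) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : originalNames) {
            // Assumption: a random UUID string stands in for the real suffix format.
            mapping.put(name, uploadedName(name, UUID.randomUUID().toString()));
        }
        return mapping;
    }

    // Recovers the original name by dropping everything from the last separator on.
    static String originalName(String uploadedName) {
        int idx = uploadedName.lastIndexOf(SEPARATOR);
        return idx < 0 ? uploadedName : uploadedName.substring(0, idx);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping(List.of("_10v.cfe", "_10v.cfs", "_10v.si"));
        // On restore, each uploaded file is written back under its original name.
        mapping.forEach((original, uploaded) -> System.out.println(uploaded + " -> " + original));
    }
}
```

Because two primaries would generate different suffixes for the same original name, their uploads land under different remote names and cannot overwrite each other; the metadata file is what tells a restore which of those uploads belongs to a given checkpoint.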

Issues Resolved

  • https://github.com/opensearch-project/OpenSearch/issues/3906

Check List

  • [X] New functionality includes testing.
    • [X] All tests pass
  • [X] New functionality has been documented.
    • [X] New functionality has javadoc added
  • [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

sachinpkale avatar Jul 27 '22 04:07 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1025/
  • CommitID: 26025178955f231df71b8be27ae7a3a7362d20ea

github-actions[bot] avatar Jul 27 '22 04:07 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1026/
  • CommitID: 505acf9e05e552aac6948a62a061ad22fae7e386

github-actions[bot] avatar Jul 27 '22 04:07 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1034/
  • CommitID: fc25eaf72d6d06edcb47b70d9e2fcf380dc78bf5

github-actions[bot] avatar Jul 27 '22 06:07 github-actions[bot]

@andrross @Bukhtawar Please review.

sachinpkale avatar Jul 28 '22 08:07 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1064/
  • CommitID: 6e053b47bfb66268f6cff5d562398a71d1d6102c

github-actions[bot] avatar Jul 28 '22 09:07 github-actions[bot]

Codecov Report

Merging #4020 (2470f4a) into main (a469a3c) will increase coverage by 0.04%. The diff coverage is 86.59%.

@@             Coverage Diff              @@
##               main    #4020      +/-   ##
============================================
+ Coverage     70.59%   70.64%   +0.04%     
- Complexity    57083    57118      +35     
============================================
  Files          4603     4605       +2     
  Lines        274551   274670     +119     
  Branches      40210    40223      +13     
============================================
+ Hits         193831   194037     +206     
+ Misses        64514    64378     -136     
- Partials      16206    16255      +49     
Impacted Files Coverage Δ
...ava/org/opensearch/client/RestHighLevelClient.java 44.32% <ø> (-0.16%) :arrow_down:
...gregations/metrics/GeoBoundsAggregatorFactory.java 88.88% <ø> (ø)
...search/aggregations/metrics/InternalGeoBounds.java 66.66% <ø> (ø)
...o/search/aggregations/metrics/ParsedGeoBounds.java 88.00% <ø> (ø)
.../main/java/org/opensearch/search/SearchModule.java 96.27% <ø> (-0.03%) :arrow_down:
...earch/search/aggregations/AggregationBuilders.java 46.15% <ø> (+1.15%) :arrow_up:
...regations/support/AggregationInspectionHelper.java 53.65% <ø> (+1.27%) :arrow_up:
...g/opensearch/test/InternalAggregationTestCase.java 98.21% <ø> (-0.46%) :arrow_down:
...a/org/opensearch/test/OpenSearchIntegTestCase.java 57.37% <ø> (-0.11%) :arrow_down:
...regations/metrics/AbstractGeoBoundsAggregator.java 56.52% <56.52%> (ø)
... and 477 more

codecov-commenter avatar Jul 28 '22 09:07 codecov-commenter

/cc : @ashking94 for help with review

Bukhtawar avatar Jul 29 '22 06:07 Bukhtawar

Update: I learned that SegmentInfosSnapshot does not contain only the incremental segment files since the last commit; it contains the list of all live segment files for the given shard. This would simplify the current approach: instead of keeping track of two separate metadata files (commit and refresh), we would keep track of only one. This is just a thought as of now. I will make the changes after outlining all the details.
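
One possible reading of this single-metadata-file idea, as a rough sketch only: if every checkpoint lists all live segment files, the next mapping can be derived from the previous mapping plus the current live file list, and the latest metadata file alone is enough to restore. The class `SingleMetadataSketch` and the method `nextMapping` are hypothetical, and reusing the uploaded name of a file that is still live is an assumption, not something stated in this thread.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not the PR's implementation.
class SingleMetadataSketch {
    // Builds the mapping for the next checkpoint from the previous mapping
    // and the full list of live segment files.
    static Map<String, String> nextMapping(Map<String, String> previous, Collection<String> liveFiles) {
        Map<String, String> next = new LinkedHashMap<>();
        for (String file : liveFiles) {
            String alreadyUploaded = previous.get(file);
            // Assumption: a file that is still live keeps its uploaded name;
            // only files not seen before get a new suffixed name (and an upload).
            next.put(file, alreadyUploaded != null ? alreadyUploaded : file + "__" + UUID.randomUUID());
        }
        // Files that are no longer live simply drop out of the mapping.
        return next;
    }
}
```

With this shape, a restore reads only the most recent metadata file and downloads each uploaded name back to its original name.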

sachinpkale avatar Jul 29 '22 16:07 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1153/
  • CommitID: 813dc00faf2c4aebc109a39fd95b3f29cec946ee

github-actions[bot] avatar Aug 01 '22 04:08 github-actions[bot]

Build failed with:

java.lang.AssertionError: Failure at [repository_s3/20_repository_permanent_credentials:152]: expected [2xx] status code but api [snapshot.create] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"invalid_snapshot_name_exception","reason":"[repository_permanent:snapshot-one] Invalid snapshot name [snapshot-one], snapshot with the same name already exists","stack_trace":"InvalidSnapshotNameException[[repository_permanent:snapshot-one] Invalid snapshot name [snapshot-one], snapshot with the same name already exists]

This test is unrelated to the changes in this PR. Re-triggering the build.

sachinpkale avatar Aug 01 '22 05:08 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1164/
  • CommitID: 522905e8036ae7e0b721a931a3883080eadbd115

github-actions[bot] avatar Aug 01 '22 06:08 github-actions[bot]

Created https://github.com/opensearch-project/OpenSearch/issues/4069 to track this.

dreamer-89 avatar Aug 01 '22 17:08 dreamer-89

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1235/
  • CommitID: 2efcc1067cbca71d68ae2e92419786fa75f49974

github-actions[bot] avatar Aug 02 '22 02:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1239/
  • CommitID: 1a4cbda5ca69ea58fdaa3bfe6e5448e7127b82ec

github-actions[bot] avatar Aug 02 '22 05:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1258/
  • CommitID: 1ef80346f1044a3b29c3ebeaf151ecad8baaa61e

github-actions[bot] avatar Aug 02 '22 11:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1263/
  • CommitID: f9c943065ec02409992d885b7d2af51febda1d9c

github-actions[bot] avatar Aug 02 '22 13:08 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1684/
  • CommitID: ce356bf67476ba09b8ce0b7693d1ecd23f08b769

github-actions[bot] avatar Aug 12 '22 05:08 github-actions[bot]

Build is failing with:

Execution failed for task ':distribution:bwc:minor:buildBwcLinuxTar'.

Not related to the current change, re-triggering the build.

sachinpkale avatar Aug 12 '22 05:08 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1685/
  • CommitID: dafb962f0564dcb29205039d867a82190c3de463

github-actions[bot] avatar Aug 12 '22 06:08 github-actions[bot]

Failing tests:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark" -Dtests.seed=DFD6B4D068693FA8 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-PH -Dtests.timezone=ROK -Druntime.java=17

Seems like a flaky test. Re-triggering build

sachinpkale avatar Aug 12 '22 06:08 sachinpkale

Gradle Check (Jenkins) Run Completed with:

  • RESULT: SUCCESS :white_check_mark:
  • URL: https://build.ci.opensearch.org/job/gradle-check/1689/
  • CommitID: 2470f4a310dfe06fbb5fef67fd33785d20a25ae6

github-actions[bot] avatar Aug 12 '22 07:08 github-actions[bot]