OpenSearch
[Remote Store] Add RemoteSegmentStoreDirectory to interact with remote segment store
Signed-off-by: Sachin Kale [email protected]
Description
- To avoid concurrency issues where two primaries exist for a given shard at the same time and upload segment files that can overwrite each other, we decided to add a unique suffix (UUID) to each segment filename that is uploaded to the remote segment store.
- But when we restore these segment files, we need to restore them under their original names. We also need a mechanism to determine which segment files are part of a particular commit checkpoint. For this, we also upload a metadata file per checkpoint (refresh/commit). This metadata file contains a map of original segment name to uploaded segment name.
- The above logic is implemented in RemoteSegmentStoreDirectory, which composes two instances of remote directories (one for segment and another for metadata) and still provides a directory interface for a caller. This way, caller would invoke directory methods of RemoteSegmentStoreDirectory in the same way as FSDirectory.
- Two instances of RemoteDirectory that are part of RemoteSegmentStoreDirectory:
  - remoteDataDirectory: <Cluster UUID>/<Index UUID>/<Shard ID>/segments/data
  - remoteMetadataDirectory: <Cluster UUID>/<Index UUID>/<Shard ID>/segments/metadata
- Sample Files under each directory path:
  - remoteMetadataDirectory
    - refresh_mapping__1__z__lKDiNIIBrs0AUNsRcOa3
    - commit_mapping__1__z__lKDiNIIBrs0AUNsRcOa3
    - refresh_mapping__1__y__g6DeNIIBrs0AUNsRQubk
    - commit_mapping__1__y__g6DeNIIBrs0AUNsRQubk
  - remoteDataDirectory
    - _10v.cfe__yXvPNIIBrs0AUNsRUXfQ
    - _10v.cfs__uHvPNIIBrs0AUNsRUVA2
    - _10v.si__h3vPNIIBrs0AUNsRSgJ-
    - _10w.cfe__UnrPNIIBrs0AUNsRR43c
    - _10w.cfs__ZXrPNIIBrs0AUNsRSbQU
    - _10w.si__KnrPNIIBrs0AUNsRRD-K
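The naming scheme described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name SegmentNameMappingSketch and its methods are hypothetical, the `__` separator is inferred from the sample file names, and `java.util.UUID` stands in for whatever UUID generator the implementation actually uses. It shows why restore goes through the metadata map instead of parsing the uploaded name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: not the RemoteSegmentStoreDirectory implementation.
public class SegmentNameMappingSketch {
    // Separator between original name and unique suffix (assumed from the sample file names).
    static final String SEPARATOR = "__";

    /** Appends a unique suffix so uploads from two primaries cannot overwrite each other. */
    static String uploadedName(String originalName, String uniqueSuffix) {
        return originalName + SEPARATOR + uniqueSuffix;
    }

    /** Builds the per-checkpoint metadata: original segment name -> uploaded segment name. */
    static Map<String, String> buildMetadata(Iterable<String> originalNames) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : originalNames) {
            // Assumption: a random java.util.UUID is used here purely for illustration.
            mapping.put(name, uploadedName(name, UUID.randomUUID().toString()));
        }
        return mapping;
    }

    /** On restore, looks up which remote file holds the given original segment file. */
    static String resolve(Map<String, String> metadata, String originalName) {
        String uploaded = metadata.get(originalName);
        if (uploaded == null) {
            throw new IllegalArgumentException("No uploaded file recorded for " + originalName);
        }
        return uploaded;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = buildMetadata(List.of("_10v.cfe", "_10v.cfs", "_10v.si"));
        metadata.forEach((original, uploaded) -> System.out.println(original + " -> " + uploaded));
    }
}
```

Looking names up in the map, instead of stripping the suffix from the uploaded name, avoids ambiguity: segment file names already begin with an underscore, and a suffix could itself contain underscores.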
Issues Resolved
- https://github.com/opensearch-project/OpenSearch/issues/3906
Check List
- [X] New functionality includes testing.
- [X] All tests pass
- [X] New functionality has been documented.
- [X] New functionality has javadoc added
- [X] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1025/
- CommitID: 26025178955f231df71b8be27ae7a3a7362d20ea
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1026/
- CommitID: 505acf9e05e552aac6948a62a061ad22fae7e386
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1034/
- CommitID: fc25eaf72d6d06edcb47b70d9e2fcf380dc78bf5
@andrross @Bukhtawar Please review.
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1064/
- CommitID: 6e053b47bfb66268f6cff5d562398a71d1d6102c
Codecov Report
Merging #4020 (2470f4a) into main (a469a3c) will increase coverage by 0.04%. The diff coverage is 86.59%.
@@ Coverage Diff @@
## main #4020 +/- ##
============================================
+ Coverage 70.59% 70.64% +0.04%
- Complexity 57083 57118 +35
============================================
Files 4603 4605 +2
Lines 274551 274670 +119
Branches 40210 40223 +13
============================================
+ Hits 193831 194037 +206
+ Misses 64514 64378 -136
- Partials 16206 16255 +49
Impacted Files | Coverage Δ | |
---|---|---|
...ava/org/opensearch/client/RestHighLevelClient.java | 44.32% <ø> (-0.16%) | :arrow_down: |
...gregations/metrics/GeoBoundsAggregatorFactory.java | 88.88% <ø> (ø) | |
...search/aggregations/metrics/InternalGeoBounds.java | 66.66% <ø> (ø) | |
...o/search/aggregations/metrics/ParsedGeoBounds.java | 88.00% <ø> (ø) | |
.../main/java/org/opensearch/search/SearchModule.java | 96.27% <ø> (-0.03%) | :arrow_down: |
...earch/search/aggregations/AggregationBuilders.java | 46.15% <ø> (+1.15%) | :arrow_up: |
...regations/support/AggregationInspectionHelper.java | 53.65% <ø> (+1.27%) | :arrow_up: |
...g/opensearch/test/InternalAggregationTestCase.java | 98.21% <ø> (-0.46%) | :arrow_down: |
...a/org/opensearch/test/OpenSearchIntegTestCase.java | 57.37% <ø> (-0.11%) | :arrow_down: |
...regations/metrics/AbstractGeoBoundsAggregator.java | 56.52% <56.52%> (ø) | |
... and 477 more | | |
/cc : @ashking94 for help with review
Update: I learned that SegmentInfosSnapshot does not contain only the incremental segment files since the last commit; it contains the list of all live segment files for the given shard. This would simplify the current approach: instead of keeping track of 2 separate metadata files (commit and refresh), we would keep track of only one metadata file. This is just a thought as of now. I will make the changes after outlining all the details.
Gradle Check (Jenkins) Run Completed with:
- RESULT: UNSTABLE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1153/
- CommitID: 813dc00faf2c4aebc109a39fd95b3f29cec946ee
Build failed with:
java.lang.AssertionError: Failure at [repository_s3/20_repository_permanent_credentials:152]: expected [2xx] status code but api [snapshot.create] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"invalid_snapshot_name_exception","reason":"[repository_permanent:snapshot-one] Invalid snapshot name [snapshot-one], snapshot with the same name already exists","stack_trace":"InvalidSnapshotNameException[[repository_permanent:snapshot-one] Invalid snapshot name [snapshot-one], snapshot with the same name already exists]
This test is not related to the changes in this PR. Re-triggering the build.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1164/
- CommitID: 522905e8036ae7e0b721a931a3883080eadbd115
Created https://github.com/opensearch-project/OpenSearch/issues/4069 to track this.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1235/
- CommitID: 2efcc1067cbca71d68ae2e92419786fa75f49974
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1239/
- CommitID: 1a4cbda5ca69ea58fdaa3bfe6e5448e7127b82ec
Gradle Check (Jenkins) Run Completed with:
- RESULT: UNSTABLE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1258/
- CommitID: 1ef80346f1044a3b29c3ebeaf151ecad8baaa61e
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1263/
- CommitID: f9c943065ec02409992d885b7d2af51febda1d9c
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1684/
- CommitID: ce356bf67476ba09b8ce0b7693d1ecd23f08b769
Build is failing with:
Execution failed for task ':distribution:bwc:minor:buildBwcLinuxTar'.
Not related to the current change, re-triggering the build.
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE :x:
- URL: https://build.ci.opensearch.org/job/gradle-check/1685/
- CommitID: dafb962f0564dcb29205039d867a82190c3de463
Failing tests:
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark" -Dtests.seed=DFD6B4D068693FA8 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-PH -Dtests.timezone=ROK -Druntime.java=17
Seems like a flaky test. Re-triggering build
Gradle Check (Jenkins) Run Completed with:
- RESULT: SUCCESS :white_check_mark:
- URL: https://build.ci.opensearch.org/job/gradle-check/1689/
- CommitID: 2470f4a310dfe06fbb5fef67fd33785d20a25ae6