Add support to upload snapshot shard blobs with hashed prefix

Open ashking94 opened this issue 1 year ago • 15 comments

Description

Following up on the RFC https://github.com/opensearch-project/OpenSearch/issues/15146, this PR implements the proposal. More details are in the RFC itself. A summary of the change:

  1. Snapshots model each index as an IndexId. Multiple indexes can share the same IndexId even though they are fundamentally different. This PR updates the IndexId class to also hold the pathType.
  2. The data for each IndexId is stored in the index-N file (i.e. RepositoryData). As a result, the pathType metadata is available at no additional cost during snapshot operations such as creation, deletion, clone, and cleanup.
  3. To avoid leaving zombie data behind, which is a risk with hashed-prefix paths, we have also introduced a snapshot_shard_paths file that records the paths of all the shards for an IndexId. The same information is used later during stale blob deletion. This file is cleaned up only when all the paths listed in it have been deleted. Note also that stale blob cleanup runs after the index-N file upload, so the pathType information may no longer be available at that point.
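
As a rough illustration of points 1 and 3, the sketch below builds a shard path in either mode and applies the "clean up only when every recorded path is gone" rule. It is a minimal sketch under stated assumptions, not the code in this PR: the class `ShardPathSketch`, the `PathType` values, the path layout, and the use of SHA-256 for the prefix are all hypothetical stand-ins chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and layout do not mirror the actual OpenSearch classes. */
final class ShardPathSketch {

    /** Assumed path types; the real enum in the PR may differ. */
    enum PathType { FIXED, HASHED_PREFIX }

    /** Builds the blob path for one shard of an index under the given path type. */
    static String shardPath(PathType type, String basePath, String indexId, int shardId) {
        String fixed = basePath + "/indices/" + indexId + "/" + shardId + "/";
        if (type == PathType.FIXED) {
            return fixed;
        }
        // Hashed prefix: a short hash of the identifying inputs is placed first,
        // so shards of the same index no longer share a common parent directory.
        return hashPrefix(indexId + "/" + shardId) + "/" + fixed;
    }

    /** First 8 bytes of SHA-256 as hex; the hash function is an assumption. */
    static String hashPrefix(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x", digest[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Paths that a snapshot_shard_paths-style file would record for one index. */
    static List<String> allShardPaths(PathType type, String basePath, String indexId, int shardCount) {
        List<String> paths = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            paths.add(shardPath(type, basePath, indexId, shard));
        }
        return paths;
    }

    /** The paths file may be removed only once every recorded path has been deleted. */
    static boolean canDeleteShardPathsFile(List<String> recorded, Set<String> deleted) {
        return deleted.containsAll(recorded);
    }
}
```

Because hashed paths cannot be discovered by listing a single parent directory, recording them explicitly is what lets a later cleanup pass find every shard location.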

Related Issues

Resolves #15146

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ashking94 avatar Aug 26 '24 18:08 ashking94

Flaky test https://github.com/opensearch-project/OpenSearch/issues/15117 seen. Build - https://build.ci.opensearch.org/job/gradle-check/45417/testReport/junit/org.opensearch.cluster.service/MasterServiceTests/testClusterStateUpdateLoggingWithDebugEnabled/

ashking94 avatar Aug 27 '24 17:08 ashking94

:grey_exclamation: Gradle check result for e21d93710b5cb73ec30a4f62130c215a447f2601: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 27 '24 18:08 github-actions[bot]

Codecov Report

Attention: Patch coverage is 72.97297% with 90 lines in your changes missing coverage. Please review.

Project coverage is 72.00%. Comparing base (71d122b) to head (7c5b751). Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ch/repositories/blobstore/BlobStoreRepository.java | 69.65% | 62 Missing and 9 partials :warning: |
| ...a/org/opensearch/snapshots/SnapshotShardPaths.java | 85.36% | 3 Missing and 3 partials :warning: |
| ...mote/directory/RemoteSnapshotDirectoryFactory.java | 0.00% | 5 Missing :warning: |
| ...main/java/org/opensearch/repositories/IndexId.java | 80.00% | 1 Missing and 2 partials :warning: |
| ...va/org/opensearch/repositories/RepositoryData.java | 83.33% | 0 Missing and 2 partials :warning: |
| ...ava/org/opensearch/snapshots/SnapshotsService.java | 60.00% | 1 Missing and 1 partial :warning: |
| ...arch/index/recovery/RemoteStoreRestoreService.java | 0.00% | 1 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15426      +/-   ##
============================================
+ Coverage     71.99%   72.00%   +0.01%     
- Complexity    63700    63753      +53     
============================================
  Files          5248     5249       +1     
  Lines        297416   297643     +227     
  Branches      42984    43011      +27     
============================================
+ Hits         214113   214309     +196     
+ Misses        65776    65698      -78     
- Partials      17527    17636     +109     

codecov[bot] avatar Aug 27 '24 18:08 codecov[bot]

:x: Gradle check result for 19750710c9c422924ea4e8ed965f3d72edc534d1: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 27 '24 20:08 github-actions[bot]

❌ Gradle check result for 1975071: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

https://github.com/opensearch-project/OpenSearch/issues/14314

ashking94 avatar Aug 28 '24 04:08 ashking94

:white_check_mark: Gradle check result for 19750710c9c422924ea4e8ed965f3d72edc534d1: SUCCESS

github-actions[bot] avatar Aug 28 '24 05:08 github-actions[bot]

:x: Gradle check result for c04eb67f1523884392c0d6df27191ad958e4193c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 05:08 github-actions[bot]

❌ Gradle check result for c04eb67: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14599

ashking94 avatar Aug 28 '24 05:08 ashking94

:white_check_mark: Gradle check result for c04eb67f1523884392c0d6df27191ad958e4193c: SUCCESS

github-actions[bot] avatar Aug 28 '24 06:08 github-actions[bot]

:white_check_mark: Gradle check result for fc821932cb1e722c5ed8cf046fe1960944230dba: SUCCESS

github-actions[bot] avatar Aug 28 '24 06:08 github-actions[bot]

:x: Gradle check result for 1473c92664b39c690b2ba91db251097a3e5f990a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 10:08 github-actions[bot]

❌ Gradle check result for 1473c92: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14327

ashking94 avatar Aug 28 '24 10:08 ashking94

:grey_exclamation: Gradle check result for 1473c92664b39c690b2ba91db251097a3e5f990a: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 28 '24 11:08 github-actions[bot]

:x: Gradle check result for 18a8469e81f6983e6ffd45eda0d94f33579dd549: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 11:08 github-actions[bot]

❌ Gradle check result for 18a8469: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky tests - https://github.com/opensearch-project/OpenSearch/issues/14327, https://github.com/opensearch-project/OpenSearch/issues/14293, https://github.com/opensearch-project/OpenSearch/issues/7791

ashking94 avatar Aug 28 '24 12:08 ashking94

:grey_exclamation: Gradle check result for d0c0fae42e7f3edd1bde062853de0deb044da3d2: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 28 '24 15:08 github-actions[bot]

> Are existing ITs exercising these code paths or do we need to explicitly add them

+1. Are we planning to run hash prefix/fixed mode in a random manner?

gbbafna avatar Aug 28 '24 16:08 gbbafna

❕ Gradle check result for d0c0fae: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14408

ashking94 avatar Aug 28 '24 16:08 ashking94

> Are existing ITs exercising these code paths or do we need to explicitly add them

> +1, Are we planning to run hash prefix/fixed mode in a random manner ?

That's right. I plan to run once with the default mode changed, and then make the mode random so that all paths are covered. I have also tried to cover the code paths with unit tests. However, the existing coverage is really low, and we may need additional effort to cover the existing code in the future.
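
For readers unfamiliar with this kind of randomization, one way it could look is sketched below. This is a hypothetical illustration, not the test setup used in this PR: `RandomPathTypeSketch` and its `PathType` values are assumed names, and a plain seeded `java.util.Random` stands in for whatever randomness source the test framework provides.

```java
import java.util.Random;

/** Hypothetical sketch of picking a path mode at random per test run. */
final class RandomPathTypeSketch {

    /** Assumed path modes; the real names in the PR may differ. */
    enum PathType { FIXED, HASHED_PREFIX }

    /**
     * Picks one mode uniformly at random. Passing in a seeded Random keeps
     * a failing run reproducible from its seed.
     */
    static PathType randomPathType(Random random) {
        PathType[] values = PathType.values();
        return values[random.nextInt(values.length)];
    }
}
```

Over many runs each mode gets exercised by the same integration tests, without having to duplicate every test per mode.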

ashking94 avatar Aug 28 '24 17:08 ashking94

:x: Gradle check result for 173514b9ce3cb2b0947fc8cba5e20f797c335235: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 18:08 github-actions[bot]

:x: Gradle check result for e097aa9e28c3c0e15cc43de944ff9a8af0e8c357: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 18:08 github-actions[bot]

❌ Gradle check result for 173514b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky tests - https://github.com/opensearch-project/OpenSearch/issues/14295, https://github.com/opensearch-project/OpenSearch/issues/14329

ashking94 avatar Aug 29 '24 06:08 ashking94

:x: Gradle check result for 6f526e4cd61b60d6e71acb509d382fcaaa5d2480: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 29 '24 08:08 github-actions[bot]

❌ Gradle check result for 6f526e4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Fixing these errors.

ashking94 avatar Aug 29 '24 16:08 ashking94

:x: Gradle check result for 596b6ded7e9be8b0f21624402b67079ecf789694: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 31 '24 12:08 github-actions[bot]

:x: Gradle check result for 798cf4bc57c95209e652642689c9c6bb32b17aef: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 31 '24 15:08 github-actions[bot]

❌ Gradle check result for 798cf4b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Unrelated failures. These are caused by a BWC discrepancy between the main and 2.x branches. The same failure appears in another PR's build: https://build.ci.opensearch.org/job/gradle-check/46035/

ashking94 avatar Aug 31 '24 16:08 ashking94

:x: Gradle check result for 10f15ab73fb692021c683c141bd4175060285c81: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 31 '24 18:08 github-actions[bot]

:white_check_mark: Gradle check result for 7c5b7515cad46e282ed1ba46211ea49abc5733bd: SUCCESS

github-actions[bot] avatar Sep 01 '24 14:09 github-actions[bot]

:x: Gradle check result for 0dea663c8958f2b38b15da545e2ac26d46267e7c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 01 '24 14:09 github-actions[bot]