Add support to upload snapshot shard blobs with hashed prefix
Description
This PR implements the proposal from the RFC https://github.com/opensearch-project/OpenSearch/issues/15146. See the RFC for full details. Summary of the changes:
- Snapshot models each index as an IndexId. Multiple indexes can share the same IndexId even though they are fundamentally different. This PR updates the IndexId class to also hold the pathType.
- The data for each IndexId is stored in the index-N file (i.e. RepositoryData). As a result, the pathType metadata is available at no additional cost during snapshot operations such as creation, deletion, clone, and cleanup.
- To avoid leaving zombie data behind because of the hashed prefix in the path, this PR also introduces a snapshot_shard_paths file that records the paths of all the shards for an IndexId. The same information is used later during stale blob deletion. The file is cleaned up only once all the paths listed in it have been deleted. Note also that stale blob cleanup runs after the index-N file upload, which can lead to cases where the pathType information is no longer available.
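To make the summary above concrete, here is a minimal Java sketch of how a fixed and a hashed-prefix shard path could be derived, and of the rule that the snapshot_shard_paths file is removed only once every path it lists is gone. It is an illustration under stated assumptions, not the code in this PR: the class `HashedPrefixSketch`, its `PathType` enum, the `indices/<indexId>/<shardId>/` layout, the choice of 64-bit FNV-1a as the hash, and the hex encoding of the prefix are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and layout are assumptions, not the PR's actual API.
final class HashedPrefixSketch {

    // Assumed path types: a fixed layout and one with a hash placed in front.
    enum PathType { FIXED, HASHED_PREFIX }

    // Standard 64-bit FNV-1a, used here only as a stand-in hash function.
    static long fnv1a64(String input) {
        long hash = 0xcbf29ce484222325L;          // FNV offset basis
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xffL);                  // mix in one byte
            hash *= 0x100000001b3L;               // multiply by the FNV prime
        }
        return hash;
    }

    // Builds the blob path for one shard under the given path type.
    static String shardPath(String basePath, String indexId, int shardId, PathType pathType) {
        String suffix = "indices/" + indexId + "/" + shardId + "/";
        if (pathType == PathType.FIXED) {
            return basePath + "/" + suffix;
        }
        // HASHED_PREFIX: a 16-character hex hash of (indexId, shardId) goes first.
        String prefix = String.format("%016x", fnv1a64(indexId + "/" + shardId));
        return prefix + "/" + basePath + "/" + suffix;
    }

    // The shard-paths file may be deleted only when none of its recorded
    // paths still exists in the repository.
    static boolean canDeleteShardPathsFile(List<String> recordedPaths, Set<String> pathsStillPresent) {
        for (String path : recordedPaths) {
            if (pathsStillPresent.contains(path)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, `shardPath("repo", "abc", 0, PathType.FIXED)` yields `repo/indices/abc/0/`, while the `HASHED_PREFIX` variant places a 16-character hex prefix in front of the same path, so the shards of one index no longer share a common parent directory. That is presumably why a separate record of shard paths is needed to locate stale blobs later.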
Related Issues
Resolves #15146
Check List
- [ ] Functionality includes testing.
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Flaky test https://github.com/opensearch-project/OpenSearch/issues/15117 seen. Build - https://build.ci.opensearch.org/job/gradle-check/45417/testReport/junit/org.opensearch.cluster.service/MasterServiceTests/testClusterStateUpdateLoggingWithDebugEnabled/
:grey_exclamation: Gradle check result for e21d93710b5cb73ec30a4f62130c215a447f2601: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
Attention: Patch coverage is 72.97297% with 90 lines in your changes missing coverage. Please review.
Project coverage is 72.00%. Comparing base (71d122b) to head (7c5b751). Report is 1 commit behind head on main.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15426      +/-   ##
============================================
+ Coverage     71.99%   72.00%   +0.01%
- Complexity    63700    63753      +53
============================================
  Files          5248     5249       +1
  Lines        297416   297643     +227
  Branches      42984    43011      +27
============================================
+ Hits         214113   214309     +196
+ Misses        65776    65698      -78
- Partials      17527    17636     +109
:x: Gradle check result for 19750710c9c422924ea4e8ed965f3d72edc534d1: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 1975071: null
https://github.com/opensearch-project/OpenSearch/issues/14314
:white_check_mark: Gradle check result for 19750710c9c422924ea4e8ed965f3d72edc534d1: SUCCESS
:x: Gradle check result for c04eb67f1523884392c0d6df27191ad958e4193c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for c04eb67: FAILURE
Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14599
:white_check_mark: Gradle check result for c04eb67f1523884392c0d6df27191ad958e4193c: SUCCESS
:white_check_mark: Gradle check result for fc821932cb1e722c5ed8cf046fe1960944230dba: SUCCESS
:x: Gradle check result for 1473c92664b39c690b2ba91db251097a3e5f990a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 1473c92: FAILURE
Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14327
:grey_exclamation: Gradle check result for 1473c92664b39c690b2ba91db251097a3e5f990a: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
:x: Gradle check result for 18a8469e81f6983e6ffd45eda0d94f33579dd549: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 18a8469: FAILURE
Flaky tests - https://github.com/opensearch-project/OpenSearch/issues/14327, https://github.com/opensearch-project/OpenSearch/issues/14293, https://github.com/opensearch-project/OpenSearch/issues/7791
:grey_exclamation: Gradle check result for d0c0fae42e7f3edd1bde062853de0deb044da3d2: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Are the existing ITs exercising these code paths, or do we need to add them explicitly?
+1. Are we planning to run the hashed prefix/fixed modes randomly?
❕ Gradle check result for d0c0fae: UNSTABLE
Flaky test - https://github.com/opensearch-project/OpenSearch/issues/14408
Are the existing ITs exercising these code paths, or do we need to add them explicitly?
+1. Are we planning to run the hashed prefix/fixed modes randomly?
That's right. I am planning to run the tests once with the default mode changed, and then make the mode random so that all paths get covered. I have also tried to cover the new code paths with unit tests. However, the existing coverage is really low, and we might need to put in additional effort to cover the existing code in the future.
:x: Gradle check result for 173514b9ce3cb2b0947fc8cba5e20f797c335235: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for e097aa9e28c3c0e15cc43de944ff9a8af0e8c357: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 173514b: FAILURE
Flaky tests - https://github.com/opensearch-project/OpenSearch/issues/14295, https://github.com/opensearch-project/OpenSearch/issues/14329
:x: Gradle check result for 6f526e4cd61b60d6e71acb509d382fcaaa5d2480: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 6f526e4: FAILURE
Fixing these errors.
:x: Gradle check result for 596b6ded7e9be8b0f21624402b67079ecf789694: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 798cf4bc57c95209e652642689c9c6bb32b17aef: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 798cf4b: FAILURE
These failures are unrelated to this change. They are caused by a BWC discrepancy between the main and 2.x branches. The same failure appears in another PR's build: https://build.ci.opensearch.org/job/gradle-check/46035/
:x: Gradle check result for 10f15ab73fb692021c683c141bd4175060285c81: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 7c5b7515cad46e282ed1ba46211ea49abc5733bd: SUCCESS
:x: Gradle check result for 0dea663c8958f2b38b15da545e2ac26d46267e7c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?