Populate RecoveryState details for shallow snapshot restore

Open ltaragi opened this issue 1 year ago • 8 comments

Description

  • In a regular recovery, the _cat/recovery call reports the recovery stats as follows:
index | shard | time | type | stage | source_host | source_node | target_host | target_node | repository | snapshot | files | files_recovered | files_percent | files_total | bytes | bytes_recovered | bytes_percent | bytes_total | translog_ops | translog_ops_recovered | translog_ops_percent
movies | 0 | 117ms | empty_store | done | n/a | n/a | 172.18.0.4 | odfe-node1 | n/a | n/a | 0 | 0 | 0.0% | 0 | 0 | 0 | 0.0% | 0 | 0 | 0 | 100.0%
movies | 0 | 382ms | peer | done | 172.18.0.4 | odfe-node1 | 172.18.0.3 | odfe-node2 | n/a | n/a | 1 | 1 | 100.0% | 1 | 208 | 208 | 100.0% | 208 | 1 | 1 | 100.0%
  • Fields such as bytes_recovered and bytes_total are read from the ReplicationLuceneIndex object of the RecoveryState for the shard being recovered:
public Table buildRecoveryTable(RestRequest request, RecoveryResponse response) {
    ...
    for (String index : response.shardRecoveryStates().keySet()) {
        ...
        for (RecoveryState state : shardRecoveryStates) {
            t.startRow();
            t.addCell(index);
            t.addCell(state.getShardId().id());
            ...
            t.addCell(state.getIndex().totalRecoverFiles());
            t.addCell(state.getIndex().recoveredFileCount());
            ...
            t.endRow();
        }
    }
    return t;
}
  • ReplicationLuceneIndex collects this data through addFileDetail() and addRecoveredBytesToFile(), which are called as files are processed during the restore flow.
  • When restoring from a shallow snapshot, these functions are never called, so the stats are not populated.
  • This change adds the file and byte counts and the recovered percentages to the RecoveryState of shards being restored from a shallow snapshot.
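
For readers unfamiliar with this bookkeeping, here is a minimal standalone sketch of how registering files and then crediting recovered bytes could yield the file and byte columns shown in the table. It is not the OpenSearch ReplicationLuceneIndex class. The class name RecoveryProgressSketch, the simplified signatures, the byte and percent helpers, and the choice to report 0.0% when nothing is registered are all assumptions made for illustration. Only the names addFileDetail, addRecoveredBytesToFile, totalRecoverFiles, and recoveredFileCount come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the per-shard bookkeeping described above.
// It is NOT the OpenSearch ReplicationLuceneIndex class; signatures are assumptions.
final class RecoveryProgressSketch {

    // Per-file record: total length and bytes recovered so far.
    private static final class FileDetail {
        final long length;
        long recovered;

        FileDetail(long length) {
            this.length = length;
        }
    }

    private final Map<String, FileDetail> files = new LinkedHashMap<>();

    // Register a file that the restore is expected to recover.
    void addFileDetail(String name, long length) {
        files.put(name, new FileDetail(length));
    }

    // Credit bytes to a previously registered file.
    void addRecoveredBytesToFile(String name, long bytes) {
        FileDetail detail = files.get(name);
        if (detail == null) {
            throw new IllegalArgumentException("unknown file: " + name);
        }
        detail.recovered += bytes;
    }

    int totalRecoverFiles() {
        return files.size();
    }

    // A file counts as recovered once all of its bytes have been credited.
    int recoveredFileCount() {
        int count = 0;
        for (FileDetail detail : files.values()) {
            if (detail.recovered >= detail.length) {
                count++;
            }
        }
        return count;
    }

    long totalRecoverBytes() {
        long total = 0;
        for (FileDetail detail : files.values()) {
            total += detail.length;
        }
        return total;
    }

    long recoveredBytes() {
        long total = 0;
        for (FileDetail detail : files.values()) {
            total += detail.recovered;
        }
        return total;
    }

    float recoveredFilesPercent() {
        return percent(recoveredFileCount(), totalRecoverFiles());
    }

    float recoveredBytesPercent() {
        return percent(recoveredBytes(), totalRecoverBytes());
    }

    // Assumption: report 0.0 when nothing was registered, which is the
    // situation this PR describes for shallow snapshot restores.
    private static float percent(long part, long total) {
        return total == 0 ? 0.0f : 100.0f * part / total;
    }

    public static void main(String[] args) {
        RecoveryProgressSketch progress = new RecoveryProgressSketch();
        progress.addFileDetail("_0.si", 100L);
        progress.addFileDetail("_0.cfs", 300L);
        progress.addRecoveredBytesToFile("_0.si", 100L);
        progress.addRecoveredBytesToFile("_0.cfs", 150L);
        System.out.println(String.format(
            "files %d/%d (%.1f%%), bytes %d/%d (%.1f%%)",
            progress.recoveredFileCount(), progress.totalRecoverFiles(), progress.recoveredFilesPercent(),
            progress.recoveredBytes(), progress.totalRecoverBytes(), progress.recoveredBytesPercent()));
    }
}
```

If neither method is ever called, every counter in this sketch stays at zero, which mirrors the empty stats reported for shallow snapshot restores before this change.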

Related Issues

Resolves #15434

Check List

  • [x] Functionality includes testing.
  • [ ] ~API changes companion pull request created, if applicable.~
  • [ ] ~Public documentation issue/PR created, if applicable.~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ltaragi avatar Aug 22 '24 14:08 ltaragi

:x: Gradle check result for f8e731bb0a460aae537d8fbe7243cd98c2d796eb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 22 '24 14:08 github-actions[bot]

:x: Gradle check result for e6bcc79b624cf21de30b9cdf4a9430dbd4dd79f0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 22 '24 14:08 github-actions[bot]

❌ Gradle check result for f8e731b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for e6bcc79: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test #14294

ltaragi avatar Aug 22 '24 15:08 ltaragi

:x: Gradle check result for c48289293ec26f5ce8b994a6d596329b3d479324: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 23 '24 05:08 github-actions[bot]

:x: Gradle check result for 7bccf910df8cb5556743223810f2d6e89a6e550a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 23 '24 06:08 github-actions[bot]

:x: Gradle check result for 49141d8b91ff0f41baa7c64eda5cbd947d0b057c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 26 '24 15:08 github-actions[bot]

:grey_exclamation: Gradle check result for fcaa3a40682c77bf8c9f84d34cf227a812efceb5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 27 '24 06:08 github-actions[bot]

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 71.84%. Comparing base (acee2ae) to head (668dfec). Report is 27 commits behind head on main.

Files with missing lines | Patch % | Lines
...in/java/org/opensearch/index/shard/IndexShard.java | 11.11% | 7 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15353      +/-   ##
============================================
- Coverage     71.87%   71.84%   -0.03%     
- Complexity    63318    63402      +84     
============================================
  Files          5231     5244      +13     
  Lines        296521   296797     +276     
  Branches      42832    42852      +20     
============================================
+ Hits         213113   213230     +117     
- Misses        65948    66128     +180     
+ Partials      17460    17439      -21     


codecov[bot] avatar Aug 27 '24 06:08 codecov[bot]

:white_check_mark: Gradle check result for 668dfec3ed87cdd838d961912e2a641ae3cb9b79: SUCCESS

github-actions[bot] avatar Aug 29 '24 07:08 github-actions[bot]

The changes are covered by an integration test. Codecov does not consider ITs, which is why the check is failing.

sachinpkale avatar Aug 29 '24 07:08 sachinpkale

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15353-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 3726c52b31e8504e7fcf9cdc1b52a0a404d6c944
# Push it to GitHub
git push --set-upstream origin backport/backport-15353-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15353-to-2.x.
