
[Remote Publication] Add remote download stats

Open alchemist51 opened this issue 1 year ago • 15 comments

Description

This PR introduces remote download stats for remote publication.

Sample stats on data node:

                "cluster_state_stats": {
                    "overall": {
                        "update_count": 0,
                        "total_time_in_millis": 0,
                        "failed_count": 0
                    },
                    "remote_download": {
                        "success_count": 1,
                        "failed_count": 0,
                        "total_time_in_millis": 4,
                        "full_download": 1,
                        "diff_download": 0
                    }
                }

Sample stats on master node:

                "cluster_state_stats": {
                    "overall": {
                        "update_count": 3,
                        "total_time_in_millis": 192,
                        "failed_count": 0
                    },
                    "remote_upload": {
                        "success_count": 3,
                        "failed_count": 0,
                        "total_time_in_millis": 86,
                        "indices_routing_diff_files_cleanup_attempt_failed_count": 0,
                        "index_routing_files_cleanup_attempt_failed_count": 0,
                        "cleanup_attempt_failed_count": 0
                    },
                    "remote_download": {
                        "success_count": 0,
                        "failed_count": 0,
                        "total_time_in_millis": 0,
                        "full_download": 0,
                        "diff_download": 0
                    }
                }
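
As a rough illustration of how the `remote_download` counters shown above could be tracked, here is a minimal sketch. The class `RemoteDownloadStatsSketch` and its methods are hypothetical and are not the classes touched by this PR; only the field names are taken from the sample output. It also assumes that download time and the full/diff split are recorded only for successful downloads, which the sample output does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the remote_download counters; not the PR's implementation.
final class RemoteDownloadStatsSketch {
    private final AtomicLong successCount = new AtomicLong();
    private final AtomicLong failedCount = new AtomicLong();
    private final AtomicLong totalTimeInMillis = new AtomicLong();
    private final AtomicLong fullDownload = new AtomicLong();
    private final AtomicLong diffDownload = new AtomicLong();

    // Assumption: time and the full/diff split are only recorded on success.
    void downloadSucceeded(boolean fullState, long tookMillis) {
        successCount.incrementAndGet();
        totalTimeInMillis.addAndGet(tookMillis);
        (fullState ? fullDownload : diffDownload).incrementAndGet();
    }

    void downloadFailed() {
        failedCount.incrementAndGet();
    }

    // Returns the counters keyed by the field names used in the sample stats.
    Map<String, Long> snapshot() {
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("success_count", successCount.get());
        stats.put("failed_count", failedCount.get());
        stats.put("total_time_in_millis", totalTimeInMillis.get());
        stats.put("full_download", fullDownload.get());
        stats.put("diff_download", diffDownload.get());
        return stats;
    }
}
```

Under these assumptions, a single call to `downloadSucceeded(true, 4)` yields the same values as the data node sample.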

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

alchemist51 avatar Aug 19 '24 04:08 alchemist51

:x: Gradle check result for 268258a707f732492f54dff9935a43aff4043ba4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 19 '24 04:08 github-actions[bot]

:x: Gradle check result for 52aaa2f1d8a2123028c6b0ffc32a07239c307108: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 19 '24 04:08 github-actions[bot]

:x: Gradle check result for 5d1f7cbb6be379aeac9cfce5bb78a656ef4fc8d8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 19 '24 05:08 github-actions[bot]

 "indices_routing_diff_files_cleanup_attempt_failed_count": 0,
  "index_routing_files_cleanup_attempt_failed_count": 0,

The above stats look out of place.

Agreed, these were added as part of PRs #13909 and #14684. Should we create an issue to track this? @Bukhtawar

alchemist51 avatar Aug 20 '24 04:08 alchemist51

:x: Gradle check result for 4791e2b94ed0592cf72b68128215b0911a65e9ce: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 21 '24 10:08 github-actions[bot]

:x: Gradle check result for 5994e9c1f7937addcb7f0b1750f7f802e8886966: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 27 '24 07:08 github-actions[bot]

:x: Gradle check result for f6d1b5a8a13a31318081f18acd9c91f7f63ce7fe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 27 '24 12:08 github-actions[bot]

:x: Gradle check result for 28ab45470328be21fd7a13bdb8582951fbd82fa2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 27 '24 17:08 github-actions[bot]

:x: Gradle check result for faaece8d1d673346e328ba5c1eca81cb8bdeff74: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 05:08 github-actions[bot]

:x: Gradle check result for b8f74eedc5976da036d6d1efd494f3c0b7412401: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 06:08 github-actions[bot]

:x: Gradle check result for 0caf5d0240360b97561d14f106f4ad56ad510199: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 07:08 github-actions[bot]

:x: Gradle check result for a361d4837f76287d5bf492ef3e7af28fdd878f1e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 08:08 github-actions[bot]

:x: Gradle check result for 42277f617ad3e9c457b358247e1a34129d22d3bd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 28 '24 08:08 github-actions[bot]

:grey_exclamation: Gradle check result for d6bca95e1313a432c55b0698383587cc98950b1d: UNSTABLE

  • TEST FAILURES:
      3 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 28 '24 10:08 github-actions[bot]

Codecov Report

Attention: Patch coverage is 69.76744% with 39 lines in your changes missing coverage. Please review.

Project coverage is 72.07%. Comparing base (758c2aa) to head (a1a6b82). Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ster/coordination/PublicationTransportHandler.java | 55.10% | 19 Missing and 3 partials :warning: |
| ...nsearch/gateway/remote/RemotePersistenceStats.java | 77.41% | 7 Missing :warning: |
| ...arch/gateway/remote/RemoteClusterStateService.java | 77.77% | 6 Missing :warning: |
| ...g/opensearch/cluster/coordination/Coordinator.java | 0.00% | 2 Missing and 1 partial :warning: |
| .../java/org/opensearch/gateway/GatewayMetaState.java | 0.00% | 1 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15291      +/-   ##
============================================
+ Coverage     72.02%   72.07%   +0.05%     
- Complexity    63769    63844      +75     
============================================
  Files          5249     5250       +1     
  Lines        297795   297859      +64     
  Branches      43034    43038       +4     
============================================
+ Hits         214480   214687     +207     
+ Misses        65735    65613     -122     
+ Partials      17580    17559      -21     


codecov[bot] avatar Aug 28 '24 11:08 codecov[bot]

Looks good

soosinha avatar Aug 29 '24 08:08 soosinha

:x: Gradle check result for bfd119a38e45401c5347ddb616542eb28b9f8f80: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 30 '24 09:08 github-actions[bot]

:white_check_mark: Gradle check result for 0cd57fb97dcdfe405849e5098847061cc5306a95: SUCCESS

github-actions[bot] avatar Aug 30 '24 09:08 github-actions[bot]

:grey_exclamation: Gradle check result for 09c8e2ab8335759120457bb5ca2ddc967bd6193c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Aug 30 '24 17:08 github-actions[bot]

How are we handling API BWC with the new stats?

In ClusterStateStats the remote upload/download stats are stored in the list of PersistedStateStats:

    public void writeTo(StreamOutput out) throws IOException {
        out.writeVLong(updateSuccess.get());
        out.writeVLong(updateTotalTimeInMillis.get());
        out.writeVLong(updateFailed.get());
        out.writeVInt(persistenceStats.size());
        for (PersistedStateStats stats : persistenceStats) {
            stats.writeTo(out);
        }
    }

For download stats we have added two more elements to the list. The writeTo/readFrom methods are written to support a variable-length list of PersistedStateStats, so BWC is handled for our stats.
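
The length-prefixed pattern described above can be sketched in a self-contained way. This is only an illustration under assumptions: it uses `java.io.DataOutputStream` and `DataInputStream` as stand-ins for OpenSearch's `StreamOutput` and `StreamInput`, and `StatsEntry` is a hypothetical simplification of `PersistedStateStats`, not its real layout. The point it shows is that a reader which first reads the list size, and within each entry the counter count, can decode a list with more entries or counters than it was originally written for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LengthPrefixedStatsSketch {

    // Hypothetical stand-in for one stats entry: a name plus named counters.
    static final class StatsEntry {
        final String name;
        final Map<String, Long> counters = new LinkedHashMap<>();

        StatsEntry(String name) {
            this.name = name;
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(name);
            out.writeInt(counters.size()); // counter count precedes the counters
            for (Map.Entry<String, Long> counter : counters.entrySet()) {
                out.writeUTF(counter.getKey());
                out.writeLong(counter.getValue());
            }
        }

        static StatsEntry readFrom(DataInputStream in) throws IOException {
            StatsEntry entry = new StatsEntry(in.readUTF());
            int counterCount = in.readInt();
            for (int i = 0; i < counterCount; i++) {
                String key = in.readUTF();
                entry.counters.put(key, in.readLong());
            }
            return entry;
        }
    }

    // Writes the list size first, then each entry, mirroring the writeTo shown above.
    static byte[] write(List<StatsEntry> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(entries.size());
            for (StatsEntry entry : entries) {
                entry.writeTo(out);
            }
        }
        return bytes.toByteArray();
    }

    // Reads however many entries the writer declared; no fixed length is assumed.
    static List<StatsEntry> read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<StatsEntry> entries = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                entries.add(StatsEntry.readFrom(in));
            }
            return entries;
        }
    }
}
```

The sketch only covers the case where both sides agree on the per-entry format; whether that holds across OpenSearch versions depends on the actual `PersistedStateStats` serialization, which is not reproduced here.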

alchemist51 avatar Sep 01 '24 05:09 alchemist51

:x: Gradle check result for e319bb2aae2d5c24dc7a0c11dd3b18263ed1e298: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 01 '24 06:09 github-actions[bot]

The bwc tests are failing in all PRs: https://build.ci.opensearch.org/job/gradle-check/46059/testReport/ .

alchemist51 avatar Sep 01 '24 08:09 alchemist51

Opened issue for streamlining the cleanup stats: #15556

alchemist51 avatar Sep 01 '24 09:09 alchemist51

:x: Gradle check result for 3d440bba43707d45c1363c880121bc7b0cf314d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 01 '24 10:09 github-actions[bot]

:grey_exclamation: Gradle check result for 039ad00136979413685da82ddab834a9954349a8: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Sep 01 '24 12:09 github-actions[bot]

:x: Gradle check result for b9125bf3525c85a7ce98cb80f55a967789968d99: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 02 '24 05:09 github-actions[bot]

:x: Gradle check result for e6bf8db3c670e8d504b2d5ef0b3f90c14b3a6f12: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 02 '24 07:09 github-actions[bot]

:white_check_mark: Gradle check result for a1a6b82542aa91444db359acef4fb72f1104e069: SUCCESS

github-actions[bot] avatar Sep 02 '24 08:09 github-actions[bot]

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15291-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b54e867da0e313513e90872e039717b7595cf6e4
# Push it to GitHub
git push --set-upstream origin backport/backport-15291-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15291-to-2.x.