Optimize remote state stale file deletion

Open shiv0408 opened this issue 10 months ago • 15 comments

Description

Created an async task, AsyncStaleFileDeletion, which is initialized only on master-eligible nodes. Once the task is initialized in RemoteClusterStateService start(), it schedules a cleanup after a specified interval, which is exposed as the dynamic setting cluster.remote_store.state.cleanup_interval with a default of 5 minutes. Cleanup also proceeds if there have been more than 10 successful state updates since the last cleanup. After each cleanup attempt, the task is scheduled again after the configured interval.
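The scheduling behaviour described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the class name StaleFileCleanupSketch, its methods, and the use of a plain ScheduledExecutorService are all assumptions made for the example, and it reads the "more than 10 state updates" condition as a gate on each scheduled attempt, which is only one interpretation of the description.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of interval-based stale file cleanup; not the OpenSearch implementation. */
final class StaleFileCleanupSketch implements AutoCloseable {
    // Assumed from the description: act only after more than 10 state updates.
    static final long SKIP_CLEANUP_STATE_CHANGES = 10;

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Runnable deleteStaleFiles;
    private final AtomicLong latestStateVersion = new AtomicLong();
    private final AtomicLong lastCleanupVersion = new AtomicLong();
    // Volatile so a dynamically updated interval is picked up at the next reschedule.
    private volatile Duration interval;

    StaleFileCleanupSketch(Duration interval, Runnable deleteStaleFiles) {
        this.interval = interval;
        this.deleteStaleFiles = deleteStaleFiles;
    }

    /** Call after each successful remote state upload. */
    void onStateUploaded() {
        latestStateVersion.incrementAndGet();
    }

    /** Stand-in for a dynamic setting update; takes effect when the task is next rescheduled. */
    void updateInterval(Duration newInterval) {
        this.interval = newInterval;
    }

    /** Runs one cleanup attempt; returns true only if deletion was actually performed. */
    boolean runOnce() {
        long latest = latestStateVersion.get();
        if (latest - lastCleanupVersion.get() <= SKIP_CLEANUP_STATE_CHANGES) {
            return false; // too few state updates since the last cleanup, skip this round
        }
        deleteStaleFiles.run();
        lastCleanupVersion.set(latest);
        return true;
    }

    /** Schedules the first attempt; each attempt reschedules itself after the current interval. */
    void start() {
        scheduleNext();
    }

    private void scheduleNext() {
        if (scheduler.isShutdown()) {
            return;
        }
        scheduler.schedule(() -> {
            try {
                runOnce();
            } finally {
                scheduleNext(); // reschedule whether or not the attempt did any work
            }
        }, interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this sketch a caller would construct the object with Duration.ofMinutes(5), call start() once, and call onStateUploaded() after every successful upload. Rescheduling from inside the task, rather than using a fixed-rate schedule, means a changed interval applies from the next run without cancelling anything.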

Related Issues

Resolves #12889
Resolves #12798 (that test case is removed in this PR)

Check List

  • [x] New functionality includes testing.
    • [x] All tests pass
  • [x] New functionality has been documented.
    • [x] New functionality has javadoc added
  • [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • [x] Commits are signed per the DCO using --signoff
  • [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

shiv0408 avatar Apr 09 '24 06:04 shiv0408

Compatibility status:

Checks if related components are compatible with change 3b7464c

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

github-actions[bot] avatar Apr 09 '24 07:04 github-actions[bot]

:x: Gradle check result for c320b67d8ab5c9dd571a82f0b48a8f6908ead3f3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 09 '24 07:04 github-actions[bot]

:x: Gradle check result for 3b7464c571bebf91587c4c475ca350278775ba7b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 11 '24 19:04 github-actions[bot]

Removing the "Storage:Remote" label, as this is purely cluster manager state. @shwetathareja @Bukhtawar should we create a separate label for remote cluster state to avoid overlap between storage for data and storage for cluster state? Maybe "Storage:Remote" can be used for data and "Storage:RemoteState" for cluster state?

rramachand21 avatar Apr 21 '24 11:04 rramachand21

:x: Gradle check result for ad17589fd0072ff63cae8de0d00e46bab47e0113: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 26 '24 10:04 github-actions[bot]

:x: Gradle check result for d4f09e2c2c90ead111b56e66153f82bbfa3c6b04: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 29 '24 14:04 github-actions[bot]

:x: Gradle check result for 3e2601eb9edb6022e7d731a72ba7c8987b7a5db3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 29 '24 20:04 github-actions[bot]

:x: Gradle check result for de616128e22784996df9d21e8147437cff5a2802: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 29 '24 21:04 github-actions[bot]

:x: Gradle check result for 3408bd70664aed36a53cbb422754b446b086a9ac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 29 '24 22:04 github-actions[bot]

:x: Gradle check result for 00a36ab4168be3d7fe076676e5b8a22532f71680: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 30 '24 11:04 github-actions[bot]

:x: Gradle check result for 1febb6816e3a628e7826b7c502961b1dc6c7d4be: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Apr 30 '24 13:04 github-actions[bot]

:x: Gradle check result for 8f5d7d7f1977b9e7685521175648195b014c2c38: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 02 '24 11:05 github-actions[bot]

:x: Gradle check result for 4e3c9e92856247fe56242aea4ad3b7b876e68727: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 02 '24 11:05 github-actions[bot]

:x: Gradle check result for 5ea5dbb565dee48ac276950afb7b5191e54bc8d1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 02 '24 11:05 github-actions[bot]

[Storage Triage]

@shiv0408 Thanks for taking this up. Let's add a release target label to this PR.

linuxpi avatar May 02 '24 15:05 linuxpi

Looks good. Minor comments

soosinha avatar May 13 '24 02:05 soosinha

:x: Gradle check result for 04140ff03ca9d4764fc0421b3a3cd1e0811a8653: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 13 '24 09:05 github-actions[bot]

:x: Gradle check result for 1a6940f45adb59a69bbd11177a68b62631433709: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 13 '24 10:05 github-actions[bot]

The Gradle build is failing; please fix the build.

sachinpkale avatar May 14 '24 04:05 sachinpkale

:x: Gradle check result for 2f2719e06b9853596c58a950a7db445193f1ed4b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 14 '24 22:05 github-actions[bot]

  • org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase - New issue created for this flaky test: #13674
  • org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase - Already identified as flaky in #5426
  • org.opensearch.remotemigration.RemoteReplicaRecoveryIT.testReplicaRecovery - Flaky: #13473

shiv0408 avatar May 14 '24 23:05 shiv0408

:x: Gradle check result for 2f2719e06b9853596c58a950a7db445193f1ed4b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 15 '24 08:05 github-actions[bot]

:x: Gradle check result for 3ff82b335a88779e95357800f1309a56d1e9e1db: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 15 '24 10:05 github-actions[bot]

:x: Gradle check result for bb09f56d552cc000c660903812a9a6de8f6cc953: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 16 '24 01:05 github-actions[bot]

:grey_exclamation: Gradle check result for 12632fcd9d8f186ba06fe938abda6d25d9534199: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar May 16 '24 07:05 github-actions[bot]

Codecov Report

Attention: Patch coverage is 77.83784%, with 41 lines in your changes missing coverage. Please review.

Project coverage is 71.60%. Comparing base (b15cb0c) to head (2f8d2e1). Report is 315 commits behind head on main.

Files                                                    Patch %   Lines
...teway/remote/RemoteClusterStateCleanupManager.java    79.19%    29 Missing and 7 partials :warning:
...arch/gateway/remote/RemoteClusterStateService.java    55.55%    3 Missing and 1 partial :warning:
server/src/main/java/org/opensearch/node/Node.java       66.66%    1 Missing :warning:

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13131      +/-   ##
============================================
+ Coverage     71.42%   71.60%   +0.18%     
- Complexity    59978    61331    +1353     
============================================
  Files          4985     5064      +79     
  Lines        282275   288089    +5814     
  Branches      40946    41715     +769     
============================================
+ Hits         201603   206288    +4685     
- Misses        63999    64794     +795     
- Partials      16673    17007     +334     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar May 16 '24 09:05 codecov[bot]

:x: Gradle check result for 38259512faef6e7d0a227d0b0b9326833bb3db8d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 16 '24 09:05 github-actions[bot]

:x: Gradle check result for b604b090ad5d5298644099b650d0fe67f51ca4f5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 17 '24 11:05 github-actions[bot]

:x: Gradle check result for 507941508007d983af16d16d6f2e55ec7389d4a2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 22 '24 09:05 github-actions[bot]

:x: Gradle check result for 7bd079fb227ca661a297653ebe53cb282fdac74f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 22 '24 09:05 github-actions[bot]