[Experimental] Add a feature flag to start without joining a cluster

Open msfroh opened this issue 6 months ago • 11 comments

Description

This is a rework of the core changes from my proof-of-concept for a "clusterless" OpenSearch, limited to what has to live in core. Everything else is implemented in a plugin.

Essentially, if the flag is set, we avoid creating DiscoveryModule or anything that requires it, including GatewayService. We still create ClusterService, but do not initialize a ClusterManagerService. There are a few actions that rely on an injected Discovery instance, so those also need to be removed when the flag is set.
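The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ClusterlessWiringSketch`, the `Component` enum, and the `componentsToCreate` method are hypothetical names, and the real classes are only represented here as enum constants.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: models which components a node constructs at startup.
// None of these types are the real OpenSearch classes.
public class ClusterlessWiringSketch {

    enum Component {
        CLUSTER_SERVICE,
        CLUSTER_MANAGER_SERVICE,
        DISCOVERY_MODULE,
        GATEWAY_SERVICE,
        DISCOVERY_BACKED_ACTIONS
    }

    /**
     * Returns the components to create for the given flag value.
     * With the flag set, only the cluster service survives; everything
     * that needs discovery (directly or indirectly) is skipped.
     */
    static Set<Component> componentsToCreate(boolean clusterless) {
        if (clusterless) {
            return EnumSet.of(Component.CLUSTER_SERVICE);
        }
        return EnumSet.allOf(Component.class);
    }

    public static void main(String[] args) {
        System.out.println("default:     " + componentsToCreate(false));
        System.out.println("clusterless: " + componentsToCreate(true));
    }
}
```

The point of the sketch is that the flag removes whole components at construction time rather than leaving them created but idle.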

Related Issues

Related to https://github.com/opensearch-project/OpenSearch/issues/17957

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

msfroh avatar Jun 09 '25 23:06 msfroh

:x: Gradle check result for 83ea4830495dc5234ecdfc61e686963248a6444c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 09 '25 23:06 github-actions[bot]

:x: Gradle check result for df6e71e75f8e0286295f5554e207ac2625a0954d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 10 '25 00:06 github-actions[bot]

:grey_exclamation: Gradle check result for 5b3218f14683c77fa1cd24de4e2e2486e6865df5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jun 10 '25 18:06 github-actions[bot]

Codecov Report

Attention: Patch coverage is 18.51852% with 66 lines in your changes missing coverage. Please review.

Project coverage is 72.69%. Comparing base (8f69dcf) to head (1cbb72b). Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ch/cluster/action/shard/LocalShardStateAction.java | 0.00% | 24 Missing :warning: |
| .../java/org/opensearch/discovery/LocalDiscovery.java | 0.00% | 17 Missing :warning: |
| ...pensearch/cluster/service/LocalClusterService.java | 0.00% | 16 Missing :warning: |
| server/src/main/java/org/opensearch/node/Node.java | 70.58% | 2 Missing and 3 partials :warning: |
| ...ain/java/org/opensearch/cluster/ClusterModule.java | 50.00% | 1 Missing and 1 partial :warning: |
| ...nsearch/cluster/service/ClusterApplierService.java | 0.00% | 2 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18479      +/-   ##
============================================
- Coverage     72.79%   72.69%   -0.10%     
+ Complexity    68525    68460      -65     
============================================
  Files          5574     5566       -8     
  Lines        314807   314505     -302     
  Branches      45675    45633      -42     
============================================
- Hits         229178   228644     -534     
- Misses        67046    67335     +289     
+ Partials      18583    18526      -57     

codecov[bot] avatar Jun 10 '25 18:06 codecov[bot]

:x: Gradle check result for bd91b9b9ced713bebd58013aa820084e048960b0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 10 '25 19:06 github-actions[bot]

:x: Gradle check result for 805ce7471ff5537fb175200a61b1530852b2d3ae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 10 '25 21:06 github-actions[bot]

:x: Gradle check result for 831630e78b9c2d2a9f02292fc32148a804e6911a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 16 '25 19:06 github-actions[bot]

:white_check_mark: Gradle check result for b92597672dc4f52532d2ba8f3fa94498b300afab: SUCCESS

github-actions[bot] avatar Jun 18 '25 22:06 github-actions[bot]

Now that Gradle check is passing, tagging a few people for feedback on the approach: @andrross, @shwetathareja, @mch2

Thanks!

msfroh avatar Jun 23 '25 18:06 msfroh

:x: Gradle check result for 0e2952de695a2b0f9c5664279c15e3d0c998edec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 24 '25 21:06 github-actions[bot]

:grey_exclamation: Gradle check result for dfed82c41944f50a28c6b1dcde1d2b62505a0a8c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jun 24 '25 23:06 github-actions[bot]

:x: Gradle check result for 1561f2df0fe78d84e233e50cd1212a598325c66d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 24 '25 23:06 github-actions[bot]

Ouch... my latest clean-up removes the casts from ClusterModule, but adds ShardStateAction to the public API.

I can go either way in terms of resolving that. Either add the PublicApi annotation to ShardStateAction or bring back the explicit cast in ClusterModule.

msfroh avatar Jun 25 '25 00:06 msfroh

:white_check_mark: Gradle check result for b6bd47b3348ceb539eec36ad291d45fa06038bcc: SUCCESS

github-actions[bot] avatar Jun 25 '25 20:06 github-actions[bot]

Thanks, @rajiv-kv! I've made a couple of changes in response to your comments. If you get a chance, please take a look.

I disagree on exposing the cluster manager operations through LocalClusterService. At least for the time being, I specifically want to set up data nodes and coordinators that are incapable of cluster manager operations. Limiting things like this is not a one-way door, though. If we later decide that we do want to allow cluster manager operations, we can always add them. I definitely don't want to add them now, though, because it means I won't be able to take them away later.
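As a rough illustration of that restriction, a local-only service can keep reads working while rejecting anything that would need a cluster manager. The names below (`LocalOnlyServiceSketch`, `ClusterServiceLike`, `LocalOnly`) are hypothetical, and rejecting at call time with `UnsupportedOperationException` is an assumption made for the sketch, not a description of how `LocalClusterService` is written.

```java
// Hypothetical sketch of a service that cannot perform cluster manager
// operations. These are not the real OpenSearch types.
public class LocalOnlyServiceSketch {

    /** Pared-down, made-up view of what a cluster service offers. */
    interface ClusterServiceLike {
        String state();

        void submitStateUpdateTask(String source, Runnable task);
    }

    /** Serves locally held state; refuses cluster manager work. */
    static final class LocalOnly implements ClusterServiceLike {
        private final String state;

        LocalOnly(String state) {
            this.state = state;
        }

        @Override
        public String state() {
            return state;
        }

        @Override
        public void submitStateUpdateTask(String source, Runnable task) {
            // Assumption: fail loudly instead of silently dropping the task.
            throw new UnsupportedOperationException(
                "cluster manager operations are not supported here: " + source);
        }
    }

    public static void main(String[] args) {
        ClusterServiceLike service = new LocalOnly("local-state");
        System.out.println(service.state());
        try {
            service.submitStateUpdateTask("example", () -> {});
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Starting from a rejecting implementation keeps the option open: replacing the `throw` with real behavior later is a compatible change, whereas removing behavior that callers already depend on is not.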

msfroh avatar Jul 01 '25 20:07 msfroh

:x: Gradle check result for e44af703ba6e999a6f0140b8f0faf01efd9fb5c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 01 '25 20:07 github-actions[bot]

:white_check_mark: Gradle check result for 4a83a765e2021b8b49aca5b3959388109a1c7eb4: SUCCESS

github-actions[bot] avatar Jul 01 '25 22:07 github-actions[bot]

:x: Gradle check result for fb125df94d400a8457f718b65ccd9d8fea01502f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 10 '25 18:07 github-actions[bot]

This looks good to me, once you fix up the compiler errors in the latest commit :)

Pinging @shwetathareja and @rajiv-kv again for follow up reviews as well.

andrross avatar Jul 10 '25 20:07 andrross

:white_check_mark: Gradle check result for a94fe5f95ed2e61a5c6c1497070dba4bb49a57c9: SUCCESS

github-actions[bot] avatar Jul 14 '25 18:07 github-actions[bot]

> This looks good to me, once you fix up the compiler errors in the latest commit :)
>
> Pinging @shwetathareja and @rajiv-kv again for follow up reviews as well.

Done! Checks are passing again :+1:

msfroh avatar Jul 14 '25 19:07 msfroh

:x: Gradle check result for 6533ca7e26a9e5d1b8432987dbb2421599bef24d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 15 '25 00:07 github-actions[bot]

:x: Gradle check result for 6533ca7e26a9e5d1b8432987dbb2421599bef24d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 15 '25 07:07 github-actions[bot]