Add optional section of node analyzers into NodeInfo

Open lukas-vlcek opened this issue 2 years ago • 67 comments

Description

A first implementation proposal of a new REST API called "Nodes Analyzers".

There are some changes/extensions to the original "core" API to avoid the use of Java reflection.

I still need to add a few more tests, especially tests that set up a few artificial cluster nodes with a controlled set of plugins and then verify that the response contains everything that is required (including Hunspell dictionaries).

The original proposal, which was implemented as a plugin, included some interesting REST API tests (see https://github.com/lukas-vlcek/OpenSearch-list-built-in-analyzers/). I think we will need to reimplement these differently.

Related Issues

Resolves #5481

Check List

  • [x] New functionality includes testing.
    • [x] All tests pass
  • [x] New functionality has been documented. (See https://github.com/opensearch-project/documentation-website/pull/6252)
    • [x] New functionality has javadoc added
  • [x] Commits are signed per the DCO using --signoff
  • [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

lukas-vlcek avatar Oct 02 '23 13:10 lukas-vlcek

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26563/
  • CommitID: 5b92192333f4eff5f67c6d381f257ee8b498d76c

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 02 '23 13:10 github-actions[bot]

Compatibility status:

Checks if related components are compatible with change a9f4e77

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

github-actions[bot] avatar Oct 02 '23 13:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26564/
  • CommitID: d11d0d0ee3f3ae57cc3bc6fcce0c75d9dff12787

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 02 '23 13:10 github-actions[bot]

I can see the precommit check failed (missing JavaDoc); I am looking at it. Interestingly, when I run ./gradlew precommit locally, it passes without issues. Is that expected?

lukas-vlcek avatar Oct 02 '23 13:10 lukas-vlcek

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26715/
  • CommitID: c9a44b13a5e6f77c6a8b07c59c352a637ac17e39

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 03 '23 14:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26725/
  • CommitID: f32c2e923ed294738594e8a0508fb97b7653d37f

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 03 '23 14:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26879/
  • CommitID: 8dbc4c0ac8c27f0ab3c43deb316bef73e34c53f4

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 04 '23 12:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/26881/
  • CommitID: 33a7412653ad31a08b2aa481db31d100e3e48920

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 04 '23 13:10 github-actions[bot]

Why did you implement this as a new API, rather than adding it to the existing NodeInfo?

I'm not necessarily suggesting that it's better to add it to NodeInfo, but that feels like the more consistent approach. Technically we don't need to add another API, because GET /_nodes/... already does what we need (and you can choose which particular infos you want).

Arguing against myself, though, one downside I can see is that NodeInfo is already huge -- adding yet another piece of information may just be making things worse.

msfroh avatar Oct 05 '23 21:10 msfroh
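To illustrate the point about choosing which particular infos you want, here is a minimal sketch (not from this PR) of building a NodesInfoRequest that asks only for selected sections; the fluent clear()/addMetrics() calls and the Metric constants are assumed to match the version in use.

    import org.opensearch.action.admin.cluster.node.info.NodesInfoRequest;

    public final class SelectiveNodesInfoSketch {
        // Build a request that returns only the "plugins" and "jvm" sections for all nodes.
        // clear() drops the default metric set; addMetrics() adds back only what we need.
        public static NodesInfoRequest pluginsAndJvmOnly() {
            return new NodesInfoRequest()
                .clear()
                .addMetrics(
                    NodesInfoRequest.Metric.PLUGINS.metricName(),
                    NodesInfoRequest.Metric.JVM.metricName()
                );
        }
    }

On the REST side this corresponds to requesting the metrics in the path, e.g. GET /_nodes/plugins,jvm.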

@msfroh I think you are right. It will make more sense to add it into NodesInfo. From the REST user's POV it will look pretty much the same, except that no new REST API is introduced internally (and there is no need for a nodes.analyzers.json file or for exceptions in RestHighLevelClientTest).

This code has been living in the form of a plugin in my head for so long that I didn't consider other options 😄

I will change that. I will also separate the fields name and classname, because currently individual analysis plugins and modules are identified by a single field name that is a sort of concatenation of both name and classname. This will align with how plugins and modules are represented in the respective sections of the NodesInfo response. It is something I was considering doing anyway, see https://github.com/lukas-vlcek/OpenSearch-list-built-in-analyzers/issues/5

Will the NodesInfo code become even bigger? I hope it can be done without bloating it. Let's see...

lukas-vlcek avatar Oct 06 '23 09:10 lukas-vlcek
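As a rough illustration of the name/classname split described above (the class and field names here are hypothetical, not the PR's actual types), each analysis plugin entry could carry the two values separately instead of a single concatenated string:

    // Hypothetical shape for illustration only.
    public final class AnalysisPluginEntry {
        private final String name;      // plugin name, as reported in the "plugins" section
        private final String className; // fully qualified class implementing the analysis plugin

        public AnalysisPluginEntry(String name, String className) {
            this.name = name;
            this.className = className;
        }

        public String getName() {
            return name;
        }

        public String getClassName() {
            return className;
        }
    }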

Why did you implement this as a new API, rather than adding it to the existing NodeInfo?

:+1: agree with @msfroh here; also, we could make it opt-in (by adding a new analyzers metric, GET /_nodes/analyzers, [1]) so it is only returned on demand

[1] https://opensearch.org/docs/latest/api-reference/nodes-apis/nodes-info/#path-parameters

reta avatar Oct 06 '23 13:10 reta

also, we could make it opt-in (by adding a new analyzers metric, GET /_nodes/analyzers, [1]) so it is only returned on demand

It looks like NodesInfoRequest defaults to requesting Metrics.allMetrics(). I kind of like the idea of changing that going forward, though.

(At the same time, I'm realizing that I shouldn't have added search pipeline metrics by default -- they're useful if you're trying to verify that all nodes in the cluster have a search pipeline processor, but otherwise just add noise. Similarly, the PutSearchPipelineTransportAction should only request search pipeline metrics.)

msfroh avatar Oct 06 '23 20:10 msfroh

https://build.ci.opensearch.org/job/gradle-check/26881/console

Task :server:test

Tests with failures:

  • org.opensearch.action.admin.cluster.node.analyzers.NodesAnalyzersResponseTests.testEqualsAndHashcode
  • org.opensearch.action.admin.cluster.node.analyzers.NodesAnalyzersResponseTests.testSerialization
  • org.opensearch.repositories.blobstore.BlobStoreRepositoryRemoteIndexTests.testRetrieveShallowCopySnapshotCase1

noCharger avatar Oct 11 '23 17:10 noCharger

@noCharger Thanks for pointing this out.

I am currently moving the code around to implement the agreed changes. I think those broken tests may be affected by that as well.

lukas-vlcek avatar Oct 11 '23 17:10 lukas-vlcek

@msfroh Just FYI, I am introducing a "default" set of metrics to NodesInfoRequest. Compared to the all set, it will not include analyzers AND search_pipelines, so by default these two metrics won't be included in the response.

lukas-vlcek avatar Oct 11 '23 17:10 lukas-vlcek
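A minimal sketch of the "default set" idea (the class name and the "analyzers" metric string are assumptions, not the merged code): derive the defaults from the full metric set and drop the opt-in entries.

    import java.util.Set;
    import java.util.SortedSet;
    import java.util.TreeSet;

    public final class DefaultMetricsSketch {
        // Given the full set of metric names, return the subset used when a request
        // does not ask for anything explicitly: everything except the opt-in metrics.
        public static SortedSet<String> defaultMetrics(Set<String> allMetrics) {
            SortedSet<String> defaults = new TreeSet<>(allMetrics);
            defaults.remove("search_pipelines"); // opt-in only
            defaults.remove("analyzers");        // opt-in only (added by this PR)
            return defaults;
        }
    }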

@msfroh Just FYI, I am introducing a "default" set of metrics to NodesInfoRequest. Compared to the all set, it will not include analyzers AND search_pipelines, so by default these two metrics won't be included in the response.

That sounds great!

To keep from breaking search pipelines (probably covered by an integ test), please update this NodesInfoRequest to explicitly request just search pipeline metrics: https://github.com/opensearch-project/OpenSearch/blob/1447d75e5e9ea042dd4bb645d71f25aea1fb42a8/server/src/main/java/org/opensearch/action/search/PutSearchPipelineTransportAction.java#L85

Up to now, it's been relying on search pipelines being in the default set (and fetching a bunch of other metrics that it doesn't use).

msfroh avatar Oct 13 '23 17:10 msfroh
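A sketch of the change being requested here, assuming NodesInfoRequest exposes a SEARCH_PIPELINES metric constant in this version (this is illustrative, not the actual diff):

    import org.opensearch.action.admin.cluster.node.info.NodesInfoRequest;

    public final class SearchPipelineInfoRequestSketch {
        // Request only the search_pipelines section instead of relying on it being
        // part of the default metric set.
        public static NodesInfoRequest searchPipelinesOnly() {
            return new NodesInfoRequest()
                .clear()
                .addMetric(NodesInfoRequest.Metric.SEARCH_PIPELINES.metricName());
        }
    }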

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28338/
  • CommitID: c5bf2775598b1d3d73c3d076054b2e347359fa51

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 18 '23 22:10 github-actions[bot]

Changed from a new REST API to an optional section of NodeInfo called analyzers.

It includes one breaking change in what is considered the default set of "metrics" included in the NodesInfo response (search_pipelines and analyzers are excluded from the default set).

FYI: I will squash commits in the next update.

lukas-vlcek avatar Oct 18 '23 22:10 lukas-vlcek
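For readers following the thread: a hedged sketch of how a Java client could opt in to the new section once it is an optional NodesInfo metric. The "analyzers" metric name follows the discussion above; the client wiring is illustrative.

    import org.opensearch.action.admin.cluster.node.info.NodesInfoRequest;
    import org.opensearch.action.admin.cluster.node.info.NodesInfoResponse;
    import org.opensearch.client.Client;

    public final class AnalyzersOptInSketch {
        // Fetch only the analyzers section from all nodes; blocks for brevity.
        public static NodesInfoResponse fetchAnalyzers(Client client) {
            NodesInfoRequest request = new NodesInfoRequest()
                .clear()
                .addMetric("analyzers"); // opt-in metric discussed in this thread
            return client.admin().cluster().nodesInfo(request).actionGet();
        }
    }

On the REST side this corresponds to GET /_nodes/analyzers, as suggested earlier in the thread.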

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28337/
  • CommitID: d2766805cc818fbde591c520ad03225d3d42046d

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 18 '23 22:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28339/
  • CommitID: a86a87224d9c8e39b3c9a7cf258c8cbc672468da

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 18 '23 23:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28460/
  • CommitID: 0eee577499dff7a6944f8dc68dbbc2fad90e8657

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 19 '23 17:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28471/
  • CommitID: b5060ec4c8312eefe7143a8bc6697b8db3adb77f

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 19 '23 17:10 github-actions[bot]

Is this considered a flaky test?

  • What went wrong:
    Execution failed for task ':distribution:bwc:staged:buildBwcLinuxTar'.
    > Building 2.11.0 didn't generate expected file /var/jenkins/workspace/gradle-check/search/distribution/bwc/staged/build/bwc/checkout-2.11/distribution/archives/linux-tar/build/distributions/opensearch-min-2.11.0-SNAPSHOT-linux-x64.tar.gz

lukas-vlcek avatar Oct 19 '23 18:10 lukas-vlcek

Is this considered a flaky test?

No, please rebase :-)

reta avatar Oct 19 '23 19:10 reta

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/28829/
  • CommitID: 51f9672f3db9a85c3b50aef64c17dc54b0537fe1

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 23 '23 14:10 github-actions[bot]

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE :grey_exclamation:
  • TEST FAILURES:
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.testShardIndexingPressureLastSuccessfulSettingsUpdate
  • URL: https://build.ci.opensearch.org/job/gradle-check/28861/
  • CommitID: 51f9672f3db9a85c3b50aef64c17dc54b0537fe1

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Oct 23 '23 18:10 github-actions[bot]

Codecov Report

Attention: Patch coverage is 35.18519% with 105 lines in your changes missing coverage. Please review.

Project coverage is 71.39%. Comparing base (bd8af6a) to head (a9f4e77). Report is 256 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...dmin/cluster/node/info/NodeAnalysisComponents.java | 28.34% | 89 Missing and 2 partials :warning: |
| ...rg/opensearch/index/analysis/AnalysisRegistry.java | 0.00% | 5 Missing :warning: |
| ...earch/action/admin/cluster/node/info/NodeInfo.java | 42.85% | 2 Missing and 2 partials :warning: |
| ...ion/admin/cluster/node/info/NodesInfoResponse.java | 0.00% | 2 Missing :warning: |
| ...src/main/java/org/opensearch/node/NodeService.java | 0.00% | 1 Missing and 1 partial :warning: |
| ...ction/search/PutSearchPipelineTransportAction.java | 0.00% | 1 Missing :warning: |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10296      +/-   ##
============================================
+ Coverage     71.30%   71.39%   +0.08%     
- Complexity    59877    59966      +89     
============================================
  Files          4980     4981       +1     
  Lines        282131   282287     +156     
  Branches      40937    40953      +16     
============================================
+ Hits         201184   201530     +346     
+ Misses        64228    64090     -138     
+ Partials      16719    16667      -52     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Oct 23 '23 18:10 codecov[bot]

@reta does this change look good to you now?

harshavamsi avatar Nov 02 '23 17:11 harshavamsi

@harshavamsi @reta Two final updates that I can see:

  1. One change I am currently implementing is that the NodeAnalyzers object will not keep plugins in a Map but in a List. The reason is that a Map needs a key, while plugins are internally handled as a list all over the AnalysisModule (see its ctor). It is a tiny detail, but I think it is worth implementing because there is no requirement for an analysis plugin to use a unique plugin name (which I was using as the key in the Map). By switching to a List I will get rid of the unique-key "issue" (which means that the serialization of NodeAnalyzers needs to change). The only requirement is that individual Analyzers, Tokenizers, TokenFilters, etc. are kept in a Map (i.e. registries) and must use a unique name (otherwise NamedRegistry.register(String name, T t) will throw an exception). But AnalysisPlugin does not require a unique name, so I should not keep plugins in a Map keyed by that name.

     Edit: A little bit more context here: the only requirement for a plugin is that it needs to be in a folder, which implies that the only requirement for plugin uniqueness is the installation folder name, for which there are two options: a default one (I think it is based on the plugin name) and a custom folder (provided in the plugin descriptor file). PluginService does not care about loading two plugins with the same class name or plugins with the same plugin name. BTW, a test like this passes green (the same plugin loaded twice):
    // Add to PluginsServiceTests.java: loads the same plugin class twice and
    // shows that PluginsService does not enforce unique plugin names.
    public void testPluginNameNotRequiredUnique() {
        Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir()).build();
        PluginsService service = newPluginsService(settings, AdditionalSettingsPlugin1.class, AdditionalSettingsPlugin1.class);

        List<Tuple<PluginInfo, AdditionalSettingsPlugin1>> pluginInfos = service.filterPluginsForPluginInfo(
            AdditionalSettingsPlugin1.class
        );
        assertEquals(2, pluginInfos.size());
        assertEquals(pluginInfos.get(0).v1().getName(), pluginInfos.get(1).v1().getName());
    }

In other words, I need to keep and serialize analysis plugins in a List, not in a Map (which could theoretically conflict on the same key if plugins use the same class or name or whatever is used for the key; at a higher level, PluginInfo does not know the name of its default installation folder). A rough sketch of this List-based serialization is shown after this comment.

  2. Please see my comment on @reta's review above.

lukas-vlcek avatar Nov 10 '23 14:11 lukas-vlcek
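As referenced in the comment above, here is a minimal sketch of the List-based serialization idea under stated assumptions: the class and field names are illustrative (this is not the PR's actual NodeAnalysisComponents), the entry type implements Writeable, and StreamOutput#writeList / StreamInput#readList are used as provided by core (package names may differ slightly across versions).

    import java.io.IOException;
    import java.util.List;

    import org.opensearch.core.common.io.stream.StreamInput;
    import org.opensearch.core.common.io.stream.StreamOutput;
    import org.opensearch.core.common.io.stream.Writeable;

    public final class AnalysisPluginComponents implements Writeable {
        private final List<PluginEntry> plugins; // duplicates allowed; no unique key needed

        public AnalysisPluginComponents(List<PluginEntry> plugins) {
            this.plugins = plugins;
        }

        public AnalysisPluginComponents(StreamInput in) throws IOException {
            this.plugins = in.readList(PluginEntry::new);
        }

        @Override
        public void writeTo(StreamOutput out) throws IOException {
            out.writeList(plugins);
        }

        // Hypothetical entry type used only for this sketch.
        public static final class PluginEntry implements Writeable {
            private final String name;
            private final String className;

            public PluginEntry(String name, String className) {
                this.name = name;
                this.className = className;
            }

            public PluginEntry(StreamInput in) throws IOException {
                this.name = in.readString();
                this.className = in.readString();
            }

            @Override
            public void writeTo(StreamOutput out) throws IOException {
                out.writeString(name);
                out.writeString(className);
            }
        }
    }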