opensearch-sdk-java
opensearch-sdk-java copied to clipboard
[FEATURE] Replace calls to SDKClusterService state() with more targeted calls
Is your feature request related to a problem?
In the AD plugin, the process of creating a detector makes frequent use of the ClusterService.state() method, to:
- check whether index metadata exists
- check after creating metadata that it exists
- check whether an index exists when validating whether a detector would create a new index
- check whether an alias exists when validating, in the line of code immediately after the above (uh, variable re-use anyone?)
- check whether the index exists (again) right before creating it
- check whether the alias exists (again) right before creating it in the line of code just after the above
- check whether the cluster is write blocked before writing to it
- check whether the index exists after creating it
There are several more usages associated with Job Scheduler.
In OpenSearch, these repeated calls aren't a performance issue because the methods simply return the same (up to date) object every time.
In the SDK, every single request sends a transport call and since it's unfiltered, returns the entire cluster state of OpenSearch over transport, even if we only need one bit of information.
This is like turning on a fire hose to get a sip of water.
Eight times.
What solution would you like?
Don't use SDKClusterService.state() unless you really really really mean it. And you probably don't.
We should audit our usage of state in AD (as in the first comment) and update the migration guide to identify use cases and provide suggested alternatives.
We may want to just consider marking the SDKClusterService state() method as deprecated to give a clear signal to developers that this is not the droid they are looking for.
What alternatives have you considered?
Status quo: fetch the cluster state every time
- Pro: it's already done. And has some benefit in finding bugs
- Con: we always wonder why our Postman REST requests time out
Maintain a local state, similar to the way we do for Settings, with update consumers:
- Pro: we have a local state always ready to provide fast access to the entire state of the OpenSearch cluster!
- Con: we don't need 99.9% of the information in the cluster, and it's updated frequently.
Do you have any additional context?
Hat tip to @dblock for pointing out the obvious in this comment. Reading all the comments on that PR is also educational.
This issue will address the performance for AD Extension as well. With plugin architecture it makes sense to get the cluster state as they a local copy and gets notified only if the state change. Also, plugin eventually are part of OpenSearch (So..). Few points I would like to reiterate if we are trying to keep ClusterState as just an internal API and not expose it:
- Identify the actual use case of ClusterState in AD Extension, if it's just for checking the index exist we can use the existing APIs for such cases.
- Maintain a local copy of the cluster state. But again, do we want a remote node to store a local copy of the cluster state and update it whenever there are any changes? (Need more discussion on this one)
IMO, if we are able to find all the alternative APIs for the usage of ClusterState then moving with the 1st option makes sense.
in AD Extension, if it's just for checking the index exist we can use the existing APIs for such cases.
Other common case I see is checking for blocks (e.g., ClusterBlockLevel.WRITE). I'm sure there's a faster way of checking that. (On Indices API.)
It's possible the ClusterService call gives information on system indices not available via the Index API, so there may be a need for some middle ground not currently covered by the APIs.
Needed methods:
String[] concreteIndices = indexNameExpressionResolver
.concreteIndexNames(
sdkClusterService.state(),
IndicesOptions.lenientExpandOpen(),
sourceIndices.toArray(new String[0])
);
state().metadata().hasIndex(indexName)state().metadata().index(resultIndex)state().metadata().hasAlias(alias)state().metadata().indices().containsKey(key)state().metadata().getIndicesLookup().get(indexOrAliasName).getIndices()state().getNodes().isLocalNodeElectedMaster()state().getMetadata().indices().get(concreteIndex)state().getRoutingTable().hasIndex(indexName)state().getRoutingTable().index(indexOrAliasName)state().nodes()state().blocks().globalBlockedException(level)state().getClusterName()
More research notes:
- The getters have two forms (
getMetadata()delegates tometadata(),getNodes()tonodes(),getRoutingTable()toroutingTable()) hasIndex(name)is equivalent to testing whetherindex(name)is not null and has matching UUID- There are REST requests for many of these
Did some digging into the different ways AD Extension is using the cluster state to determine if an index exists, currently there are three :
state().getRoutingTable().hasIndex(indexName)
Used primarily in
AnomalyDetectionIndicesto determine if system indices exist in the cluster before creating it. For example
state().metadata().hasIndex(indexName)
Used only in
AnomalyDetectionIndiceshere to determine if the custom result index created by the user exists in the cluster. This is essentially the same as [3] since internally it just callsindices().containsKey(indexName)
state().metadata().indices().containsKey(indexName)
Used to determine if the Anomaly Detector Job Index / Anomaly Detector Index has been created prior to querying for entries within those indices. For example.
From this exploration, it seems that there is no difference in using the metadata or routingtable of the cluster state to determine if an index exists (system index or otherwise). We can replace all instances of these three use cases of the cluster state with the indices exists API documented here
Did some digging into the different ways AD Extension is using the cluster state to determine if an index exists
I'm curious if that differentiates between index and alias? i.e., if an index exists and has an alias, does index(alias) return the index? I think that meets our needs as the index/alias checks in AD code are always back-to-back and can be reduced to one.
Other APIs looks pretty simple...
- cluster name can come from
GET _cluster/healthorGET _cluster/stats(the former is probably faster) - nodes is probably
GET /_nodes(could look at usages to filter by type but probably not a big deal) - blocks is in cluster settings (
cluster.blocks.read_only, etc.)
I'm curious if that differentiates between index and alias? i.e., if an index exists and has an alias, does index(alias) return the index?
I don't think this is the case, as the metadata class also has an hasAlias here to see if it exists
Next set of cluster state calls that I shall handle focuses on retrieving index level metadata/routing tables :
The following both return the IndexMetaData of the given index, as @dbwiddis mentioned, getMetaData() just delegates to metadata() and index() just calls indices().get(indexName). Both calls are effectively the same.
state().metadata().index(indexName)
Used in
AnomalyDetectionIndiceshere to validate if the user-created custom anomaly result index is valid. Internally this just retrieves the index mapping so this call can be replaced with theGetIndexAPI documented here which contains the mapping (if any) of an index as part of the response.
Additionally used in
IndexUtilshere to create an object ofClusterIndexHealthwhich utilizes theIndexMetaDatato retrieve the index name, number of shards and number of replicas. Ultimately just used to retrieve the health status code of the index
state().getMetadata().indices().get(concreteIndex)
Used in
AnomalyDetectionIndiceshere to determine if the given index should be updated. This also just retrieves the index mapping from theIndexMetaData, which could be replaced with theGetIndexAPI
The following returns a list of IndexMetaData given an alias :
state().metadata().getIndicesLookup().get(indexOrAliasName).getIndices()
This call is used in
IndexUtilshere to determine the index the alias actually points to (if any, unless there are none or more than 1) and identifies this index as the index to determine cluster index health from
The following call returns the RoutingTable of the given index
state().getRoutingTable().index(indexOrAliasName):
Used in
IndexUtilsto create an object ofClusterIndexHealth, internally used to retrieve all theIndexShardRoutingTablesassociated with this index. Ultimately just used to retrieve the health status code of the index
TLDR : Can replace calls to the cluster state to retrieve IndexMetaData with the Get Index API to retrieve index mappings. May need to add support to the SDKClient to retrieve IndexMetadata and IndexRoutingTable from a given index for use in determining ClusterIndexHealth, if there is no API that already exists to achieve the same. (Edit, We can retrieve Cluster Index Health status from ClusterHealth API )
Moving onto the following cluster state calls :
state().getClusterName()
this call is primarily used as part of the
TransportNodesActionresponse to identify the cluster that the response is coming from. In the case of AD Extension, all theTransportNodeActionrequests are directed back to the extension node so the cluster name will always be the same.
@dbwiddis rather than retrieving the cluster name from another call to the ClusterHealth API, we can retrieve the cluster name from the environment settings response after extension initialization as the cluster name is part of this :
{
"client.type": "node",
"cluster.name": "opensearch",
"http.type.default": "netty4",
"node.attr.shard_indexing_pressure_enabled": "true",
"node.name": "dev-dsk-jpalis-2c-27c8aa11.us-west-2.amazon.com",
"opensearch.experimental.feature.extensions.enabled": "true",
"path.home": "/home/jpalis/repos/integ/OpenSearch/distribution/archives/linux-tar/build/install/opensearch-3.0.0-SNAPSHOT",
"path.logs": "/home/jpalis/repos/integ/OpenSearch/distribution/archives/linux-tar/build/install/opensearch-3.0.0-SNAPSHOT/logs",
"transport.type.default": "netty4"
}