[FEATURE] Cross-cluster monitors

Open AWSHurneyt opened this issue 2 years ago • 6 comments

Problem

Currently, the Alerting plugin's Bucket, Document, and Query monitors are only able to query data sources (e.g., indexes) on the local cluster. While this can accommodate many use cases, the following issues show that users need to query data sources across multiple clusters.

  1. https://github.com/opendistro-for-elasticsearch/alerting/issues/62
  2. https://github.com/opensearch-project/alerting/issues/207
  3. https://github.com/opensearch-project/alerting-dashboards-plugin/issues/570

Cross-cluster monitors (CCM)

OpenSearch Core and the Security plugin currently support cross-cluster search out of the box (source). By leveraging this existing feature, we can support monitors that perform search queries against remote clusters. CCM would allow users to configure their monitors on a single cluster, which could then become a hub for managing all of their Alerting plugin alerts.

Integration with this existing feature would also ensure that access control for data sources is preserved across all clusters queried by the monitors. Users that do not have permissions to query an index on clusterA will not be able to create a monitor on clusterB to make remote calls to that index.
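
As a rough illustration of what a cross-cluster monitor input could look like: cross-cluster search addresses a remote index as `<cluster-alias>:<index>`, so a monitor's index list could mix local and remote targets. The sketch below is not the plugin's implementation. The helper names, the cluster alias `cluster-2`, the index names, and the use of an empty alias for the local cluster are all assumptions made for the example; only the `alias:index` syntax and the `search` input shape (`indices` plus `query`) come from existing OpenSearch behavior.

```python
from typing import Dict, List


def remote_index_expressions(targets: Dict[str, List[str]]) -> List[str]:
    """Build cross-cluster search index expressions.

    Keys are cluster aliases. An empty alias stands for the local cluster;
    that convention is an assumption of this sketch, not a plugin rule.
    """
    expressions = []
    for alias, indexes in targets.items():
        for index in indexes:
            # Remote indexes are written as "<alias>:<index>"; local ones stay bare.
            expressions.append(f"{alias}:{index}" if alias else index)
    return expressions


def build_search_input(targets: Dict[str, List[str]], query: dict) -> dict:
    """Assemble a monitor 'search' input that targets local and remote indexes."""
    return {
        "search": {
            "indices": remote_index_expressions(targets),
            "query": query,
        }
    }


# Hypothetical targets: one local index and one index on a remote cluster.
targets = {"": ["orders-pending"], "cluster-2": ["orders-completed"]}
monitor_input = build_search_input(targets, {"size": 0, "query": {"match_all": {}}})
print(monitor_input["search"]["indices"])
```

Because the remote cluster evaluates the request under the calling user's identity, a user without read access to `orders-completed` on `cluster-2` would have this search rejected, which is the access-control behavior described above.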

Use cases

  1. Users with large amounts of data to store may choose to store that data across multiple clusters consisting of many nodes.
  2. Users may choose to store different types of data on different clusters. E.g., storing pending order data on cluster-1, and completed order data on cluster-2.
  3. Users may choose to host separate clusters in other regions as opposed to hosting individual nodes in those regions.

In scope

[P0]

  1. Users will be able to configure per query, per bucket, per document, and composite monitors that are capable of making search calls to remote clusters.
  2. The monitor creation page in the Alerting plugin UI will be updated to prompt the user to select the clusters from which they want to select indexes to monitor.
  3. Alerting plugin pages that display monitor details, and UI elements that show alert details, will be revamped to display the indexes queried by a monitor and the cluster on which each of those indexes is stored.
  4. This new feature will adhere to the current access control of the Security plugin.
    1. E.g., users that do not have permissions to query an index on clusterA will not be able to create a monitor on clusterB to make remote calls to that index.

[P1]

  1. Users will have the option to configure a cross-cluster monitor to either perform a cross-cluster search call from the local cluster to a remote cluster (supported in P0), or replicate the monitor on the remote clusters.
    1. Replicating the monitor on remote clusters will allow the monitor to execute on each of the clusters separately, and then compile and return only the execution results to the original monitor.
    2. The payload of the execution results can be much smaller than an entire search response, which would allow some monitor configurations to execute more efficiently.
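
To make the payload argument in the P1 item concrete, here is a minimal sketch of the replication idea, assuming each remote cluster evaluates the trigger locally and sends back only a small summary. Everything here is hypothetical: `ExecutionResult`, `execute_on_cluster`, `compile_results`, the `status == "error"` condition, and the in-memory document lists are stand-ins invented for the example, not part of the Alerting plugin.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExecutionResult:
    """Compact summary a replicated monitor could return (hypothetical shape)."""
    cluster: str
    hit_count: int
    triggered: bool


def execute_on_cluster(cluster: str, docs: List[dict], threshold: int) -> ExecutionResult:
    """Stand-in for a replicated monitor running next to the data.

    The full documents never leave this function; only counts do.
    """
    hits = [d for d in docs if d.get("status") == "error"]
    return ExecutionResult(cluster, len(hits), len(hits) > threshold)


def compile_results(results: List[ExecutionResult]) -> Dict[str, object]:
    """Merge per-cluster summaries on the cluster that owns the original monitor."""
    return {
        "total_hits": sum(r.hit_count for r in results),
        "triggered_clusters": [r.cluster for r in results if r.triggered],
    }


# Hypothetical data held by two remote clusters.
data = {
    "cluster-1": [{"status": "error"}, {"status": "ok"}, {"status": "error"}],
    "cluster-2": [{"status": "ok"}],
}
results = [execute_on_cluster(name, docs, threshold=1) for name, docs in data.items()]
print(compile_results(results))
```

In this sketch, the original monitor receives one small record per cluster rather than every matching document, which is where the efficiency gain would come from for monitors whose triggers only need aggregate values.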

Out of scope

[P0]

  1. Cluster metrics monitors will not be supported by this feature. This monitor type does not query an index; instead, it makes calls to APIs, such as /_cluster/health and /_cat/indices, against the local cluster. Because it does not query indexes, this monitor type cannot make use of cross-cluster search for its execution.
  2. Users can only acknowledge alerts generated by a cross-cluster monitor on the local cluster.
  3. The Alerting plugin UI will only display alerts and monitor details on the local cluster (i.e., the cluster on which the monitor was created).
  4. Cross-cluster composite monitors can only be configured using monitors from the local cluster.

Things to consider

  1. In order for a user to create a monitor that can successfully make search calls to a remote cluster, "both clusters must have the user, but only the remote cluster needs the role and mapping." (source)
  2. Customers may configure monitors that call remote clusters in different availability zones. Latency in request/response transmission could cause cross-cluster monitors to take longer than expected to execute. Pending tasks could build up if a monitor's execution interval is shorter than the time the monitor takes to complete execution.
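
The buildup described in the second point can be estimated with some simple arithmetic. The sketch below assumes a simplified model that may not match the plugin's actual scheduler: runs are scheduled at a fixed interval starting at time 0, only one run of a given monitor executes at a time, and every run takes the same amount of time. The function name and parameters are hypothetical.

```python
import math


def pending_executions(interval_s: float, duration_s: float, elapsed_s: float) -> int:
    """Estimate how many scheduled runs are still waiting to start.

    Simplified model (an assumption of this sketch): runs are scheduled at
    t = 0, interval_s, 2 * interval_s, ... and execute strictly one at a time,
    each taking duration_s seconds.
    """
    # Runs that have been scheduled by elapsed_s, including the one at t = 0.
    scheduled = math.floor(elapsed_s / interval_s) + 1
    # Runs that have fully finished if the executor was never idle.
    completed = math.floor(elapsed_s / duration_s)
    # One more run may be in progress, but never more than were scheduled.
    started = min(scheduled, completed + 1)
    return scheduled - started


# A monitor scheduled every 60 s whose cross-cluster search takes 90 s,
# observed after 10 minutes.
print(pending_executions(interval_s=60, duration_s=90, elapsed_s=600))
# The same monitor when each run takes only 30 s.
print(pending_executions(interval_s=60, duration_s=30, elapsed_s=600))
```

Under this model, the backlog grows without bound whenever the run duration exceeds the interval, and stays at zero otherwise, so added cross-cluster latency matters most for monitors that already run close to their interval.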

AWSHurneyt avatar Jul 06 '23 01:07 AWSHurneyt

  • What is the scope here? Will the AD plugin and the Security Analytics plugin also become cross-cluster compliant if we add support in the Alerting plugin?
  • What about the cluster metrics monitor type? Can we monitor the metrics of a remote cluster?

eirsep avatar Jul 06 '23 01:07 eirsep

Integrating the Alerting plugin with the SQL plugin would allow monitors to utilize that existing framework to query remote clusters during execution.

Why do we need to integrate with SQL plugin? Can we avoid that dependency?

eirsep avatar Jul 06 '23 17:07 eirsep

Integrating the Alerting plugin with the SQL plugin would allow monitors to utilize that existing framework to query remote clusters during execution.

Why do we need to integrate with SQL plugin? Can we avoid that dependency?

I agree. We should separate the SQL integration from this scope, since it is not focused on cross-cluster search support, and we want to support this at the core Alerting level rather than through SQL only.

lezzago avatar Jul 26 '23 16:07 lezzago

@eirsep @lezzago Thank you for the feedback! I've added In scope and Out of scope sections, and I've removed the points relating to the SQL plugin.

AWSHurneyt avatar Aug 01 '23 15:08 AWSHurneyt

Can we have some priority on this issue, please? We use cross-cluster search, and alerting is broken and not working. Thank you in advance.

bjoshi18 avatar Oct 27 '23 17:10 bjoshi18

Hello @bjoshi18! We're targeting v2.13 for this feature.

AWSHurneyt avatar Jan 30 '24 17:01 AWSHurneyt