Make number of triggers per monitor configurable
Hello everyone,
As mentioned in the documentation:
The cluster metrics monitor supports up to ten triggers.
Let me describe my use case. We have a considerable number of OpenSearch clusters (~200). To monitor them, we perform requests like GET _cluster/health, GET _nodes/stats, etc. sequentially against all clusters (e.g. every 5 minutes) and store the results in different indices in a dedicated cluster that we use for monitoring. There, we want to create alerts for the different things that can go wrong in our clusters.
So, targeting the index that holds data of GET _nodes/stats calls, and in order to alert us on clusters getting full, we create a monitor with a painless script ("100.0 * doc['free_in_bytes'].value / doc['total_in_bytes'].value < 15") that returns the clusters and the corresponding nodes that are getting full, in the different buckets:
"buckets": [
{
"doc_count": 36,
"total": {
"value": 3702600499200
},
"nodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"doc_count": 12,
"total": {
"value": 3702600499200
},
"free": {
"value": 522902073344
},
"key": "esacluster101_data1"
},
{
"doc_count": 12,
"total": {
"value": 3702600499200
},
"free": {
"value": 513142358016
},
"key": "esbcluster101_data1"
},
{
"doc_count": 12,
"total": {
"value": 3702600499200
},
"free": {
"value": 308051558400
},
"key": "esdcluster101_data1"
}
]
},
"free": {
"value": 308051558400
},
"key": "cluster101"
},
...
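To make the threshold concrete, here is a minimal Java sketch of the same arithmetic the Painless script performs, applied to the three node buckets shown above. It assumes the aggregated `free` and `total` values can be compared directly, the way the script compares `free_in_bytes` and `total_in_bytes`; the class and method names are hypothetical and are not part of the monitor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mirroring the monitor's script:
// 100.0 * free_in_bytes / total_in_bytes < 15
final class DiskHeadroom {
    // Threshold taken from the script in the issue: alert below 15% free.
    static final double THRESHOLD_PERCENT = 15.0;

    static double freePercent(long freeBytes, long totalBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 100.0 * freeBytes / totalBytes;
    }

    static boolean isGettingFull(long freeBytes, long totalBytes) {
        return freePercent(freeBytes, totalBytes) < THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        // Values copied from the "nodes" buckets of cluster101 above.
        long total = 3702600499200L;
        Map<String, Long> freeByNode = new LinkedHashMap<>();
        freeByNode.put("esacluster101_data1", 522902073344L);
        freeByNode.put("esbcluster101_data1", 513142358016L);
        freeByNode.put("esdcluster101_data1", 308051558400L);

        for (Map.Entry<String, Long> e : freeByNode.entrySet()) {
            System.out.printf("%s: %.2f%% free, gettingFull=%b%n",
                    e.getKey(),
                    freePercent(e.getValue(), total),
                    isGettingFull(e.getValue(), total));
        }
    }
}
```

With these inputs the three nodes come out at roughly 14.1%, 13.9%, and 8.3% free, so all of them fall under the 15% threshold.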
Now, we want to create one trigger for each cluster, in order to send the notification to the respective destination. Here's the trigger condition (the name of the trigger is the cluster name, e.g. 'cluster101'):
boolean full = false;
for (int i = 0; i < ctx.results[0].aggregations.clusters.buckets.length; i++) {
    if (ctx.results[0].aggregations.clusters.buckets[i].key == ctx.trigger.name) {
        ctx.results[0].full_nodes = ctx.results[0].aggregations.clusters.buckets[i].nodes.buckets;
        full = true;
    }
}
return full;
The above process works fine, the only problem is that I can only create 10 triggers for the monitor. Instead of the 200 triggers that I want, I could create 200 monitors, but that sounds even worse to me.
Thus, unless there is an important reason to limit the number of triggers allowed in a monitor, I would suggest making this setting configurable (still with 10 as the default value). Let me know what you think.
This makes sense to me unless others have a strong opinion against it @lezzago @qreshi ? @spapadop if you're comfortable, feel free to contribute once we decide on it
The changes go here - https://github.com/opensearch-project/alerting/blob/a3e9b5eeeb7c81ccb72db32cc0ca99a7ea9c1b9a/alerting/src/main/kotlin/org/opensearch/alerting/settings/AlertingSettings.kt#L21 and https://github.com/opensearch-project/alerting/blob/2392be74b993c4421c8cb84db40a23c007602c8d/alerting/src/main/kotlin/org/opensearch/alerting/model/Monitor.kt#L86
I don't have any issues with making this limit dynamic.
A few things to consider when making this change:
- The check within the Monitor data class is static for now and is done in the init. If we make the value fetched from a setting which can be dynamic, we might have to do it a little differently, since we don't typically add setting update consumers and have access to the Settings here (it could possibly be a validation method in the form of an extension that is executed in the transport action that creates the Monitors).
- We'll want to check the Alerting Dashboards plugin as well, since I believe there was some info shown to the user when they'd hit the Trigger limit. We'll want to make that validation/messaging dynamic based on the backend setting.
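As a rough illustration of the first point, here is a minimal sketch of moving the check out of the constructor and into a validator that reads a limit which can change at runtime. It is plain Java rather than the plugin's Kotlin, and it does not use the OpenSearch settings API. Every name in it (`TriggerLimitValidator`, `updateMaxTriggers`, `validate`) is hypothetical, and the error message wording is an assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a validation step run when a monitor is created
// or updated, instead of a hard-coded check inside the Monitor constructor.
final class TriggerLimitValidator {
    // Default of 10 matches the limit quoted from the documentation.
    static final int DEFAULT_MAX_TRIGGERS = 10;

    // Holds the current limit; a settings-update callback would write to it.
    private final AtomicInteger maxTriggers = new AtomicInteger(DEFAULT_MAX_TRIGGERS);

    // Assumed to be called whenever the (hypothetical) setting changes.
    void updateMaxTriggers(int newLimit) {
        if (newLimit < 1) {
            throw new IllegalArgumentException(
                    "max triggers must be at least 1, got " + newLimit);
        }
        maxTriggers.set(newLimit);
    }

    // Rejects a monitor whose trigger count exceeds the current limit.
    void validate(List<String> triggerNames) {
        int limit = maxTriggers.get();
        if (triggerNames.size() > limit) {
            throw new IllegalArgumentException(
                    "Monitor has " + triggerNames.size()
                            + " triggers, but at most " + limit + " are allowed");
        }
    }
}
```

Because the limit is read on every call, a monitor with 200 triggers would be rejected under the default and accepted once the limit is raised to 200, without restarting anything. The Dashboards side would still need to learn the current limit from the backend to keep its message in sync.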
Thanks for the replies and input. Indeed the Alerting Dashboards plugin outputs the same error message when reaching the limit. I could have a look at it on that side too and synchronize the change if needed. If not urgent for you, I'd gladly contribute that, even though it will take some time as I'm leaving for long holidays today :slightly_smiling_face:
I do not have issues with making this limit dynamic either. @spapadop that is fine, feel free to contribute after your holidays.
One thing to note when creating the new setting is that it should have a validator, which can be done similarly to this
Hi @lezzago,
I was about to look into this one, and I've noticed that in 2.4 (released a couple of days ago), this change seems to be already there.
Do you think that this issue should be closed?
No, that setting configures the max number of monitors, not the number of triggers per monitor.
@brijos, should we make sure to prioritize this in our backlog as this is a minor change?
Hi again @lezzago. If I'm not mistaken, this didn't make it into 2.4.1. Do you think that 2.4.2 will have it? Thanks again,
Bumping this. I have a similar use case that could benefit from consolidating multiple monitors by combining their triggers into a single monitor. I would appreciate any updates.