terraform-provider-elasticstack
[Bug] Importing/Creating alerting rules in cluster management not working due to notify_when parameter
Describe the bug
I'm importing alerting rules, and the rule as created in the GUI does not have notify_when set. The rule is configured with a filter query. notify_when is a required parameter in the Terraform resource. When I set notify_when in my Terraform config and redeploy, it breaks the alerting rule in Kibana: the rule no longer honors the filter query and constantly sends me alerts. I can also no longer modify the rule in Kibana, as it returns an internal server error.
The same issue occurs when I initiate rule creation directly from Terraform. It creates the rule successfully, but then any subsequent update tries to repopulate the notify_when parameter. The rule is broken on creation.
To Reproduce Steps to reproduce the behavior:
- TF configuration used '...'
resource "elasticstack_kibana_alerting_rule" "main" {
  name         = var.name
  consumer     = var.consumer
  notify_when  = var.notify_when
  params       = local.params
  rule_type_id = var.rule_type_id
  interval     = var.interval
  enabled      = var.enabled
  throttle     = var.throttle
  space_id     = var.space_id
  dynamic actions {
    for_each = local.actions != null ? local.actions : []
    content {
      id     = actions.value.id
      params = actions.value.params
      group  = actions.value.group
    }
  }
}
vars:
name: Disk Usage
consumer: monitoring
notify_when: onThrottleInterval
rule_type_id: monitoring_alert_disk_usage
interval: 1m
params: |-
  threshold: 80
  duration: 5m
  filterQueryText: NOT elasticsearch.node.roles:"data_frozen"
  filterQuery: |-
    {"bool":{"must_not":{"bool":{"should":[{"term":{"elasticsearch.node.roles":{"value":"data_frozen"}}}],"minimum_should_match":1}
actions:
  - group: default
    connector_name: "Monitoring: Write to Kibana log"
    params: |-
      message: "{{context.internalShortMessage}}"
      level: info
  - group: default
    connector_name: Elastic-Cloud-SMTP
    params: |-
      message: "{{context.internalFullMessage}}"
      to:
        - [email protected]
      subject: Elastic - High Disk Usage
- TF operations to execute to get the error '...' [e.g. terraform plan, terraform apply, terraform destroy]
terraform import module.monitor_alerting_rule[\"default\;disk-usage\"].elasticstack_kibana_alerting_rule.main default/3a3d1950-84af-11ee-bb3d-a97312b6e99c
terraform plan
- See the error in the output '...'
The rule before import:
{
"id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
"name": "Disk Usage",
"tags": [],
"enabled": true,
"consumer": "monitoring",
"throttle": null,
"revision": 18,
"running": false,
"schedule": {
"interval": "1m"
},
"params": {
"duration": "5m",
"filterQuery": "",
"filterQueryText": "",
"threshold": 80
},
"rule_type_id": "monitoring_alert_disk_usage",
"created_by": "1685651753",
"updated_by": "1685651753",
"created_at": "2023-11-16T18:37:43.710Z",
"updated_at": "2023-12-07T22:36:34.990Z",
"api_key_owner": "1685651753",
"notify_when": null,
"mute_all": false,
"muted_alert_ids": [],
"scheduled_task_id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
"execution_status": {
"status": "active",
"last_execution_date": "2023-12-07T22:36:41.358Z",
"last_duration": 736
},
"actions": [
{
"group": "default",
"id": "39ae46d0-84af-11ee-bb3d-a97312b6e99c",
"params": {
"level": "info",
"message": "{{context.internalShortMessage}}"
},
"connector_type_id": ".server-log",
"frequency": {
"summary": false,
"notify_when": "onActionGroupChange",
"throttle": null
},
"uuid": "697a281e-dd62-478a-8ad8-8ccb9a3b2dea"
},
{
"group": "default",
"id": "elastic-cloud-email",
"params": {
"message": "{{context.internalFullMessage}}",
"to": [
"XXXX"
],
"subject": "Elastic - High Disk Usage"
},
"connector_type_id": ".email",
"frequency": {
"summary": false,
"notify_when": "onActionGroupChange",
"throttle": null
},
"uuid": "ee56cb2c-7a30-4750-aaa5-de9a17233086"
}
],
"last_run": {
"alerts_count": {
"active": 2,
"new": 2,
"recovered": 0,
"ignored": 0
},
"outcome_msg": null,
"outcome_order": 0,
"outcome": "succeeded",
"warning": null
},
"next_run": "2023-12-07T22:37:41.334Z",
"api_key_created_by_user": false
}
The Terraform Changeset after import:
Terraform will perform the following actions:
# module.monitor_alerting_rule["default;disk-usage"].elasticstack_kibana_alerting_rule.main will be updated in-place
~ resource "elasticstack_kibana_alerting_rule" "main" {
id = "default/3a3d1950-84af-11ee-bb3d-a97312b6e99c"
name = "Disk Usage"
+ notify_when = "onThrottleInterval"
tags = []
# (10 unchanged attributes hidden)
# (2 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
The object after redeploying with Terraform:
{
"id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
"name": "Disk Usage",
"tags": [],
"enabled": true,
"consumer": "monitoring",
"throttle": null,
"revision": 19,
"running": false,
"schedule": {
"interval": "1m"
},
"params": {
"duration": "5m",
"filterQuery": "",
"filterQueryText": "",
"threshold": 80
},
"rule_type_id": "monitoring_alert_disk_usage",
"created_by": "1685651753",
"updated_by": "elastic",
"created_at": "2023-11-16T18:37:43.710Z",
"updated_at": "2023-12-07T22:45:34.655Z",
"api_key_owner": "elastic",
"notify_when": "onThrottleInterval",
"mute_all": false,
"muted_alert_ids": [],
"scheduled_task_id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
"execution_status": {
"status": "active",
"last_execution_date": "2023-12-07T22:44:51.497Z",
"last_duration": 1189
},
"actions": [
{
"group": "default",
"id": "39ae46d0-84af-11ee-bb3d-a97312b6e99c",
"params": {
"level": "info",
"message": "{{context.internalShortMessage}}"
},
"connector_type_id": ".server-log",
"uuid": "b7522351-9cd9-4a80-a921-48f42dc4b169"
},
{
"group": "default",
"id": "elastic-cloud-email",
"params": {
"message": "{{context.internalFullMessage}}",
"subject": "Elastic - High Disk Usage",
"to": [
"XXXX"
]
},
"connector_type_id": ".email",
"uuid": "2c592cc7-9e7a-4209-a57b-70959f0776d9"
}
],
"last_run": {
"alerts_count": {
"active": 2,
"new": 0,
"recovered": 0,
"ignored": 0
},
"outcome_msg": null,
"outcome_order": 0,
"outcome": "succeeded",
"warning": null
},
"next_run": "2023-12-07T22:45:51.426Z",
"api_key_created_by_user": false
}
Note that the frequency object on each action is no longer present.
Expected behavior: Importing or creating the rule should produce a working alerting rule.
Screenshots
The Kibana error when attempting to save the rule via the Kibana dashboard after modifying it:
Versions (please complete the following information):
- OS: Linux
- Terraform Version 1.3.9
- Provider version : v0.11.0
- Elasticsearch Version 8.11.1
Additional context: I tried to address this with my limited Go knowledge but was unsuccessful. I did get as far as setting the value for notify_when to null to see what it would do. Elasticsearch then complained that the rule actions were missing their frequency parameters.
Cross-posting @jpdjere's comment on previous discussion.
For alerting rule updates: since Kibana's ResponseOps team resolved https://github.com/elastic/kibana/issues/143368 via https://github.com/elastic/kibana/pull/144130, the rule-level notify_when field is no longer required, because the same data can be supplied under action.frequency in the request JSON. (Note that if the rule-level field is included, the action-level settings are dropped regardless of whether they are present; if the action-level settings are included and the rule-level field is not, the Kibana API no longer errors in more recent versions. Terraform's code should be updated to reflect this new reality for ≥v8.7.0.)
As a kind of meta-question on this, it's not clear how this is tested, but feels like we should do some kind of E2E testing at least once per release (first bc?) to make sure API changes have not broken this provider.
If there is already some kind of test like this, we apparently need more, as we should have been able to catch this.
If there isn't, we should build one.
There are acceptance tests for this resource, which are run against a range of stack versions.
The existing tests do explicitly check the notify_when property. If I understand this issue correctly, they're passing because they don't configure an action on the alerting rule.
@tobio IIUC this is purely a TF issue now? (initially looked like a Kibana API issue?)
No, it wasn't. After the upgrade to the version, it's working now. Thanks
IIRC the API spec has changed dramatically since these resources were introduced. We likely want to regenerate the client and decide on how much effort we put into supporting early API version here too.
@tobio so the path here is basically creating a new resource from the newer spec IIUC?
I think it's regenerating the client and making the current resource work with the latest version. There are potentially some version restrictions to figure out (e.g. ES 8.8- requires the current provider version), but we'd have to look at the spec changes to know.
Thanks for looking into this topic. @tobio Did you manage to regenerate the client?
I'm really looking forward to using this resource with the latest Elastic Cloud version.
@tobio Did you have time to take a look at this issue?
I've only been able to look at this enough to verify that it's not simply an hour's work, and that there are a bunch of breaking changes in the API spec we'd need to adapt to, sadly.
Fixed by https://github.com/elastic/kibana/issues/186963. Starting from v0.11.7, the Rule resource now supports the rule's alert_delay property and the rule's action alerts_filter and frequency properties. You can find complete documentation on the Elastic Terraform provider documentation page.
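For readers landing here, a minimal sketch of what the Disk Usage rule from this issue might look like with action-level frequency on v0.11.7 or later. This is an illustration, not a verified configuration: it assumes the frequency block is nested inside actions and takes summary and notify_when attributes mirroring the Kibana JSON shown above, and that rule-level notify_when can be left unset on recent stack versions. The resource label disk_usage is hypothetical; the connector id is the one from the rule JSON in this issue. Check the provider documentation for the exact schema.

```hcl
# Sketch only: block and attribute names under `frequency` are assumed to
# mirror the Kibana API fields shown earlier in this issue.
resource "elasticstack_kibana_alerting_rule" "disk_usage" {
  name         = "Disk Usage"
  consumer     = "monitoring"
  rule_type_id = "monitoring_alert_disk_usage"
  interval     = "1m"
  enabled      = true

  params = jsonencode({
    threshold = 80
    duration  = "5m"
  })

  # Rule-level notify_when is deliberately omitted so that the
  # action-level frequency below is what Kibana stores.
  actions {
    id    = "39ae46d0-84af-11ee-bb3d-a97312b6e99c"
    group = "default"

    params = jsonencode({
      level   = "info"
      message = "{{context.internalShortMessage}}"
    })

    # Matches the frequency object the GUI-created rule had before import.
    frequency {
      summary     = false
      notify_when = "onActionGroupChange"
    }
  }
}
```

After importing an existing rule into a configuration like this, terraform plan should ideally show no change to notify_when; if it still proposes adding a rule-level value, the installed provider version may predate the fix.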