
NewRelic alert policies are recreated due to change in channel configuration

binoy351 opened this issue 3 years ago • 12 comments

When we modify an existing alert channel or add a new alert channel to an existing policy, all associated resources are force replaced. The forced replacement of the policies causes their open incidents to be immediately closed and then re-opened within minutes, generating a lot of alert noise. Ideally, a change to an alert channel should not replace the existing policies.

Terraform Version

Terraform v0.15.4
on linux_amd64

Affected Resource(s)

  • newrelic_alert_policy
  • newrelic_alert_channel

Terraform Configuration

terraform {
  required_version = "~> 0.15.4"
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "~> 2.22.1"
    }
  }
}
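
For context, a minimal resource layout along these lines reproduces the behavior. This is a sketch: the resource names, channel type, and webhook URL are illustrative placeholders, not the exact configuration.

resource "newrelic_alert_channel" "test_channel" {
  name = "test_channel" # renaming this to "test_channel_1" forces replacement
  type = "webhook"

  config {
    base_url = "https://example.com/hook" # placeholder endpoint
  }
}

resource "newrelic_alert_policy" "TEST_POLICY_1" {
  name        = "TEST_POLICY - High CPU Usage"
  channel_ids = [newrelic_alert_channel.test_channel.id] # ties the policy's lifecycle to the channel
}

Because channel_ids forces replacement when it changes, replacing the channel cascades into the policy and every condition that references it.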

Actual Behavior

NewRelic policies are recreated whenever there is a change to an associated channel, which force-closes and then re-opens all open incidents, generating alert noise.

  1. When an existing channel is modified, all policies associated with that channel are force replaced.
  2. When a new channel is added to a policy, the policy is force replaced.

This is also observed in the latest version of the newrelic provider (2.22.1).

Scenario 1: When an existing channel is modified

The channel name was changed from test_channel to test_channel_1. Below is the output of terraform plan: the name change is detected, and every policy that references channel id 5106925 is force replaced.

-/+ resource "newrelic_alert_channel" "test_channel" {
      ~ id   = "5106925" -> (known after apply)
      ~ name = "test_channel" -> "test_channel_1" # forces replacement
        # (1 unchanged attribute hidden)

      ~ config {
          - headers                 = (sensitive value)
          - payload                 = (sensitive value)
            # (2 unchanged attributes hidden)
        }
    }


-/+ resource "newrelic_alert_policy" "TEST_POLICY_1" {
      ~ account_id          = XXXXXXX -> (known after apply)
      ~ channel_ids         = [
          - 5106925,
          - 5110631,
        ] -> (known after apply) # forces replacement
      ~ id                  = "1341555" -> (known after apply)
        name                = "TEST_POLICY - High CPU Usage"
        # (1 unchanged attribute hidden)
    }


-/+ resource "newrelic_nrql_alert_condition" "high_cpu_usage_1" {
      - close_violations_on_expiration = false -> null
      - expiration_duration            = 0 -> null
      - fill_value                     = 0 -> null
      ~ id                             = "1341555:20625762" -> (known after apply)
        name                           = "CPU Usage and Load Average"
      - open_violation_on_expiration   = false -> null
      ~ policy_id                      = 1341555 -> (known after apply) # forces replacement
      ~ value_function                 = "SINGLE_VALUE" -> "single_value"
      + violation_time_limit           = (known after apply)
        # (6 unchanged attributes hidden)

      ~ critical {
          - duration              = 0 -> null
            # (4 unchanged attributes hidden)
        }

      ~ nrql {
            # (2 unchanged attributes hidden)
        }
    }

...
Plan: 5 to add, 1 to change, 5 to destroy.

Scenario 2: When a new channel is added to an existing policy.

When a channel is attached to an existing policy, the policy is force replaced (here, channel id 5106925 is attached to the policy). The configuration change is sketched below, followed by the resulting plan.
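
The change is roughly the following (an illustrative sketch; the real policy carries more attributes):

resource "newrelic_alert_policy" "PD_test" {
  name        = "PD_test"
  channel_ids = [
    5108200,
    5106925, # newly attached channel
  ]
}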

-/+ resource "newrelic_alert_policy" "PD_test" {
      ~ account_id          = XXXXXX -> (known after apply)
      ~ channel_ids         = [ # forces replacement
            5108200,
          + 5106925,
        ]
      ~ id                  = "1341554" -> (known after apply)
        name                = "PD_test"
        # (1 unchanged attribute hidden)
    }


-/+ resource "newrelic_nrql_alert_condition" "pd_test_disk_usage" {
      - close_violations_on_expiration = false -> null
      - expiration_duration            = 0 -> null
      - fill_option                    = "none" -> null
      - fill_value                     = 0 -> null
      ~ id                             = "1341554:20625761" -> (known after apply)
        name                           = "Disk Usage"
      - open_violation_on_expiration   = false -> null
      ~ policy_id                      = 1341554 -> (known after apply) # forces replacement
      ~ value_function                 = "SINGLE_VALUE" -> "single_value"
      + violation_time_limit           = (known after apply)
        # (6 unchanged attributes hidden)

      ~ critical {
          - duration              = 0 -> null
          ~ threshold_duration    = 60 -> 300
            # (3 unchanged attributes hidden)
        }

      ~ nrql {
            # (2 unchanged attributes hidden)
        }
    }

Plan: 2 to add, 1 to change, 2 to destroy.

Expected Behavior

  1. A change to the alert channel should not re-create the policy.
  2. Adding new alert channels to an existing policy should not recreate the policy.

References

https://github.com/newrelic/terraform-provider-newrelic/issues/1130

binoy351 avatar May 29 '21 03:05 binoy351

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jun 16 '21 20:06 stale[bot]

Observing the same behaviour on the latest available version.

It doesn't come up often, but when we do need to amend the payload format of our webhook notification channel, the change cascades down to destroy and re-create upwards of a thousand associated alert conditions. Far from ideal, to say the least.

rdhar avatar Jun 17 '21 15:06 rdhar

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jul 01 '21 18:07 stale[bot]

This issue has been automatically closed due to a lack of activity for an extended period of time.

stale[bot] avatar Jul 09 '21 05:07 stale[bot]

This issue is still present.

nikopavlica avatar Mar 23 '22 07:03 nikopavlica

Thanks for letting us know @nikopavlica Will take a look.

kidk avatar Mar 23 '22 10:03 kidk

I'm hitting a similar issue. It does not result in forced replacement of notification channels; the drift is ignored (unless changes are made to the notification channel), but it does show up in warnings. To add more detail: running a plan with no changes to notification channels, I get the following:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "terraform apply":

  # newrelic_alert_channel.slack-my-channel has changed
  ~ resource "newrelic_alert_channel" "slack-my-channel" {
        id   = "xxxxxx"
        name = "my-channel"
        # (1 unchanged attribute hidden)

      ~ config {
          + headers = (sensitive value)
          + payload = (sensitive value)
            # (2 unchanged attributes hidden)
        }
    }

  # newrelic_alert_channel.vo-my-routingkey has changed
  ~ resource "newrelic_alert_channel" "vo-my-routingkey" {
        id   = "xxxxxx"
        name = "my-routingkey"
        # (1 unchanged attribute hidden)

      ~ config {
          + headers   = (sensitive value)
          + payload   = (sensitive value)
            # (2 unchanged attributes hidden)
        }
    }

It affects both Slack and VictorOps/Splunk On-Call notification channels. I'd also add that these fields are not present in the GraphQL API:

{
  "errors": [
    {
      "locations": [
        {
          "column": 15,
          "line": 11
        }
      ],
      "message": "Cannot query field \"headers\" on type \"AlertsSlackNotificationChannelConfig\"."
    },
    {
      "locations": [
        {
          "column": 15,
          "line": 12
        }
      ],
      "message": "Cannot query field \"payload\" on type \"AlertsSlackNotificationChannelConfig\"."
    }
  ]
}

danielgblanco avatar Mar 31 '22 09:03 danielgblanco

This is currently the documented expected behavior when changing the channels associated with a Policy: https://registry.terraform.io/providers/newrelic/newrelic/latest/docs/resources/alert_policy#channel_ids

Updating the channel itself also requires recreating the channel, which then changes the channels on the Policy and forces a new Policy to be created.
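
For what it's worth, the provider also exposes a separate newrelic_alert_policy_channel association resource. A minimal sketch of using it so that channel changes only touch the association rather than the policy itself (resource names here are illustrative):

resource "newrelic_alert_policy" "example" {
  name = "example-policy"
  # no channel_ids here; the association lives in its own resource
}

resource "newrelic_alert_policy_channel" "example" {
  policy_id   = newrelic_alert_policy.example.id
  channel_ids = [newrelic_alert_channel.test_channel.id]
}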

emetcalf9 avatar Mar 31 '22 23:03 emetcalf9

This is still a problem on 3.1.0:

terraform {
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "=3.1.0"
    }
  }
}

provider "newrelic" {
  account_id = MY_ACCOUNT_ID_HERE
}

resource "newrelic_alert_policy" "policy" {
  incident_preference = "PER_CONDITION_AND_TARGET"
  name                = "MY_APP"

  channel_ids = var.nr_channels
}

On every terraform plan:

  # module.app.newrelic_alert_policy.policy must be replaced
-/+ resource "newrelic_alert_policy" "policy" {
      - account_id          = MY_ACCOUNT_ID_HERE -> null # forces replacement
      ~ id                  = "SOMENUMBER" -> (known after apply)
        name                = "MY_APP"
        # (2 unchanged attributes hidden)
    }

This happens with or without the newrelic provider block (e.g. relying on the NEW_RELIC_XXX environment variables for credentials and account ID).

luispabon avatar Sep 02 '22 17:09 luispabon

I had this issue too. Try specifying account_id in your alert_policy: it is optional, but if you don't specify it, it's not stored in the state (or something like that). Works for me now...
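
A minimal sketch of that workaround, based on the configuration above (the variable name and wiring are illustrative):

variable "nr_account_id" {
  type = number
}

resource "newrelic_alert_policy" "policy" {
  incident_preference = "PER_CONDITION_AND_TARGET"
  name                = "MY_APP"
  account_id          = var.nr_account_id # set explicitly so it persists in state

  channel_ids = var.nr_channels
}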

darren-steven avatar Sep 07 '22 05:09 darren-steven

That's what I had to do: percolate the account ID from elsewhere directly into the resource. Unfortunately, none of the data sources allow you to query the account ID of the account you're currently authenticated against.

luispabon avatar Sep 07 '22 08:09 luispabon

We'll take another look at this. Thanks for the report!

kidk avatar Sep 07 '22 09:09 kidk

Does this issue still persist in the latest version of the provider?

NSSPKrishna avatar Nov 02 '22 11:11 NSSPKrishna

We haven’t heard back from you in a long time so we will close the ticket. If you feel this is still a valid request or bug, feel free to create a new issue.

NSSPKrishna avatar Nov 17 '22 12:11 NSSPKrishna