terraform-provider-pagerduty

pagerduty_escalation_policy - rule targets are constantly on drift

pacoguzman opened this issue 2 years ago · 4 comments

Terraform Version

Terraform v1.3.6

Affected Resource(s)

  • pagerduty_escalation_policy

In our use case, only this resource is affected.

Terraform Configuration Files

# Configure the PagerDuty provider
terraform {
  required_providers {
    pagerduty = {
      source  = "pagerduty/pagerduty"
      version = "2.9.3"
    }
  }
}

# Requires PAGERDUTY_TOKEN env var.
provider "pagerduty" {}

variable "users" {
  description = "Users SSOT"
  type = map(object({
    name  = string
    email = string
    role  = string
  }))
  default = {
    user1 = {
      name  = "User1"
      email = "[email protected]"
      role  = "user"
    }
  }
}

variable "teams" {
  description = "Teams SSOT"
  type = map(object({
    name = string
  }))
  default = {
    team1 = {
      name = "Team 1"
    }
    platform = {
      name = "platform"
    }
  }
}

variable "platform_team_id" {
  description = "Platform team ID"
  type        = string
  default     = "platform"
}

# Used as placeholder.
resource "pagerduty_user" "pepe" {
  name  = "Pepe"
  email = "[email protected]"
}

# L1 schedule for each team.
resource "pagerduty_schedule" "pagerduty_schedule_l1" {
  for_each = var.teams

  name        = "${each.key} - L1"
  description = "L1 oncall schedule for ${each.value.name} team"
  time_zone   = "Etc/UTC"

  # We don't want to control the layer of the oncall, just that the
  # schedule exists.
  lifecycle {
    ignore_changes = [
      layer,
    ]
  }

  layer {
    name                         = each.key
    start                        = "2022-01-01T00:00:00Z"
    rotation_virtual_start       = "2022-01-01T00:00:00Z"
    rotation_turn_length_seconds = 604800 # 7 days
    users                        = [pagerduty_user.pepe.id]
  }
}

# L2 schedule for each team.
resource "pagerduty_schedule" "pagerduty_schedule_l2" {
  for_each = var.teams

  name        = "${each.key} - L2"
  description = "L2 oncall schedule for ${each.value.name} team"
  time_zone   = "Etc/UTC"

  # We don't want to control the layer of the oncall, just that the
  # schedule exists.
  lifecycle {
    ignore_changes = [
      layer,
    ]
  }

  layer {
    name                         = each.key
    start                        = "2022-01-01T00:00:00Z"
    rotation_virtual_start       = "2022-01-01T00:00:00Z"
    rotation_turn_length_seconds = 604800 # 7 days
    users                        = [pagerduty_user.pepe.id]
  }
}

# Shadow schedule for each team.
resource "pagerduty_schedule" "pagerduty_schedule_shadow" {
  for_each = var.teams

  name        = "${each.key} - Shadow"
  description = "Shadow oncall schedule for ${each.value.name} team"
  time_zone   = "Etc/UTC"

  # We don't want to control the layer of the oncall, just that the
  # schedule exists.
  lifecycle {
    ignore_changes = [
      layer,
    ]
  }

  layer {
    name                         = each.key
    start                        = "2022-01-01T00:00:00Z"
    rotation_virtual_start       = "2022-01-01T00:00:00Z"
    rotation_turn_length_seconds = 604800 # 7 days
    users                        = [pagerduty_user.pepe.id]
  }
}

# Escalation policy for each team with a schema of:
#  Page
#  └── Team L1 + Team shadow
#      └─ Team L2 + Platform L1
#
resource "pagerduty_escalation_policy" "pagerduty_escalation_policy" {
  for_each = var.teams

  name        = each.value.name
  description = "Escalation policy for ${each.value.name} team"
  num_loops   = 5

  rule {
    escalation_delay_in_minutes = 10

    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.pagerduty_schedule_l1[each.key].id
    }

    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.pagerduty_schedule_shadow[each.key].id
    }
  }

  rule {
    escalation_delay_in_minutes = 10

    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.pagerduty_schedule_l2[each.key].id
    }

    # Infra L1 team is also notified as an L2.
    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.pagerduty_schedule_l1[var.platform_team_id].id
    }
  }
}

Debug Output

It doesn't show anything relevant, as the API calls are always successful.

Panic Output

There is no panic

Expected Behavior

After a successful apply, the plan should not show any drift.

Actual Behavior

The plan keeps reporting a change. Looking at the PagerDuty API, it seems the API does not preserve the order in which the targets are set in the Terraform code. It doesn't happen for all of our teams, but it does from time to time.

  # pagerduty_escalation_policy.pagerduty_escalation_policy["team1"] will be updated in-place
  ~ resource "pagerduty_escalation_policy" "pagerduty_escalation_policy" {
        id          = "PXG6Z7L"
        name        = "Team 1"
        # (3 unchanged attributes hidden)

      ~ rule {
            id                          = "P8FALIY"
            # (1 unchanged attribute hidden)

          ~ target {
              ~ id   = "PITCNUM" -> "PXYF8QJ"
                # (1 unchanged attribute hidden)
            }
          ~ target {
              ~ id   = "PXYF8QJ" -> "PITCNUM"
                # (1 unchanged attribute hidden)
            }
        }

        # (1 unchanged block hidden)
    }

We tried to order the targets using Terraform dynamic blocks, but it looks like the PagerDuty API orders rule targets inconsistently, or at least not solely by the ID attribute.

dynamic "target" {
      # Infra L1 team is also notified as an L2.
      for_each = reverse(sort([pagerduty_schedule.pagerduty_schedule_l2[each.key].id, pagerduty_schedule.pagerduty_schedule_l1[var.platform_team_id].id]))

      content {
        type = "schedule_reference"
        id   = target.value
      }
    }
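
Since the API apparently does not return the targets in a stable order, a blunter mitigation, in the same spirit as the ignore_changes we already use on the schedule layers, would be to ignore drift on the rule blocks entirely. This is only a sketch; the obvious downside is that intentional rule changes would then also be ignored and would have to be applied after temporarily removing the ignore:

resource "pagerduty_escalation_policy" "pagerduty_escalation_policy" {
  for_each = var.teams

  name        = each.value.name
  description = "Escalation policy for ${each.value.name} team"
  num_loops   = 5

  # Ignore drift caused by the API reordering rule targets.
  # Trade-off: legitimate changes to the rules are also ignored.
  lifecycle {
    ignore_changes = [
      rule,
    ]
  }

  # ... rule blocks as above ...
}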

Steps to Reproduce

  1. terraform apply
  2. terraform plan

Important Factoids

None that I'm aware of.

References

None

pacoguzman commented Feb 21 '23

I am having the same issue. When I apply the Terraform, the apply does not seem to be taken into account: the next plan shows the same changes for the escalation policies again, and only on the target id.

terraform {
  required_version = "~> 1.0"
  required_providers {
    pagerduty = {
      source  = "pagerduty/pagerduty"
      version = "2.11.0"
    }
  }
}

When I run terraform plan I get 17 changes, and after applying, the same 17 changes are still pending:

Plan: 0 to add, 17 to change, 0 to destroy.
Apply complete! Resources: 0 added, 17 changed, 0 destroyed
Plan: 0 to add, 17 to change, 0 to destroy.
  # pagerduty_escalation_policy.missionl will be updated in-place
  ~ resource "pagerduty_escalation_policy" "missionl" {
        id          = "PCZ85M9"
        name        = "mission"
        # (3 unchanged attributes hidden)

      ~ rule {
            id                          = "P0G5O1O"
            # (1 unchanged attribute hidden)

          ~ target {
              ~ id   = "PK8ZGMK"-> "PWSSUY6"
                # (1 unchanged attribute hidden)
            }
          ~ target {
              ~ id   = "PQTZ9UY" -> "P20RX4S"
                # (1 unchanged attribute hidden)
            }
          ~ target {
              ~ id   = "P20RX4S" -> "PQTZ9UY"
                # (1 unchanged attribute hidden)
            }
          ~ target {
              ~ id   = "PWSSUY6" -> "PK8ZGMK"
                # (1 unchanged attribute hidden)
            }

            # (1 unchanged block hidden)
        }

        # (1 unchanged block hidden)
    }

kiyanabah commented Feb 21 '23

Same for me. Is there any known fix?

Yuriy6735 commented Feb 22 '23

Same here. I tried rolling back to a previous provider version (2.8.1), but the issue is still present. My workaround is to have a separate rule for each target, each with escalation_delay_in_minutes = 1, as sketched below.
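
Applied to the configuration from the original report, that workaround would look roughly like this (my reconstruction, not exact code): the second rule is split into one rule per target, so each rule only ever has a single target and there is no ordering for the API to change.

# Was: one rule with two targets. Now: one rule per target.
rule {
  escalation_delay_in_minutes = 1

  target {
    type = "schedule_reference"
    id   = pagerduty_schedule.pagerduty_schedule_l2[each.key].id
  }
}

rule {
  escalation_delay_in_minutes = 1

  # Infra L1 team is also notified as an L2.
  target {
    type = "schedule_reference"
    id   = pagerduty_schedule.pagerduty_schedule_l1[var.platform_team_id].id
  }
}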

pie-r commented Mar 01 '23

https://github.com/PagerDuty/terraform-provider-pagerduty/blob/8f7b13367ffcca4e6db0f285ec61895e0361a6cc/pagerduty/resource_pagerduty_escalation_policy.go#L48 I guess that changing this TypeList to TypeSet may resolve the issue.

bigwheel commented Jul 14 '23