terraform-provider-pagerduty
Schedule deletion does not work when incidents are still open
Terraform Version
2.6.1
Affected Resource(s)
"pagerduty_schedule"
Terraform Configuration Files
This would be after deleting any schedule linked to an escalation policy
resource "pagerduty_team" "foo" {
  name        = "%s"
  description = "fighters"
}

resource "pagerduty_schedule" "foo" {
  name        = "%s"
  time_zone   = "%s"
  description = "foo"
  teams       = [pagerduty_team.foo.id]

  layer {
    name                         = "foo"
    start                        = "%s"
    rotation_virtual_start       = "%s"
    rotation_turn_length_seconds = 86400
    users                        = [pagerduty_user.foo.id]

    restriction {
      type              = "daily_restriction"
      start_time_of_day = "08:00:00"
      duration_seconds  = 32101
    }
  }
}

resource "pagerduty_escalation_policy" "foo" {
  name      = "%s"
  num_loops = 2
  teams     = [pagerduty_team.foo.id]

  rule {
    escalation_delay_in_minutes = 10

    target {
      type = "user_reference"
      id   = pagerduty_user.foo.id
    }

    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.foo.id
    }
  }
}
Debug Output
Panic Output
"Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents"
Expected Behavior
Should close incidents and remove the schedule
Actual Behavior
Schedule is not removed
Steps to Reproduce
- Trigger an incident on a service whose escalation policy includes a schedule (the schedule can be in any layer of the escalation policy)
- Remove the schedule from the escalation policy
- Attempt to delete the schedule
- This is when you should get the error message "Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents"
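Before attempting the deletion in the last step, it can help to check which open incidents still reference the escalation policy the schedule used to belong to. The following is only a minimal sketch, not something the provider does: it assumes incident records shaped like the PagerDuty REST API's incident objects (a `status` field and an `escalation_policy` reference carrying an `id`), and the function name `blocking_incidents` and the sample IDs are hypothetical.

```python
# Incident statuses considered open here; "resolved" is treated as closed.
OPEN_STATUSES = {"triggered", "acknowledged"}


def blocking_incidents(incidents, escalation_policy_id):
    """Return IDs of open incidents that reference the given escalation policy.

    `incidents` is assumed to be a list of dicts shaped like REST API
    incident objects; fetching them is left to the caller.
    """
    blocking = []
    for incident in incidents:
        policy = incident.get("escalation_policy") or {}
        is_open = incident.get("status") in OPEN_STATUSES
        if is_open and policy.get("id") == escalation_policy_id:
            blocking.append(incident["id"])
    return blocking


# Hypothetical sample data, for illustration only.
sample = [
    {"id": "INC1", "status": "triggered", "escalation_policy": {"id": "EP1"}},
    {"id": "INC2", "status": "resolved", "escalation_policy": {"id": "EP1"}},
    {"id": "INC3", "status": "acknowledged", "escalation_policy": {"id": "EP2"}},
]
print(blocking_incidents(sample, "EP1"))  # prints ['INC1']
```

If the returned list is non-empty, the schedule deletion is likely to be rejected until those incidents are resolved or reassigned.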
This is a blocker for configuring PagerDuty in code. It's a tricky one, but I'd suggest that the incidents be resolved automatically (i.e. the effect would cascade).
When a Schedule is used by an Escalation Policy that has open incidents, and the Schedule is then removed from that Escalation Policy (either to become part of another EP or simply to be deleted), the Schedule's data loses its traceability to the EP with the open incidents, because that relation is tracked only through the EP snapshot created when the incident is triggered.
So the error returned by that deletion attempt is the following:
[Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents]
Therefore, at the /schedules public API level, we would need the ID(s) of the open incidents, or at least the ID(s) of the EPs with the open incidents, to tell Terraform users through the error message which incidents need to be resolved or reassigned, as we currently do for Schedules whose incidents are still traceable. So, until an update to the /schedules error messages for this case is released, we won't be able to present a more helpful error. This has been reported to the owner of the PagerDuty /schedules API and it is already on their roadmap; unfortunately, we don't have an ETA yet.
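Until the API returns the relevant IDs, the most a client can do is recognise the message and attach generic guidance. A minimal sketch of that idea follows; it is not the provider's implementation, the function name `explain_delete_error` and the hint wording are hypothetical, and matching on message text is brittle because the wording may change.

```python
# Message text as quoted in this issue; it may change on the API side.
SNAPSHOT_ERROR = (
    "Schedule can't be deleted if it's being used by an escalation policy "
    "snapshot with open incidents"
)


def explain_delete_error(message):
    """Append generic guidance when the known snapshot error is recognised.

    No incident or escalation policy IDs can be included, because the
    error does not carry them.
    """
    if SNAPSHOT_ERROR.lower() in message.lower():
        return (
            message
            + " (hint: resolve or reassign the open incidents on escalation "
            "policies that previously used this schedule, then retry)"
        )
    # Unrecognised errors are passed through unchanged.
    return message
```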
No worries. FWIW, we ended up writing a custom PagerDuty controller (like a k8s operator) that reconciles the desired config with what is in PagerDuty. It was a little tricky since the API has some gotchas like this. We hit another one today: support hours need to be HH:MM:00, so 23:59:59 won't work 😄 No big deal though, as we worked around them.
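For the support-hours gotcha, one possible workaround is to normalise times before sending them. The comment above does not say how it was actually handled, so dropping the seconds is an assumption here, and the helper name `to_support_hours_time` is hypothetical.

```python
from datetime import datetime


def to_support_hours_time(value):
    """Parse HH:MM or HH:MM:SS and return HH:MM:00, dropping any seconds."""
    for fmt in ("%H:%M:%S", "%H:%M"):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted format
        return parsed.strftime("%H:%M:00")
    raise ValueError(f"not a valid time of day: {value!r}")


print(to_support_hours_time("23:59:59"))  # prints 23:59:00
```

Note that truncating 23:59:59 to 23:59:00 shortens the window by 59 seconds; whether that is acceptable depends on the use case.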