terraform-provider-aws

Handling of UPDATE_ROLLBACK_COMPLETE CloudFormation status

Open diversario opened this issue 5 years ago • 7 comments

I have a CloudFormation stack that's created by Terraform. It's basically

resource "aws_cloudformation_stack" "aws_transit_vpc" {
  parameters {
    // stuff
  }

  template_body = <a copy of https://github.com/awslabs/aws-transit-vpc/blob/master/deployment/transit-vpc-spoke-vpc.template>
}

It's not ideal, but it works. Yesterday I edited the template, attempted an apply, and got back:

Releasing state lock. This may take a few moments...

Error: Error applying plan:

1 error(s) occurred:

* module.transit-vpc-sandbox-us-east-1.aws_cloudformation_stack.aws_transit_vpc: 1 error(s) occurred:
* aws_cloudformation_stack.aws_transit_vpc: UPDATE_ROLLBACK_COMPLETE: ["Both CidrIp and SourceSecurityGroup cannot be specified"]

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

exit status 1

My template update was invalid, so the apply failed. CloudFormation was able to roll back, so the stack was now in the UPDATE_ROLLBACK_COMPLETE state, which, as I understand it, is a non-error state.

I removed my template customization and applied again, thinking it would put me back to where I was before, only to get this error:

Error: Error applying plan:
1 error(s) occurred:
* module.transit-vpc-sandbox-us-east-1.aws_cloudformation_stack.aws_transit_vpc: 1 error(s) occurred:
* aws_cloudformation_stack.aws_transit_vpc: UPDATE_ROLLBACK_COMPLETE: []

From this point on, I was unable to apply any changes to the stack through Terraform.

I fumbled quite a bit with this, but eventually it occurred to me that the stack's status was causing Terraform to fail and that I should try to move the stack into a different status. So I went into the AWS console and applied a change set that added a tag to one of the resources.

Now the stack was in UPDATE_COMPLETE state. After this, Terraform apply was successful and my issue was resolved.
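
For anyone who would rather script that console step, here is a minimal sketch in Python. It assumes a boto3 CloudFormation client is passed in as `cfn`; the function name `nudge_stack` and the default tag key and value are made up for illustration. It re-submits the previous template and parameter values with one extra stack-level tag, which is a slightly different change from the resource-level tag used above, and it has not been verified against this particular stack.

```python
def nudge_stack(cfn, stack_name, tag_key="terraform-nudge", tag_value="1"):
    """Push a stack out of UPDATE_ROLLBACK_COMPLETE with a tag-only update.

    `cfn` is expected to behave like boto3.client("cloudformation").
    Returns the update_stack response, or None if the stack is in any
    other status and nothing was attempted.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "UPDATE_ROLLBACK_COMPLETE":
        return None

    # Reuse every existing parameter value instead of re-specifying it.
    parameters = [
        {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in stack.get("Parameters", [])
    ]

    # Keep the existing tags and add (or overwrite) the nudge tag.
    # If the tag already exists with the same value, CloudFormation
    # rejects the call because there is nothing to update.
    tags = [t for t in stack.get("Tags", []) if t["Key"] != tag_key]
    tags.append({"Key": tag_key, "Value": tag_value})

    return cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=parameters,
        Capabilities=stack.get("Capabilities", []),
        Tags=tags,
    )
```

It could be called as `nudge_stack(boto3.client("cloudformation"), "my-stack")`, where the stack name is a placeholder. Note that a tag added this way may show up as drift on the `tags` argument of the `aws_cloudformation_stack` resource at the next plan.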

However, this made me question the handling of the status. Looking at the code, the CloudFormation provider always treats UPDATE_ROLLBACK_COMPLETE as an error state. As far as I understand, this means there's no path forward for Terraform without having the user interact with the stack directly.

Is this the intended behavior or can the provider be improved to handle this better in some way?
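
To make the distinction concrete, here is a rough sketch in Python of how stack statuses could be classified before an update is attempted. This is not the provider's code (the provider is written in Go); the function name `classify` and its return labels are invented for illustration. The groupings follow my understanding of the CloudFormation documentation: a stack in UPDATE_ROLLBACK_COMPLETE is back on its last working configuration and accepts new updates, whereas ROLLBACK_COMPLETE (a failed create) and UPDATE_ROLLBACK_FAILED do not.

```python
# Statuses from which CloudFormation accepts a new UpdateStack call.
UPDATABLE = frozenset({
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_COMPLETE",
    "IMPORT_COMPLETE",
    "IMPORT_ROLLBACK_COMPLETE",
})

# Statuses that need an explicit recovery step before any update.
NEEDS_MANUAL_RECOVERY = frozenset({
    "ROLLBACK_COMPLETE",       # failed create: the stack can only be deleted
    "UPDATE_ROLLBACK_FAILED",  # needs ContinueUpdateRollback first
})


def classify(status: str) -> str:
    """Map a stack status to one of: updatable, manual, wait, failed."""
    if status in UPDATABLE:
        return "updatable"
    if status in NEEDS_MANUAL_RECOVERY:
        return "manual"
    if status.endswith("_IN_PROGRESS"):
        return "wait"
    return "failed"
```

Under this scheme, the failure reported by the first apply would still be surfaced as an error, but a later apply that finds the stack already sitting in UPDATE_ROLLBACK_COMPLETE would go ahead and attempt the update.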

diversario avatar Jul 14 '18 16:07 diversario

Hey @diversario are you still doing this today? I'm hitting the same issue and wondering if there's a better workaround.

mjalkio avatar Nov 29 '18 22:11 mjalkio

I've not done this since I filed the issue, but I imagine that workaround would still work. It's the only one that worked for me and that wasn't totally terrifying to perform 😅

diversario avatar Nov 30 '18 00:11 diversario

I am running into the same issue and resolved it by creating change sets with zero modifications, so that they reflect whatever Terraform had before the failure, which is identical to the rolled-back state.

We really need a fix for this, as it can become cumbersome and Terraform does not handle the CloudFormation status correctly.

acesir avatar Jun 30 '19 00:06 acesir

@diversario thanks for opening this bug. This is SUPER ANNOYING!

When I run into this, I update the description in a Terraform output to include an extra space or something, then apply the stack via the AWS console.

I would expect that terraform would at least try to update the stack instead of immediately failing.

unacceptable avatar Aug 15 '19 18:08 unacceptable

I ran into this issue today

wfeng-fsde avatar Aug 20 '20 02:08 wfeng-fsde

I just ran into this in a scenario where some vendor-provided CloudFormation stacks had been changed, and the old and new stack updates were racing each other to create a single named IAM role that moved across stacks. It would have been nice if Terraform had some way to retry the stack update, since it would have worked if it had run a couple of seconds later.
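
A retry of that kind could look roughly like the Python sketch below. Nothing in this thread suggests Terraform offers this; `apply_with_retry`, its parameters, and the `apply_once` callable are all hypothetical. `apply_once` stands for whatever performs one update attempt and returns the stack's final status.

```python
import time


def apply_with_retry(apply_once, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Re-run an update while it keeps ending in UPDATE_ROLLBACK_COMPLETE.

    `apply_once` is a hypothetical callable that performs one update
    attempt and returns the stack's final status string. Returns the
    status of the last attempt that was made.
    """
    status = None
    for attempt in range(attempts):
        status = apply_once()
        if status != "UPDATE_ROLLBACK_COMPLETE":
            return status
        # Wait before retrying, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return status
```

Retrying only makes sense for transient failures such as the race described here; an invalid template would roll back the same way on every attempt.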

acdha avatar Sep 30 '20 18:09 acdha

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

github-actions[bot] avatar Sep 21 '22 17:09 github-actions[bot]