adop-doa-materials

DOA CloudFormation Stack Failures - DOAWaitCondition Timeout

Open ghost opened this issue 7 years ago • 5 comments

While running a DevOps Academy session, 2 out of 14 users' stacks failed to create and rolled back because the DOAWaitCondition timed out. We were deploying to the eu-west-1 region. I was unable to see the bootstrap script output because the instances were terminated during rollback, but the relevant events from the CloudFormation event logs are below:

13:05:51 UTC+0300	ROLLBACK_IN_PROGRESS	AWS::CloudFormation::Stack	xxxstack01xxx	The following resource(s) failed to create: [DOAWaitCondition]. . Rollback requested by user.
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack01xxx/b9440780-3ae3-11e7-842c-500c3d7df6d2
13:05:49 UTC+0300	CREATE_FAILED	AWS::CloudFormation::WaitCondition	DOAWaitCondition	WaitCondition timed out. Received 0 conditions when expecting 1
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack01xxx/b9440780-3ae3-11e7-842c-500c3d7df6d2/DOAWaitConditionHandle
12:34:04 UTC+0300	CREATE_IN_PROGRESS	AWS::CloudFormation::WaitCondition	DOAWaitCondition	Resource creation Initiated
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack01xxx/b9440780-3ae3-11e7-842c-500c3d7df6d2/DOAWaitConditionHandle
12:34:03 UTC+0300	CREATE_IN_PROGRESS	AWS::CloudFormation::WaitCondition	DOAWaitCondition
12:33:59 UTC+0300	CREATE_COMPLETE	AWS::EC2::Instance	DOAEc2Instance
Physical ID:i-0b72d4c733ffc1d45
12:33:27 UTC+0300	CREATE_COMPLETE	AWS::EC2::SubnetRouteTableAssociation	PublicSubnetPublicRouteTableAssoc
Physical ID:rtbassoc-1c49d47a
12:33:12 UTC+0300	CREATE_IN_PROGRESS	AWS::EC2::Instance	DOAEc2Instance	Resource creation Initiated
Physical ID:i-0b72d4c733ffc1d45

Also:

13:00:15 UTC+0300	ROLLBACK_IN_PROGRESS	AWS::CloudFormation::Stack	xxxstack02xxx	The following resource(s) failed to create: [DOAWaitCondition]. . Rollback requested by user.
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack02xxx/f6e8a0b0-3ae2-11e7-a2e1-50faeb59c0d2
13:00:13 UTC+0300	CREATE_FAILED	AWS::CloudFormation::WaitCondition	DOAWaitCondition	WaitCondition timed out. Received 0 conditions when expecting 1
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack02xxx/f6e8a0b0-3ae2-11e7-a2e1-50faeb59c0d2/DOAWaitConditionHandle
12:28:34 UTC+0300	CREATE_IN_PROGRESS	AWS::CloudFormation::WaitCondition	DOAWaitCondition	Resource creation Initiated
Physical ID:arn:aws:cloudformation:eu-west-1:<ACCOUNT_ID>:stack/xxxstack02xxx/f6e8a0b0-3ae2-11e7-a2e1-50faeb59c0d2/DOAWaitConditionHandle
12:28:34 UTC+0300	CREATE_IN_PROGRESS	AWS::CloudFormation::WaitCondition	DOAWaitCondition
12:28:30 UTC+0300	CREATE_COMPLETE	AWS::EC2::Instance	DOAEc2Instance
Physical ID:i-03f7b08b74c4e70bb
12:27:58 UTC+0300	CREATE_COMPLETE	AWS::EC2::SubnetRouteTableAssociation	PublicSubnetPublicRouteTableAssoc
12:27:44 UTC+0300	CREATE_COMPLETE	AWS::EC2::Route	PublicRouteDefault
12:27:43 UTC+0300	CREATE_IN_PROGRESS	AWS::EC2::Instance	DOAEc2Instance	Resource creation Initiated
Physical ID:i-03f7b08b74c4e70bb
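
To put numbers on the two event logs above, here is a small sketch that computes how long each DOAWaitCondition waited before failing, and how far ahead of the route table association the instance launch was initiated. The timestamps are copied from the events above; the dictionary layout and the key names are only for illustration.

```python
from datetime import datetime

# Timestamps copied from the CloudFormation events above (all UTC+0300, same day).
# The key names are illustrative, not CloudFormation fields.
EVENTS = {
    "xxxstack01xxx": {
        "instance_initiated": "12:33:12",
        "route_assoc_complete": "12:33:27",
        "wait_initiated": "12:34:04",
        "wait_failed": "13:05:49",
    },
    "xxxstack02xxx": {
        "instance_initiated": "12:27:43",
        "route_assoc_complete": "12:27:58",
        "wait_initiated": "12:28:34",
        "wait_failed": "13:00:13",
    },
}


def elapsed_seconds(start, end):
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


for stack, ev in EVENTS.items():
    waited = elapsed_seconds(ev["wait_initiated"], ev["wait_failed"])
    head_start = elapsed_seconds(ev["instance_initiated"], ev["route_assoc_complete"])
    print(f"{stack}: wait condition ran {waited // 60}m {waited % 60}s before timing out")
    print(f"{stack}: instance launch initiated {head_start}s before the route table association completed")
```

In both stacks the wait condition ran for a little over 31 minutes without receiving a signal, and the instance launch was initiated before the public subnet's route table association had finished creating.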

ghost, May 17 '17 12:05

Facing the same issue. Does this depend on using a specific EC2 AMI?

alt-ctrl-dev, Jun 22 '17 02:06

@davidferguson-acn, what kind of VM have you tried to use?

If you are using an Amazon AMI, it will not work.

alt-ctrl-dev, Jun 25 '17 23:06

This is still an ongoing problem; we are having exactly the same problem right now in a session. @reubenkcoutinho, isn't doa_stack.json provided as-is? Changing it (e.g. changing AMIs) is not part of the exercises, and we would not have the time for that anyway.

lukaasp, Nov 22 '17 12:11

Same issue. Did anyone find a fix for this?

stevechappell2000, Feb 01 '18 15:02

ADOPS-57957 was logged for this issue but closed because the team was not able to reproduce it (they didn't actually try creating multiple stacks).

I just ran another session and again had multiple failures of the same nature. I have cloned my original issue to a new issue, ADOPS-67662, so let's see if the team can make progress.
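
For anyone trying to reproduce this, one option is to create the stack with rollback turned off, so that a failed instance is not terminated and its bootstrap output can still be inspected. The sketch below assumes boto3 and passes `OnFailure="DO_NOTHING"` to CloudFormation's `create_stack`. The function name and the stack name in the usage comment are hypothetical, and any Parameters or Capabilities the real doa_stack.json needs are left out.

```python
def launch_for_debugging(cfn_client, stack_name, template_body):
    """Create a stack that is left in place, not rolled back, if creation fails.

    `cfn_client` is expected to be a boto3 CloudFormation client
    (or anything exposing a compatible create_stack method).
    """
    return cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # DO_NOTHING keeps the stack in CREATE_FAILED instead of rolling back,
        # so the EC2 instance and its logs survive a WaitCondition timeout.
        OnFailure="DO_NOTHING",
    )


# Usage sketch (assumes boto3 is installed and AWS credentials are configured):
#
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="eu-west-1")
#   with open("doa_stack.json") as f:
#       launch_for_debugging(cfn, "doa-debug-stack", f.read())
```

A stack created this way has to be deleted by hand afterwards, so it is probably better suited to a debugging run than to a live session with users.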

The AMI is not relevant.

It appears there are a few different causes:

  1. Instance launching before the public route and Internet Gateway are successfully created (may be fixed in #31?)
  2. Timeout waiting for the Load_Platform Jenkins job
  3. Timeout waiting for the ADOP project to load (Checking if job has been built previously or if build has initialised...)
  4. Docker image retrieval failure (filesystem layer verification failed for digest sha256:f936d423a614f3882ae0bf7f511e3d2aa4227d5fb1e614e1d7a930c0a3aa6b03)
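
For cause 1, one possible mitigation is to make the instance wait for the network resources explicitly with a `DependsOn` attribute. This is only a sketch: the resource names are taken from the event logs earlier in this issue, the stub template stands in for the real doa_stack.json (whose contents are not shown here), and it is not known whether this matches what #31 does.

```python
import json


def add_depends_on(template, resource, dependencies):
    """Return a copy of `template` in which `resource` depends on `dependencies`."""
    updated = json.loads(json.dumps(template))  # deep copy via JSON round-trip
    target = updated["Resources"][resource]
    existing = target.get("DependsOn", [])
    if isinstance(existing, str):  # DependsOn may be a single string or a list
        existing = [existing]
    target["DependsOn"] = existing + [d for d in dependencies if d not in existing]
    return updated


# Stub template: only the resource names come from the event logs;
# the real template has properties that are omitted here.
template = {
    "Resources": {
        "PublicRouteDefault": {"Type": "AWS::EC2::Route"},
        "PublicSubnetPublicRouteTableAssoc": {
            "Type": "AWS::EC2::SubnetRouteTableAssociation"
        },
        "DOAEc2Instance": {"Type": "AWS::EC2::Instance"},
    }
}

patched = add_depends_on(
    template,
    "DOAEc2Instance",
    ["PublicRouteDefault", "PublicSubnetPublicRouteTableAssoc"],
)
print(json.dumps(patched["Resources"]["DOAEc2Instance"], indent=2))
```

With the dependency in place, CloudFormation should not start creating the instance until the default route and the subnet association are complete, which would at least rule out the bootstrap script starting without outbound connectivity. It would not address causes 2 to 4.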

ghost, May 10 '18 10:05