aws-nuke
Global RDS Clusters cannot be removed
Hi,
I am using aws-nuke with DCE (Disposable Cloud Environment). While testing some Terraform code today that creates a global Aurora RDS database (multi-region), I discovered that after my lease expired and DCE initiated aws-nuke, it ran into a snowball of issues.
The main issue stems from the removal process of a Global RDS cluster:

```
us-east-1 - RDSInstance - aurora-example-global-virginia-2 - [AvailabilityZone: "us-east-1d", DeletionProtection: "false", Engine: "aurora-postgresql", EngineVersion: "11.7", Identifier: "aurora-example-global-virginia-2", InstanceClass: "db.r5.large", MultiAZ: "false", PubliclyAccessible: "false"] - failed
--
time="2020-11-04T07:30:56Z" level=error msg="InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.\n\tstatus code: 400, request id: xxx-bcc8-35d3812ff5eb"
```
You can replicate it using the Terraform code listed here (forked from the Terraform RDS Aurora module for cluster creation - reference hash e5da5456d33d1cfa51a53615310b62021efaf188). Create the cluster from the examples/global reference, then run AWS Nuke.
I am not sure how feasible this is to accomplish, given that removing a Global Cluster is very much a "delete this, then this, then that, then that" sequence.
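For reference, that sequence can be sketched with the AWS CLI. This is not part of the original report: the replica-cluster names and the account ID in the ARNs are illustrative, and the `run` helper only echoes each command so the order can be inspected without touching an account.

```shell
# Manual teardown order for a global Aurora cluster (sketch).
# `run` echoes instead of executing; identifiers are illustrative.
run() { echo "+ $*"; }

# 1. Delete the instances of the secondary (replica) cluster first.
run aws rds delete-db-instance --region us-east-2 \
    --db-instance-identifier aurora-example-global-ohio-1 --skip-final-snapshot

# 2. Detach the secondary cluster from the global cluster, then delete it.
run aws rds remove-from-global-cluster --region us-east-2 \
    --global-cluster-identifier aurora-example-global \
    --db-cluster-identifier arn:aws:rds:us-east-2:948XXXXXX226:cluster:aurora-example-global-ohio
run aws rds delete-db-cluster --region us-east-2 \
    --db-cluster-identifier aurora-example-global-ohio --skip-final-snapshot

# 3. Only then can the primary's last instance and the primary cluster go.
run aws rds delete-db-instance --region us-east-1 \
    --db-instance-identifier aurora-example-global-virginia-2 --skip-final-snapshot
run aws rds remove-from-global-cluster --region us-east-1 \
    --global-cluster-identifier aurora-example-global \
    --db-cluster-identifier arn:aws:rds:us-east-1:948XXXXXX226:cluster:aurora-example-global-virginia
run aws rds delete-db-cluster --region us-east-1 \
    --db-cluster-identifier aurora-example-global-virginia --skip-final-snapshot

# 4. Finally, delete the now-empty global cluster itself.
run aws rds delete-global-cluster --global-cluster-identifier aurora-example-global
```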
Please let me know if any additional information is needed!
Hi @davidjameshowell,
thank you for the bug report. Though it looks like "DCE" is using an aws-nuke fork, so we don't know which version of upstream aws-nuke the code is running.
Therefore, please try to replicate this with vanilla aws-nuke, or report it to DCE directly. It will probably end up requiring changes to aws-nuke to also support replica deletion.
For this to work we would need to get a vanilla aws-nuke config.
Thank you
I just noticed they're using a modified fork, my bad. Nonetheless, I believe it's an order-of-operations removal issue for RDS Global Clusters (even manual deletion is a tedious process compared to regular RDS). I will get a fresh account and better replication steps with the current aws-nuke in order to validate against the actual upstream software. Thanks!
Okay, I have replication steps now -
Steps to reproduce:
- On a fresh AWS account, run global Terraform example using default values (including default VPC) with Terraform 0.13.5.
- Wait for Global RDS cluster and instances to be ready and available.
- Run AWS Nuke with the config below via Docker, with --no-dry-run:
```shell
docker run \
  --rm -it \
  -v /home/user/awsnuke/nuke-config.yml:/home/aws-nuke/config.yml \
  quay.io/rebuy/aws-nuke:v2.11.0 \
  --access-key-id AWSACCESSKEY \
  --secret-access-key AWSSECRETACCESSKEY \
  --config /home/aws-nuke/config.yml \
  --no-dry-run
```
```yaml
regions:
  - us-east-1
  - us-east-2

account-blacklist:
  - "999999999999" # production

accounts:
  "948XXXXXX226": {}
```
- AWS Nuke will delete most resources except one instance, two clusters, and the global cluster.
The relevant errors are:
```
Removal requested: 0 waiting, 11 failed, 61 skipped, 29 finished

ERRO[0438] There are resources in failed state, but none are ready for deletion, anymore.

us-east-1 - EC2RouteTable - rtb-0cbcb46XXXXX64080 - [] - failed
ERRO[0438] DependencyViolation: The routeTable 'rtb-0cbcXXXX63b664080' has dependencies and cannot be deleted.
           status code: 400, request id: e0e54a74-d4ae-4087-a54f-5dcb35011744
us-east-1 - EC2VPC - vpc-032cXXXXa1cb60092 - [ID: "vpc-032cXXXXa1cb60092", IsDefault: "true"] - failed
ERRO[0438] DependencyViolation: The vpc 'vpc-032cb2XXXXXb60092' has dependencies and cannot be deleted.
           status code: 400, request id: 5af1a695-51e5-405d-86e7-52604ada0b62
us-east-1 - RDSDBCluster - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
           status code: 400, request id: a96b8bd9-88b6-4ff6-be09-503532719765
us-east-1 - EC2SecurityGroup - sg-0cdcXXXXXaa3e9893 - [Name: "aurora-example-global-virginia-20201105063555408700000001"] - failed
ERRO[0438] DependencyViolation: resource sg-0cdca5XXXXX3e9893 has a dependent object
           status code: 400, request id: b380708c-1449-4cf0-b1b4-38725f643af0
us-east-1 - EC2DHCPOption - dopt-060d56d8e6f39e804 - [] - failed
ERRO[0438] DependencyViolation: The dhcpOptions 'dopt-060d5XXXX6f39e804' has dependencies and cannot be deleted.
           status code: 400, request id: 702ee68b-f191-44cc-bb4f-214b9b476611
us-east-1 - RDSDBSubnetGroup - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBSubnetGroupStateFault: Cannot delete the subnet group 'aurora-example-global-virginia' because at least one database instance: aurora-example-global-virginia-2 is still using it.
           status code: 400, request id: 500d338a-4dbb-48ed-8470-dfdf35206268
us-east-1 - EC2Subnet - subnet-02911XXXXfc4a86e3 - [DefaultForAz: "true"] - failed
ERRO[0438] DependencyViolation: The subnet 'subnet-02911XXXXfc4a86e3' has dependencies and cannot be deleted.
           status code: 400, request id: af8cb8e8-dbdb-4215-ba57-df0b2e7c2735
us-east-1 - RDSInstance - aurora-example-global-virginia-2 - failed
ERRO[0438] InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.
           status code: 400, request id: 52802137-5ede-47a2-b239-8dd2b116326b
us-east-1 - NeptuneInstance - aurora-example-global-virginia-2 - failed
ERRO[0438] InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.
           status code: 400, request id: 9a0e29e1-052b-4648-8528-22cd73741b76
us-east-1 - EC2NetworkInterface - [AvailabilityZone: "us-east-1a", PrivateIPAddress: "172.0.0.253", SubnetID: "subnet-02911XXXXXc4a86e3", Status: "in-use", ID: "eni-0730XXXXXX679e83c", VPC: "vpc-032cb2XXXXXb60092"] - failed
ERRO[0438] InvalidParameterValue: Network interface 'eni-0730c98173679e83c' is currently in use.
           status code: 400, request id: 89b53aac-6772-427d-846d-719778726a6f
us-east-1 - NeptuneCluster - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
           status code: 400, request id: 35f5fa8f-0604-4886-845e-32fbc4145ab5

Error: failed
```
Let me know if this is sufficient to create a formal issue.
Is there any update on the above? Is the information provided enough to open a formal report?
It looks like an RDS resource called "Global Cluster" is not yet implemented by aws-nuke.
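Assuming such a resource were added, the missing step maps onto two RDS API calls, RemoveFromGlobalCluster and DeleteGlobalCluster, sketched here via the equivalent AWS CLI commands. The member-cluster ARN is illustrative, and the `run` helper only echoes each command rather than executing it.

```shell
# What deleting a "Global Cluster" involves (sketch).
# `run` echoes instead of executing; the member ARN is illustrative.
run() { echo "+ $*"; }

GLOBAL_ID=aurora-example-global

# A global cluster cannot be deleted while it still has member clusters,
# so each member has to be detached first...
run aws rds remove-from-global-cluster \
    --global-cluster-identifier "$GLOBAL_ID" \
    --db-cluster-identifier arn:aws:rds:us-east-1:948XXXXXX226:cluster:aurora-example-global-virginia

# ...and only then can the global cluster itself be removed.
run aws rds delete-global-cluster --global-cluster-identifier "$GLOBAL_ID"
```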