Global RDS Clusters cannot be removed

Open davidjameshowell opened this issue 5 years ago • 5 comments

Hi,

I am using aws-nuke with DCE (Disposable Cloud Environment). While testing some Terraform code today that creates a multi-region Global Aurora RDS database, I found that once my lease expired and DCE invoked aws-nuke, it ran into a cascade of failures.

The main issue stems from the removal process for a Global RDS cluster.

us-east-1 - RDSInstance - aurora-example-global-virginia-2 - [AvailabilityZone: "us-east-1d", DeletionProtection: "false", Engine: "aurora-postgresql", EngineVersion: "11.7", Identifier: "aurora-example-global-virginia-2", InstanceClass: "db.r5.large", MultiAZ: "false", PubliclyAccessible: "false"] - failed
--
time="2020-11-04T07:30:56Z" level=error msg="InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.\n\tstatus code: 400, request id: xxx-bcc8-35d3812ff5eb"

You can replicate it using the Terraform code listed here (forked from the Terraform RDS Aurora module for cluster creation - reference hash e5da5456d33d1cfa51a53615310b62021efaf188). Create the cluster from the examples/global reference, then run AWS Nuke.

I am not sure how feasible this is to fix, given that removing a Global Cluster is very much a "delete this, then this, then that" sequence.

Please let me know if any additional information is needed!

davidjameshowell avatar Nov 04 '20 08:11 davidjameshowell

Hi @davidjameshowell,

thank you for the bug report. However, it looks like "DCE" is using an aws-nuke fork, so we don't know which version of upstream aws-nuke the code is running.

Therefore, please try to replicate this with vanilla aws-nuke or report it to DCE directly. Either way, it will probably end up requiring changes to aws-nuke to also support replica deletion.

For this to work we would need to get a vanilla aws-nuke config.

Thank you

bjoernhaeuser avatar Nov 04 '20 08:11 bjoernhaeuser

I just noticed they're using a modified fork, my bad. Nonetheless, I believe it's an order-of-operations problem when removing RDS Global clusters (even manual removal is tedious compared with regular RDS). I will get a fresh account and put together better replication steps with the current aws-nuke so this can be validated against the actual upstream software. Thanks!
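To make the ordering concern concrete, here is a minimal sketch of the sequence that the error messages in this thread seem to require: detach and delete replica clusters first, then the primary, then the global cluster itself. It only builds a plan and makes no AWS calls. The function name, the `members` structure, and the example identifiers `example-global`, `aurora-example-global-ohio`, and `aurora-example-global-ohio-1` are hypothetical; the step names mirror RDS API operations, but the exact ordering AWS enforces is an assumption inferred from those errors.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (operation name, resource identifier)


def plan_global_cluster_teardown(
    global_cluster_id: str,
    members: List[Dict],
) -> List[Step]:
    """Return an ordered list of deletion steps for one global cluster.

    `members` is a hypothetical, simplified structure. Each entry has
    'cluster' (str), 'is_writer' (bool) and 'instances' (list of str).
    """
    steps: List[Step] = []
    # Replica (non-writer) clusters sort first, the writer cluster last.
    ordered = sorted(members, key=lambda m: m["is_writer"])
    for member in ordered:
        # Detach first: deleting a cluster that is still part of a
        # global cluster is rejected with InvalidDBClusterStateFault.
        steps.append(("remove_from_global_cluster", member["cluster"]))
        for instance in member["instances"]:
            steps.append(("delete_db_instance", instance))
        steps.append(("delete_db_cluster", member["cluster"]))
    # The now-empty global cluster goes last.
    steps.append(("delete_global_cluster", global_cluster_id))
    return steps


# Example: one primary in us-east-1 and one (hypothetically named) replica.
plan = plan_global_cluster_teardown(
    "example-global",
    [
        {
            "cluster": "aurora-example-global-virginia",
            "is_writer": True,
            "instances": ["aurora-example-global-virginia-2"],
        },
        {
            "cluster": "aurora-example-global-ohio",
            "is_writer": False,
            "instances": ["aurora-example-global-ohio-1"],
        },
    ],
)
for operation, identifier in plan:
    print(operation, identifier)
```

A tool that deletes resources in arbitrary order and retries failures would need something equivalent to this ordering, or at least a dedicated step that detaches member clusters before the regular cluster and instance deletions run.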

davidjameshowell avatar Nov 04 '20 09:11 davidjameshowell

Okay, I have replication steps now -

Steps to reproduce:

  1. On a fresh AWS account, run the global Terraform example using default values (including the default VPC) with Terraform 0.13.5.
  2. Wait for Global RDS cluster and instances to be ready and available.
  3. Run AWS Nuke with config below via Docker with --no-dry-run:
docker run \
    --rm -it \
    -v /home/user/awsnuke/nuke-config.yml:/home/aws-nuke/config.yml \
    quay.io/rebuy/aws-nuke:v2.11.0 \
    --access-key-id AWSACCESSKEY \
    --secret-access-key AWSSECRETACCESSKEY \
    --config /home/aws-nuke/config.yml

regions:
- us-east-1
- us-east-2

account-blacklist:
- "999999999999" # production

accounts:
  "948XXXXXX226": {}
  4. AWS Nuke will delete most resources except one instance, two clusters, and the global cluster.

The relevant error is:

Removal requested: 0 waiting, 11 failed, 61 skipped, 29 finished

ERRO[0438] There are resources in failed state, but none are ready for deletion, anymore.

us-east-1 - EC2RouteTable - rtb-0cbcb46XXXXX64080 - [] - failed
ERRO[0438] DependencyViolation: The routeTable 'rtb-0cbcXXXX63b664080' has dependencies and cannot be deleted.
        status code: 400, request id: e0e54a74-d4ae-4087-a54f-5dcb35011744
us-east-1 - EC2VPC - vpc-032cXXXXa1cb60092 - [ID: "vpc-032cXXXXa1cb60092", IsDefault: "true"] - failed
ERRO[0438] DependencyViolation: The vpc 'vpc-032cb2XXXXXb60092' has dependencies and cannot be deleted.
        status code: 400, request id: 5af1a695-51e5-405d-86e7-52604ada0b62
us-east-1 - RDSDBCluster - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
        status code: 400, request id: a96b8bd9-88b6-4ff6-be09-503532719765
us-east-1 - EC2SecurityGroup - sg-0cdcXXXXXaa3e9893 - [Name: "aurora-example-global-virginia-20201105063555408700000001"] - failed
ERRO[0438] DependencyViolation: resource sg-0cdca5XXXXX3e9893 has a dependent object
        status code: 400, request id: b380708c-1449-4cf0-b1b4-38725f643af0
us-east-1 - EC2DHCPOption - dopt-060d56d8e6f39e804 - [] - failed
ERRO[0438] DependencyViolation: The dhcpOptions 'dopt-060d5XXXX6f39e804' has dependencies and cannot be deleted.
        status code: 400, request id: 702ee68b-f191-44cc-bb4f-214b9b476611
us-east-1 - RDSDBSubnetGroup - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBSubnetGroupStateFault: Cannot delete the subnet group 'aurora-example-global-virginia' because at least one database instance: aurora-example-global-virginia-2 is still using it.
        status code: 400, request id: 500d338a-4dbb-48ed-8470-dfdf35206268
us-east-1 - EC2Subnet - subnet-02911XXXXfc4a86e3 - [DefaultForAz: "true"] - failed
ERRO[0438] DependencyViolation: The subnet 'subnet-02911XXXXfc4a86e3' has dependencies and cannot be deleted.
        status code: 400, request id: af8cb8e8-dbdb-4215-ba57-df0b2e7c2735
us-east-1 - RDSInstance - aurora-example-global-virginia-2 - failed
ERRO[0438] InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.
        status code: 400, request id: 52802137-5ede-47a2-b239-8dd2b116326b
us-east-1 - NeptuneInstance - aurora-example-global-virginia-2 - failed
ERRO[0438] InvalidDBClusterStateFault: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.
        status code: 400, request id: 9a0e29e1-052b-4648-8528-22cd73741b76
us-east-1 - EC2NetworkInterface - [AvailabilityZone: "us-east-1a", PrivateIPAddress: "172.0.0.253", SubnetID: "subnet-02911XXXXXc4a86e3", Status: "in-use", ID: "eni-0730XXXXXX679e83c", VPC: "vpc-032cb2XXXXXb60092"] - failed
ERRO[0438] InvalidParameterValue: Network interface 'eni-0730c98173679e83c' is currently in use.
        status code: 400, request id: 89b53aac-6772-427d-846d-719778726a6f
us-east-1 - NeptuneCluster - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
        status code: 400, request id: 35f5fa8f-0604-4886-845e-32fbc4145ab5
Error: failed
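Since most of the eleven failures look like knock-on effects of the stuck RDS resources, a small script can group them by error code and separate the InvalidDBClusterStateFault entries from the DependencyViolation ones. This is only a sketch: the regular expressions are assumptions based on the output format pasted above, they expect each resource and error message on a single unwrapped line, and `group_failures` is a hypothetical helper, not part of aws-nuke.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Resource lines, e.g.:
#   us-east-1 - RDSDBCluster - aurora-example-global-virginia - failed
RESOURCE_RE = re.compile(r"^(?P<region>[a-z0-9-]+) - (?P<type>\w+) - .* - failed$")
# Error lines, e.g.:
#   ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of ...
ERROR_RE = re.compile(r"^ERRO\[\d+\] (?P<code>\w+): ")


def group_failures(log_text: str) -> Dict[str, List[str]]:
    """Map each error code to the resource types that failed with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    current_type = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        resource = RESOURCE_RE.match(line)
        if resource:
            # Remember the resource until its error line shows up.
            current_type = resource.group("type")
            continue
        error = ERROR_RE.match(line)
        if error and current_type is not None:
            groups[error.group("code")].append(current_type)
            current_type = None
    return dict(groups)


# Short excerpt taken from the output above.
excerpt = """\
us-east-1 - RDSDBCluster - aurora-example-global-virginia - failed
ERRO[0438] InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
        status code: 400, request id: a96b8bd9-88b6-4ff6-be09-503532719765
us-east-1 - EC2Subnet - subnet-02911XXXXfc4a86e3 - [DefaultForAz: "true"] - failed
ERRO[0438] DependencyViolation: The subnet 'subnet-02911XXXXfc4a86e3' has dependencies and cannot be deleted.
        status code: 400, request id: af8cb8e8-dbdb-4215-ba57-df0b2e7c2735
"""
for code, resource_types in group_failures(excerpt).items():
    print(code, resource_types)
```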

Let me know if this is sufficient to create a formal issue.

davidjameshowell avatar Nov 05 '20 07:11 davidjameshowell

Is there any update on the above? Is the information provided enough to open a formal report?

davidjameshowell avatar Apr 13 '21 20:04 davidjameshowell

It looks like an RDS resource called "Global Cluster" is not yet implemented by aws-nuke.

svenwltr avatar Apr 14 '21 08:04 svenwltr