
`error deleting MongoDB Network Peering Container`, even though Container exists

MikiLoz92 opened this issue 2 years ago • 20 comments

Terraform CLI and Terraform MongoDB Atlas Provider Version

Terraform v1.0.8
on darwin_amd64

Your version of Terraform is out of date! The latest version
is 1.0.11. You can update by downloading from https://www.terraform.io/downloads.html

Provider version 1.0.2

Steps to Reproduce

  1. Create network container in an existing project, with no clusters in it
  2. Create cluster
  3. Delete cluster
  4. Delete network container -> fails

Debug Output

Ok, so I've been coping with this issue for a while. I thought I had fixed it by adding a simple delay between resources, but it resurfaced even then. It seems strange because, when trying to destroy the network container, the provider complains that the network container doesn't exist, even though it does:

module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 4m10s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 4m20s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 4m30s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 4m40s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 4m50s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 5m0s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 5m10s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjE3MDAzODAwMDNkMzcwMW...pZA==:NjE3MDAzNDY4ODA4YmQ2ODA5ZDdjYWE3, 5m20s elapsed]
╷
│ Error: error deleting MongoDB Network Peering Container (61700380003d3701c115dc58): couldn't find resource (21 retries)
│ 
│ 
╵

However, I went to the MongoDB Atlas Administration API, and the network container did in fact exist:

{
  "links": [
    {
      "href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/REDACTED/containers/all?pageNum=1&itemsPerPage=100",
      "rel": "self"
    }
  ],
  "results": [
    {
      "atlasCidrBlock": "10.1.128.0/24",
      "id": "61700380003d3701c115dc58",
      "providerName": "AWS",
      "provisioned": true,
      "regionName": "EU_WEST_3",
      "vpcId": "REDACTED"
    }
  ],
  "totalCount": 1
}

You can check that it's the same container ID. I tried deleting it once again with Terraform, to no success (same error).

Then I tried deleting it through the API (with no problems, 200 OK), and upon checking the containers list again:

{
  "links": [
    {
      "href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/REDACTED/containers/all?pageNum=1&itemsPerPage=100",
      "rel": "self"
    }
  ],
  "results": [],
  "totalCount": 0
}

it was successfully deleted.
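
For reference, these are roughly the API calls I made (a sketch assuming curl with Atlas API-key digest auth; the key, group, and container variables are placeholders):

# List all network containers in the project (same endpoint as the href above)
curl --digest -u "${ATLAS_PUBLIC_KEY}:${ATLAS_PRIVATE_KEY}" \
  "https://cloud.mongodb.com/api/atlas/v1.0/groups/${GROUP_ID}/containers/all?pageNum=1&itemsPerPage=100"

# Delete a single container by ID (this is the call that returned 200 OK)
curl --digest -u "${ATLAS_PUBLIC_KEY}:${ATLAS_PRIVATE_KEY}" -X DELETE \
  "https://cloud.mongodb.com/api/atlas/v1.0/groups/${GROUP_ID}/containers/${CONTAINER_ID}"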

Worth noting that I'm using the same MongoDB Atlas credentials (and the same organization and project ID) throughout my code, so it is not possible that the provider is picking up a different project or group ID...

Additional Context

The network container that I was trying to delete:

# module.mongodb_atlas_aws[0].mongodbatlas_network_container.default will be destroyed
  - resource "mongodbatlas_network_container" "default" {
      - atlas_cidr_block = "10.1.128.0/24" -> null
      - container_id     = "61700380003d3701c115dc58" -> null
      - id               = "REDACTED" -> null
      - project_id       = "REDACTED" -> null
      - provider_name    = "AWS" -> null
      - provisioned      = true -> null
      - region_name      = "EU_WEST_3" -> null
      - regions          = [] -> null
      - vpc_id           = "REDACTED" -> null
    }

My setup is quite convoluted, so unfortunately I haven't yet managed to produce a minimal reproducer. However, it might be worth investigating why this happens, given that (I reckon) it should be a simple HTTP DELETE call.

MikiLoz92 avatar Nov 11 '21 16:11 MikiLoz92

@MikiLoz92 Can you share the project ID so that I can check our backend logs for the reason the deletion is not happening and for any errors generated? The debug logs also help us the most in these cases, so if you can repro this issue, please generate the debug logs using the commands below:

export TF_LOG=trace
export TF_LOG_PATH=debug.log
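
and then run the failing command in the same shell so the trace is captured, for example:

terraform destroy    # debug.log will then contain the full trace output to attach to this issue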

nikhil-mongo avatar Nov 13 '21 03:11 nikhil-mongo

Hi @nikhil-mongo, unfortunately I already deleted that project and did not keep the ID... it was a temporary environment. I'll try to reproduce this bug again shortly and send you the logs, but it's one of those that seem to trigger randomly...

MikiLoz92 avatar Nov 16 '21 09:11 MikiLoz92

I cannot repro this on my end; everything works perfectly during creation and termination. @MikiLoz92 Once you have the logs, it will help us a lot.

nikhil-mongo avatar Nov 16 '21 12:11 nikhil-mongo

Hello @nikhil-mongo, I managed to reproduce the error again. Here are the logs, though unfortunately it seems no error entries were written:

2021-12-03T11:05:53.325Z [DEBUG] Adding temp file log sink: /var/folders/jh/j5vww6pj12318zbwfdhdkpxw0000gn/T/terraform-log884639815
2021-12-03T11:05:53.326Z [INFO]  Terraform version: 1.0.8
2021-12-03T11:05:53.326Z [INFO]  Go runtime version: go1.16.4
2021-12-03T11:05:53.326Z [INFO]  CLI args: []string{"/usr/local/Cellar/tfenv/2.2.2/versions/1.0.8/terraform", "--version"}
2021-12-03T11:05:53.326Z [TRACE] Stdout is not a terminal
2021-12-03T11:05:53.326Z [TRACE] Stderr is not a terminal
2021-12-03T11:05:53.326Z [TRACE] Stdin is a terminal
2021-12-03T11:05:53.326Z [DEBUG] Attempting to open CLI config file: /Users/miki/.terraformrc
2021-12-03T11:05:53.326Z [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2021-12-03T11:05:53.327Z [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2021-12-03T11:05:53.327Z [DEBUG] ignoring non-existing provider search directory /Users/miki/.terraform.d/plugins
2021-12-03T11:05:53.327Z [DEBUG] ignoring non-existing provider search directory /Users/miki/Library/Application Support/io.terraform/plugins
2021-12-03T11:05:53.327Z [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2021-12-03T11:05:53.328Z [INFO]  CLI command args: []string{"version", "--version"}

The container ID that I was trying to delete is 61a9f0e1d490cb6e969424cb, and the project ID is 61a9f0c65e797b5ee444ffc3.

Same error output:

module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjFhOWYwZTFkNDkwY2I2ZT...pZA==:NjFhOWYwYzY1ZTc5N2I1ZWU0NDRmZmMz, 4m40s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjFhOWYwZTFkNDkwY2I2ZT...pZA==:NjFhOWYwYzY1ZTc5N2I1ZWU0NDRmZmMz, 4m50s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjFhOWYwZTFkNDkwY2I2ZT...pZA==:NjFhOWYwYzY1ZTc5N2I1ZWU0NDRmZmMz, 5m0s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjFhOWYwZTFkNDkwY2I2ZT...pZA==:NjFhOWYwYzY1ZTc5N2I1ZWU0NDRmZmMz, 5m10s elapsed]
module.mongodb_atlas_aws[0].mongodbatlas_network_container.default: Still destroying... [id=Y29udGFpbmVyX2lk:NjFhOWYwZTFkNDkwY2I2ZT...pZA==:NjFhOWYwYzY1ZTc5N2I1ZWU0NDRmZmMz, 5m20s elapsed]
╷
│ Error: error deleting MongoDB Network Peering Container (61a9f0e1d490cb6e969424cb): couldn't find resource (21 retries)
│ 
│ 
╵

MikiLoz92 avatar Dec 04 '21 17:12 MikiLoz92

Could it have something to do with this situation that @themantissa encountered?

"So as long as a cluster exists in a project that is in the same region as the container the container's provisioned value is set to true. Since one can only use the DELETE endpoint for containers when it's unprovisioned we can't remove it until all the clusters in the same region are removed. Terraform could remove it once I deleted the non-terraform created cluster."

According to her comments, and if I understood correctly, the container should be in the provisioned = false state in order for it to be deletable. However, it is currently in provisioned = true, if I check the MongoDB Atlas HTTP API:

{
  "atlasCidrBlock": "10.1.128.0/24",
  "id": "61a9f0e1d490cb6e969424cb",
  "providerName": "AWS",
  "provisioned": true,
  "regionName": "EU_WEST_3",
  "vpcId": "<REDACTED>"
}

However, the cluster that this project contained HAS been deleted in the same terraform apply, so shouldn't the provisioned value go back to false? It bugs me that it hasn't...
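
A quick way to check that flag from the shell (a sketch; assumes curl with digest auth and jq, with placeholder key/ID variables):

curl --digest -u "${ATLAS_PUBLIC_KEY}:${ATLAS_PRIVATE_KEY}" \
  "https://cloud.mongodb.com/api/atlas/v1.0/groups/${GROUP_ID}/containers/${CONTAINER_ID}" \
  | jq .provisioned    # prints true here, even though the cluster is gone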

MikiLoz92 avatar Dec 04 '21 17:12 MikiLoz92

@MikiLoz92 Yes, there should be nothing in the project that is mapped to the VPC. It does not matter whether the resource is launched using Terraform, the UI, or the API.

There should be no clusters, nor any old backup snapshots, in the project for the VPC you are trying to delete.

nikhil-mongo avatar Dec 28 '21 07:12 nikhil-mongo

Hi,

I am having the exact same problem, and I can confirm there is nothing inside the project, just the container, exactly as described by @MikiLoz92. Even so, trying to delete it through the MongoDB Atlas API yields an error:

"Cannot modify in use containers. The container still contained resources." But there is no contained resource in it..

{
  "atlasCidrBlock": "192.168.0.0/21",
  "id": "61e6dd473d232a08b95d680e",
  "providerName": "AWS",
  "provisioned": true,
  "regionName": "EU_WEST_3",
  "vpcId": "vpc-060b451b16db96f83"
}

Any idea why this might be happening?

AlexRex avatar Jan 27 '22 15:01 AlexRex

@MikiLoz92 @AlexRex Please share your Project ID over email to [email protected]; since this is a public platform, I would not want you to share your Project ID here, so I will take it on a 1:1 basis.

OR

Please open a support case from within the Atlas Project and mention this GitHub issue to continue the work.

I will check what is happening with the project that is causing this behaviour.

Thanks

nikhil-mongo avatar Jan 28 '22 08:01 nikhil-mongo

I accidentally made a comment on a related but closed ticket (https://github.com/mongodb/terraform-provider-mongodbatlas/issues/88#issuecomment-1044133495) describing the same experience as @AlexRex.

leynebe avatar Feb 18 '22 08:02 leynebe

@leynebe, and others: as @nikhil-mongo notes above, we can look into this issue, but we would need to see it actually happening in a project; we have not been able to repro it. If you can follow his guidance above and open a support case to get us the project information, we can investigate this further.

themantissa avatar Feb 23 '22 01:02 themantissa

Hi @themantissa, @nikhil-mongo, I have shared our IDs with @nikhil-mongo privately through e-mail. The issue keeps reproducing for us. Moreover, it's now impossible to delete an existing network container, even if it doesn't have any associated resources left.

MikiLoz92 avatar Mar 07 '22 14:03 MikiLoz92

@MikiLoz92 @leynebe @AlexRex I can repro this. It happens when the cluster has been destroyed without destroying the network container. I guess the problem here is pit_enabled=true, but that is purely an assumption right now because I do not have proof to support it. I will get this checked internally to find out more and update here.

Can you all try creating the cluster without pit_enabled=true and see if the problem still occurs?

Thanks.

nikhil-mongo avatar Mar 08 '22 06:03 nikhil-mongo

@nikhil-mongo

I have used the following config, which results in the issues discussed above:

resource "mongodbatlas_network_container" "network_container" {
  project_id       = var.mongodbatlas_project_id
  atlas_cidr_block = "10.101.0.0/24"
  provider_name    = "AWS"
  region_name      = "EU_WEST_1"
}

resource "mongodbatlas_cluster" "cluster" {
  project_id   = var.mongodbatlas_project_id
  name         = var.mongodbatlas_cluster_name
  cluster_type = "REPLICASET"

  replication_specs {
    num_shards = 1
    regions_config {
      region_name     = "EU_WEST_1"
      electable_nodes = 3
      priority        = 7
      read_only_nodes = 0
    }
  }

  cloud_backup = true
  # You may see the following parameters in the docs, don't use them since they are deprecated. See https://registry.terraform.io/providers/mongodb/mongodbatlas/latest/docs/resources/cluster#backup_enabled
  # backup_enabled          = false
  # provider_backup_enabled = false

  # Disable the Continous Cloud Backup Option
  pit_enabled = false

  auto_scaling_disk_gb_enabled = true
  mongo_db_major_version       = "4.2"
  provider_name                = "AWS"
  provider_instance_size_name  = "M10"


  lifecycle {
    prevent_destroy = true
  }

  depends_on = [mongodbatlas_network_container.network_container]
}

The backup-related parameters are notably set to:

  • cloud_backup = true
  • pit_enabled = false

leynebe avatar Mar 08 '22 07:03 leynebe

@leynebe

I can see that the below block is in use -

lifecycle {
    prevent_destroy = true
  }

Therefore, do you explicitly set it to false before termination? I am hitting this bug only with a pit_enabled=true cluster; I will check with this config too and get back to you.

The steps are as below:

  • Create the network container and the cluster with pit_enabled=true
  • Then run terraform refresh; this sets provisioned=true in the tfstate file
  • Terminate only the cluster with the -target= option
  • Now try to delete the container using terraform destroy, and it fails.

If pit_enabled=true is not set on the cluster, then all the steps execute successfully. I could be wrong here; I am still in the process of gathering proof to validate this.
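
In shell terms, the repro is roughly the following (a sketch; the resource addresses follow the config shared above):

terraform apply                                           # create container + cluster with pit_enabled=true
terraform refresh                                         # provisioned=true lands in the tfstate
terraform destroy -target=mongodbatlas_cluster.cluster    # terminate only the cluster
terraform destroy                                         # the container deletion now fails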

Thanks

nikhil-mongo avatar Mar 08 '22 07:03 nikhil-mongo

@nikhil-mongo

We actually have a separate script that deletes the database with extra failsafes and confirmations, since we didn't trust Terraform with this responsibility. We thought it was too easy to mess up and end up with all our data gone. We initially looked for features like AWS ALB's or AWS RDS' deletion protection, but such a feature (where deletion of a database has to be turned on manually before automation can actually delete it) to my knowledge doesn't exist in MongoDB Atlas, so we settled on a separate script.

The script does not touch the Terraform state, so I'm assuming the MongoDB Atlas provider must recognize on its own that the database is deleted and remove the resource from the state, but then maybe it forgets to set the provisioned state to false on the network container? Or maybe I need an extra API call in my database removal script that sets the provisioned state to false or deletes the container entirely?

leynebe avatar Mar 08 '22 09:03 leynebe

After many repros and different configurations, I am certain that the problem is using -target= to explicitly delete the cluster, which leaves the provider unable to set the correct values on the network container.

Also, as far as delete protection is concerned, the lifecycle block is designed for exactly this purpose: protecting a resource from unwanted changes or deletion. What you describe for AWS ALB or RDS is a feature built into those services, but that is not the case in Atlas; it can of course be requested as a feature through user voice at https://feedback.mongodb.com.

Whenever you run the terraform apply or destroy command, it refreshes the Terraform state automatically and does not require terraform refresh to be run manually. It then matches the configuration against the state file to carry out any changes; in the case of destroy, it simply goes ahead and tries to destroy everything. I am not sure what the script looks like or what it does, but the provisioned state is set to false automatically and nothing extra is required.

If both the cluster and the network container are deleted together, i.e. terraform destroy is used and no -target= is defined, then everything is deleted without error.

As said, I will confirm why the container is not being deleted when -target= is used, but the probable reason is the warning below.

Warning: Applied changes may be incomplete
│ 
│ The plan was created with the -target option in effect, so some changes requested in the configuration may have been ignored and the output values may not be fully updated.
│ Run the following command to verify that no other changes are pending:
│     terraform plan
│ 
│ Note that the -target option is not suitable for routine use, and is provided only for exceptional situations such as recovering from errors or mistakes, or when Terraform
│ specifically suggests to use it as part of an error message.

Terraform is warning us that the use of -target= is not recommended and that deleting resources this way can be problematic. In your case the same thing is happening: you are provisioning the resource using Terraform but deleting it using a script, which is not recommended, as resource creation and deletion should be managed entirely through Terraform, with no changes made from anywhere else.

For any further help on this, I recommend logging a support case on our support portal should you need further assistance.

Thanks

nikhil-mongo avatar Mar 08 '22 09:03 nikhil-mongo

@nikhil-mongo

Thanks for your investigation.

I followed your advice and created a feature request for a deletion protection flag on the MongoDBAtlas Cluster: ~~https://feedback.mongodb.com/forums/924280-database/suggestions/44885773-add-deletion-protection-feature-to-mongodbatlas-cl~~ (duplicate of https://feedback.mongodb.com/forums/924145-atlas/suggestions/39954307-cluster-termination-protection)

leynebe avatar Mar 08 '22 14:03 leynebe

@leynebe fyi there is another user feedback for your suggestion that you may want to upvote: https://feedback.mongodb.com/forums/924145-atlas/suggestions/39954307-cluster-termination-protection.

themantissa avatar Mar 08 '22 16:03 themantissa

Hi @nikhil-mongo, we are not using pit_enabled=true in our scripts (we are not including that option), and we still have the issue above.

We are not using

lifecycle {
    prevent_destroy = true
}

nor the --target option in our TF runs, either.

MikiLoz92 avatar Mar 08 '22 17:03 MikiLoz92

@nikhil-mongo

I have the same issue! We can't delete the default network_container; we need to recreate it with a different CIDR block.

[two screenshots omitted]

Could you help me?

Thanks!

ctellechea2001 avatar Jun 08 '22 20:06 ctellechea2001

Closing this issue. If any related issues come up, please open a new issue or support ticket.

themantissa avatar Oct 19 '22 20:10 themantissa