
`vsphere_vm_storage_policy` not deleted after `terraform destroy`

Open Calvinaud opened this issue 4 years ago • 4 comments

Terraform Version

v1.0.0

vSphere Provider Version

v2.0.1

Affected Resource(s)

  • vsphere_vm_storage_policy

Terraform Configuration Files

provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
  allow_unverified_ssl = true
}

data "vsphere_tag_category" "policy_category" {
  name = var.storage_policy_tag_category
}

data "vsphere_tag" "policy_tag_include" {
  name        = var.policy_tag
  category_id = data.vsphere_tag_category.policy_category.id
}

resource "vsphere_vm_storage_policy" "policy_tag_based_placement" {
  name        = "kube_test"
  description = "This storage policy is managed by Terraform. It's used for the vSphere CSI StorageClass (in Kubernetes) for Persistent Volumes"

  tag_rules {
    tag_category                 = data.vsphere_tag_category.policy_category.name
    tags                         = [ data.vsphere_tag.policy_tag_include.name ]
    include_datastores_with_tags = true
  }
}
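
For reference, a minimal sketch of the variable declarations the configuration above assumes; the names come from the config, while the types and the sensitive flag are illustrative:

variable "vsphere_user" {
  type = string
}

variable "vsphere_password" {
  type      = string
  sensitive = true
}

variable "vsphere_server" {
  type = string
}

variable "storage_policy_tag_category" {
  type = string
}

variable "policy_tag" {
  type = string
}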

Debug Output

https://gist.github.com/Calvinaud/2ab056d3f8a102b585b403ee419b8450

Panic Output

No panic output.

Expected Behavior

The terraform destroy should fail with an error since the storage policy is still in use (or it should delete the policy). When you try to remove the policy manually in vCenter, you get the error message:

Delete VM Storage Policy failed!

The resource 'xxxx-xxx-...' is in use.

Actual Behavior

The terraform destroy executes without errors but does not delete the storage policy. It also removes the storage policy from the tfstate even though the resource still exists.

Steps to Reproduce

  1. terraform apply to create the storage policy
  2. In a Kubernetes cluster install vSphere CPI and vSphere CSI
  3. Create a StorageClass in Kubernetes that uses the storage policy (see the sketch below)
  4. Create a PVC in Kubernetes with the StorageClass
  5. terraform destroy to try destroying the storage policy

(It is also possible to reproduce without Kubernetes by creating a resource, not managed by Terraform, that uses the storage policy created here.)
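
For illustration, step 3 might look like the following if the StorageClass were itself managed via the Terraform kubernetes provider (a sketch only; the class name is hypothetical, while csi.vsphere.vmware.com and the storagepolicyname parameter come from the vSphere CSI driver):

resource "kubernetes_storage_class" "vsphere" {
  metadata {
    name = "vsphere-kube-test" # hypothetical name
  }
  storage_provisioner = "csi.vsphere.vmware.com"
  parameters = {
    # Binds volumes provisioned from this class to the policy created above.
    storagepolicyname = "kube_test"
  }
}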

Important Factoids

  • vSphere CSI version: v2.1.1
  • Kubernetes version: 1.19.7 and 1.20.7 (I was able to reproduce in both versions)
  • vSphere version: 6.7u3

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Calvinaud avatar Jun 22 '21 07:06 Calvinaud

Hi @Calvinaud, I suspect that this would be the expected behavior of the provider.

If you run a destroy on the plan, you are requesting that the resource be removed from vSphere (if possible) and then removed from the state. In this instance, since the storage policy is in use by another object (perhaps outside of this plan, in another plan, or elsewhere), the removal of the remaining resources proceeds, along with the removal of the policy from the state, in order to reach the desired state. However, it may be more ideal for the provider to log a warning in Terraform if the object is in use by external entities.

@appilon, could you confirm if this is the expected behavior?

Ryan

tenthirtyam avatar Jan 25 '22 21:01 tenthirtyam

Naively, I would expect the provider to error if it was unable to delete something upstream (I would expect an error from the vSphere API/govmomi) and have that bubble up on destroy; and yes, the destroy would then not succeed and the resource would remain in state. I will have to look into the resource code.

However, @Calvinaud, creating something with Terraform and then attaching it to something out of band that prevents Terraform from having authority over it going forward isn't an ideal practice. I suspect that if I do fix a silent failure, what will happen is that you will have to delete the object outside of Terraform's scope and then run terraform destroy again, and possibly run terraform state rm to manually remove it from state (depending on the resource code, which I need to look into); see the sketch below. Worth a follow-up.
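
A sketch of that recovery sequence, assuming the resource address from the configuration above and hypothetical Kubernetes object names:

# 1. Delete the out-of-band consumer first (here, the Kubernetes objects).
kubectl delete pvc example-pvc                 # hypothetical PVC name
kubectl delete storageclass vsphere-kube-test  # hypothetical class name

# 2. Re-run the destroy.
terraform destroy

# 3. If the policy lingers in state (depending on the resource code),
#    drop it from state manually.
terraform state rm vsphere_vm_storage_policy.policy_tag_based_placement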

appilon avatar Jan 25 '22 22:01 appilon

Hi @appilon and @tenthirtyam, thanks for the response.

I agree that Terraform losing authority over a resource isn't ideal. But in our case the binding is done by something well outside the scope of Terraform/the infrastructure, and we can do nothing about it. (If you have an idea to work around it I will gladly take it, but that's not the subject of this issue.)

That's why, in our procedure, we normally need to uninstall the Kubernetes cluster before running the terraform destroy.

The main problem is the possible divergence between reality and the state. The most blocking problem in our use case is when we try to recreate the infrastructure: it crashes because Terraform tries to recreate a resource that already exists.

A question out of pure curiosity: do you think it should fail during the planning phase of the destroy, or still destroy the other resources and just keep this one in the state?

Have a nice day

Calvinaud avatar Jan 26 '22 10:01 Calvinaud

re:

Do you think it should fail during the planning phase of the destroy, or still destroy the other resources and just keep this one in the state?

I agree with Alex's points here in the thread.

It certainly should not fail during a terraform plan. It should certainly destroy what it's able to control based on state, provided the resource is not used by another resource (one defined by Terraform or natively in vSphere). Ideally, it would be great if the plan could detect the resource being in use by "other forces" and provide the option to continue, with the resource removed only from state.

Ryan

tenthirtyam avatar Feb 17 '22 20:02 tenthirtyam

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Apr 28 '23 02:04 github-actions[bot]