[BUG] - AWS kubernetes resources not fully deleting properly (security group created by eks)

Open costrouc opened this issue 3 years ago • 5 comments

OS system and architecture in which you are running QHub

Linux

Expected behavior

All qhub resources should cleanly delete.

[terraform]: │ Error: Plugin error
[terraform]: │ 
[terraform]: │   with module.kubernetes-jupyterhub-ssh.kubernetes_manifest.jupyterhub-sftp-ingress,
[terraform]: │   on modules/kubernetes/services/jupyterhub-ssh/main.tf line 27, in resource "kubernetes_manifest" "jupyterhub-sftp-ingress":
[terraform]: │   27: resource "kubernetes_manifest" "jupyterhub-sftp-ingress" {
[terraform]: │ 
[terraform]: │ The plugin returned an unexpected error from
[terraform]: │ plugin.(*GRPCProvider).ReadResource: rpc error: code = Unknown desc =
[terraform]: │ Unauthorized
[terraform]: ╵
[terraform]: ╷
[terraform]: │ Error: Plugin error
[terraform]: │ 
[terraform]: │   with module.jupyterhub.kubernetes_manifest.jupyterhub,
[terraform]: │   on modules/kubernetes/services/jupyterhub/main.tf line 126, in resource "kubernetes_manifest" "jupyterhub":
[terraform]: │  126: resource "kubernetes_manifest" "jupyterhub" {
[terraform]: │ 
[terraform]: │ The plugin returned an unexpected error from
[terraform]: │ plugin.(*GRPCProvider).ReadResource: rpc error: code = Unknown desc =
[terraform]: │ Unauthorized
[terraform]: ╵
[terraform]: ╷
[terraform]: │ Error: Plugin error
[terraform]: │ 
[terraform]: │   with module.kubernetes-conda-store-server.module.minio.kubernetes_manifest.minio-api,
[terraform]: │   on modules/kubernetes/services/minio/ingress.tf line 1, in resource "kubernetes_manifest" "minio-api":
[terraform]: │    1: resource "kubernetes_manifest" "minio-api" {
[terraform]: │ 
[terraform]: │ The plugin returned an unexpected error from
[terraform]: │ plugin.(*GRPCProvider).ReadResource: rpc error: code = Unknown desc =
[terraform]: │ Unauthorized
[terraform]: ╵
[terraform]: ╷
[terraform]: │ Error: Plugin error
[terraform]: │ 
[terraform]: │   with module.monitoring[0].kubernetes_manifest.grafana-ingress-route,
[terraform]: │   on modules/kubernetes/services/monitoring/main.tf line 122, in resource "kubernetes_manifest" "grafana-ingress-route":
[terraform]: │  122: resource "kubernetes_manifest" "grafana-ingress-route" {
[terraform]: │ 
[terraform]: │ The plugin returned an unexpected error from
[terraform]: │ plugin.(*GRPCProvider).ReadResource: rpc error: code = Unknown desc =
[terraform]: │ Unauthorized
[terraform]: ╵
INFO:qhub.provider.terraform:terraform init directory=stages/06-kubernetes-keycloak-configuration
INFO:qhub.provider.terraform: terraform at /tmp/terraform/1.0.5/terraform
[terraform]: 

See https://github.com/Quansight/qhub-integration-test/runs/5311056863?check_suite_focus=true#step:6:1250 for an example. This is not needed for 0.4.0, but it should be resolved in 0.4.1.

[terraform]: module.network.aws_vpc.main: Still destroying... [id=vpc-0d4af13fa907ed7bf, 4m20s elapsed]
[terraform]: module.network.aws_vpc.main: Still destroying... [id=vpc-0d4af13fa907ed7bf, 4m30s elapsed]
[terraform]: module.network.aws_vpc.main: Still destroying... [id=vpc-0d4af13fa907ed7bf, 4m40s elapsed]
[terraform]: module.network.aws_vpc.main: Still destroying... [id=vpc-0d4af13fa907ed7bf, 4m50s elapsed]
[terraform]: ╷
[terraform]: │ Error: error deleting EC2 VPC (vpc-0d4af13fa907ed7bf): DependencyViolation: The vpc 'vpc-0d4af13fa907ed7bf' has dependencies and cannot be deleted.
[terraform]: │ 	status code: 400, request id: 467e3035-bbc1-400e-8880-a766392f1a9e
[terraform]: │ 
[terraform]: │ 
[terraform]: ╵
INFO:qhub.provider.terraform:terraform init directory=stages/01-terraform-state/aws

Actual behavior

Resources do not all properly delete

How to Reproduce the problem?

Run qhub-integration-tests

Command output

No response

Versions and dependencies used.

No response

Compute environment

No response

Integrations

No response

Anything else?

No response

costrouc avatar Feb 23 '22 22:02 costrouc

I'm going to push this issue into 0.4.1 or later. I'll explain the rationale. Currently the AWS VPC does not cleanly delete with qhub destroy. There are two reasons for this.

  • When you delete the EKS cluster, any existing load balancers are not cleaned up: https://github.com/hashicorp/terraform-provider-aws/issues/21863. We have solved this by running all the other stages and cleaning up the Kubernetes service that was a load balancer. So this one is solved ... but really EKS should be cleaning up after itself!
  • When you delete the EKS cluster, a stray security group that was associated with the cluster is left behind. I believe it is related to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_cluster#cluster_security_group_id. Related issue: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1606

So this issue is a pain, with no great way to clean up properly unless AWS fixes it. Realistically, this should not cause any problems aside from a stray VPC existing (no additional cost). If you want to delete the VPC, simply go to the console and delete it there; it should delete with a warning that a security group is still attached.
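For anyone who would rather script the console step, here is a minimal sketch, assuming boto3 and assuming the only thing still blocking the VPC is one or more non-default security groups. The function name `delete_vpc_with_stray_groups` is hypothetical and not part of qhub. It does not paginate, and it will fail if a group is still referenced by another group or by a network interface.

```python
def delete_vpc_with_stray_groups(ec2, vpc_id):
    """Delete non-default security groups left in a VPC, then the VPC itself.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the IDs of the security groups that were deleted.
    """
    response = ec2.describe_security_groups(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    deleted = []
    for group in response["SecurityGroups"]:
        # The VPC's "default" group cannot be deleted on its own;
        # it is removed together with the VPC.
        if group["GroupName"] == "default":
            continue
        ec2.delete_security_group(GroupId=group["GroupId"])
        deleted.append(group["GroupId"])
    ec2.delete_vpc(VpcId=vpc_id)
    return deleted


# Example (needs boto3 and AWS credentials; the VPC ID is the one from the log above):
#   import boto3
#   delete_vpc_with_stray_groups(boto3.client("ec2"), "vpc-0d4af13fa907ed7bf")
```

Passing the client in, rather than creating it inside the function, keeps the region and credentials under the caller's control and makes the function easy to try against a stub first.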

costrouc avatar Feb 24 '22 22:02 costrouc

Sounds like a plan to me.

magsol avatar Feb 24 '22 22:02 magsol

Hi @costrouc, I haven't run into this recently, but how odd would it be to add an extra cleanup step during destroy that uses boto to check whether the most painful resources were deleted?

  • EKS load balancers
  • Elastic File System (EFS)
  • S3 buckets (which can be deleted very cleanly using the Python CLI)
  • EKS clusters and VPCs (deleting the VPCs seems to also remove the security groups)
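A check like this could look roughly as follows. This is only a sketch, assuming boto3 clients for EC2 and ELBv2 are passed in; `find_vpc_leftovers` is a hypothetical name, the result is read-only, and it covers just two of the resource types in the list above (Classic load balancers, EFS, S3, and EKS clusters would each need their own client and call).

```python
def find_vpc_leftovers(ec2, elbv2, vpc_id):
    """Report resources that may still be attached to a VPC after destroy.

    `ec2` and `elbv2` are assumed to behave like boto3 "ec2" and "elbv2"
    clients. Nothing is deleted; the function only lists what it finds.
    """
    vpc_filter = [{"Name": "vpc-id", "Values": [vpc_id]}]
    groups = ec2.describe_security_groups(Filters=vpc_filter)["SecurityGroups"]
    balancers = elbv2.describe_load_balancers()["LoadBalancers"]
    return {
        # The "default" group always exists, so it is not counted as a leftover.
        "security_groups": sorted(
            g["GroupId"] for g in groups if g["GroupName"] != "default"
        ),
        # describe_load_balancers has no VPC filter, so filter on the client side.
        "load_balancers": sorted(
            b["LoadBalancerArn"] for b in balancers if b.get("VpcId") == vpc_id
        ),
    }


# Example (needs boto3 and AWS credentials):
#   import boto3
#   leftovers = find_vpc_leftovers(
#       boto3.client("ec2"), boto3.client("elbv2"), "vpc-0d4af13fa907ed7bf"
#   )
#   if any(leftovers.values()):
#       print("resources still present:", leftovers)
```

Running it once after destroy and failing the integration test when either list is non-empty would surface the problem without deleting anything automatically.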

viniciusdc avatar May 11 '22 17:05 viniciusdc

@costrouc @viniciusdc 👋 I found this thread through your link back to the terraform-aws module issue I opened.

You might want to have a look at this Terraform mini module I released a while back and have been using internally for a couple of months. During terraform destroy, the module removes the load balancers that are stuck because of stray ENIs (which block the deletion of subnets and security groups): https://github.com/webdog/terraform-kubernetes-delete-eni

At minimum, the shell script can be taken from the module if the Terraform module doesn't make sense to use. Cheers!
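For projects that already use boto3, the general idea of clearing stray ENIs might be sketched like this. It is not taken from the linked module or its shell script; `delete_detached_interfaces` is a hypothetical name, and the sketch assumes the stray interfaces are already detached (status `available`), so interfaces still in use are left alone.

```python
def delete_detached_interfaces(ec2, vpc_id):
    """Delete network interfaces in a VPC that are no longer attached.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the IDs of the interfaces that were deleted.
    """
    response = ec2.describe_network_interfaces(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            # "available" means the interface is not attached to anything.
            {"Name": "status", "Values": ["available"]},
        ]
    )
    deleted = []
    for interface in response["NetworkInterfaces"]:
        interface_id = interface["NetworkInterfaceId"]
        ec2.delete_network_interface(NetworkInterfaceId=interface_id)
        deleted.append(interface_id)
    return deleted


# Example (needs boto3 and AWS credentials):
#   import boto3
#   delete_detached_interfaces(boto3.client("ec2"), "vpc-0d4af13fa907ed7bf")
```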

webdog avatar May 13 '22 21:05 webdog

Worth trying: https://github.com/gruntwork-io/cloud-nuke

aktech avatar Feb 08 '24 16:02 aktech