cluster-api-provider-aws Resource garbage collection feature should work in a private Subnet

/kind feature

Describe the solution you'd like:

I tried the new resource garbage collection feature and it worked great for both AWS and EKS clusters 🎉

But then I tried with a bootstrap controller in a private Subnet that uses VPC Endpoints to access AWS services when managing clusters.

The cluster fails to delete with:

I0812 17:12:49.081842       1 awscluster_controller.go:209] controller/awscluster "msg"="Reconciling AWSCluster delete" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster" 
I0812 17:16:50.269815       1 cleanup.go:43] controller/awscluster "msg"="reconciling deletion for garbage collection" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster" 
I0812 17:16:50.269848       1 cleanup.go:65] controller/awscluster "msg"="deleting aws resources created by tenant cluster" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster" 
E0812 17:18:50.659052       1 controller.go:317] controller/awscluster "msg"="Reconciler error" "error"="failed delete reconcile for gc service: getting tagged resources: RequestError: send request failed\ncaused by: Post \"https://tagging.us-west-2.amazonaws.com/\": dial tcp 54.240.253.156:443: i/o timeout" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"

Unfortunately, from what I can tell, there is no VPC Endpoint for the tagging service. I searched the AWS console and couldn't find anything related to "tag" or "tagging", and trying to create one with Terraform fails:

Error: creating EC2 VPC Endpoint (com.amazonaws.us-west-2.tagging): InvalidServiceName: The Vpc Endpoint Service 'com.amazonaws.us-west-2.tagging' does not exist

Anything else you would like to add:

These are the VPC Endpoints that we create in these private Subnet environments. The last entry, "com.amazonaws.us-west-2.tagging", is not a valid service.

resource "aws_vpc_endpoint" "vpc_endpoints" {
  for_each = toset( [
    "com.amazonaws.us-west-2.ec2",
    "com.amazonaws.us-west-2.elasticloadbalancing",
    "com.amazonaws.us-west-2.autoscaling",
    "com.amazonaws.us-west-2.secretsmanager",
    "com.amazonaws.us-west-2.ssm",
    "com.amazonaws.us-west-2.ssmmessages",
    "com.amazonaws.us-west-2.ec2messages",
    "com.amazonaws.us-west-2.tagging", // not a valid service: creation fails with InvalidServiceName
  ] )

  vpc_id            = aws_vpc.my_vpc.id
  service_name      = each.key
  vpc_endpoint_type = "Interface"
  security_group_ids = [aws_security_group.my_private.id]
  // the bastion machine will be accessing this endpoint
  subnet_ids = [aws_subnet.my_public.id]
  private_dns_enabled = true

  tags = var.tags
}
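For anyone reproducing this setup, here is a minimal sketch of how the same resource could be written so that the apply succeeds while the invalid service name stays documented. It assumes the unsupported names are maintained by hand; the locals `requested_services`, `unsupported_services`, and `endpoint_services` are hypothetical names, not part of the original configuration. It only changes which endpoints Terraform tries to create. It does not make the tagging API reachable from the private Subnet, so the garbage collection timeout shown in the log above would presumably still occur.

```hcl
locals {
  # Every endpoint service the private Subnet environment would like to have.
  requested_services = [
    "com.amazonaws.us-west-2.ec2",
    "com.amazonaws.us-west-2.elasticloadbalancing",
    "com.amazonaws.us-west-2.autoscaling",
    "com.amazonaws.us-west-2.secretsmanager",
    "com.amazonaws.us-west-2.ssm",
    "com.amazonaws.us-west-2.ssmmessages",
    "com.amazonaws.us-west-2.ec2messages",
    "com.amazonaws.us-west-2.tagging",
  ]

  # Assumption: a hand-maintained list of names that fail with InvalidServiceName.
  unsupported_services = [
    "com.amazonaws.us-west-2.tagging",
  ]

  # setsubtract returns the requested names that are not in the unsupported set.
  endpoint_services = setsubtract(
    toset(local.requested_services),
    toset(local.unsupported_services),
  )
}

resource "aws_vpc_endpoint" "vpc_endpoints" {
  for_each = local.endpoint_services

  vpc_id              = aws_vpc.my_vpc.id
  service_name        = each.key
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.my_private.id]
  // the bastion machine will be accessing this endpoint
  subnet_ids          = [aws_subnet.my_public.id]
  private_dns_enabled = true

  tags = var.tags
}
```

Keeping the unsupported name in a separate list means that if a tagging endpoint service becomes available, removing one entry is enough to have Terraform create it.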

Environment:

Cluster-api-provider-aws version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

Aug 12 '22 18:08 dkoshkin

/assign

Aug 13 '22 05:08 richardcase

/triage accepted

Aug 13 '22 05:08 richardcase

https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html lists the AWS services that can be accessed by a VPC endpoint. If the resource tagging service is not on this list, what are our options?

(Also, for reference, this is the list of public endpoints for the tagging service: https://docs.aws.amazon.com/general/latest/gr/arg.html#argtapi)

Aug 15 '22 15:08 dlipovetsky

https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html lists the AWS services that can be accessed by a VPC endpoint.

I've asked AWS about this and I'll update when I hear back.

Sep 12 '22 18:09 dlipovetsky

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Dec 11 '22 19:12 k8s-triage-robot

I've not had any time to work on this so:

/unassign
/help
/priority important-longterm

Dec 12 '22 15:12 richardcase

@richardcase: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

I've not had any time to work on this so:

/unassign
/help
/priority important-longterm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Dec 12 '22 15:12 k8s-ci-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

Jan 19 '24 20:01 k8s-triage-robot
