cluster-api-provider-aws
Resource garbage collection feature should work in a private Subnet
/kind feature
Describe the solution you'd like
I was trying the new resource garbage collection feature and it worked great for both AWS and EKS clusters 🎉
But then I tried with a bootstrap controller in a private Subnet that uses VPC Endpoints to access AWS services when managing clusters.
The cluster fails to delete with:
I0812 17:12:49.081842 1 awscluster_controller.go:209] controller/awscluster "msg"="Reconciling AWSCluster delete" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
I0812 17:16:50.269815 1 cleanup.go:43] controller/awscluster "msg"="reconciling deletion for garbage collection" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
I0812 17:16:50.269848 1 cleanup.go:65] controller/awscluster "msg"="deleting aws resources created by tenant cluster" "cluster"="e2e-aws-air-gapped-test-3568383" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
E0812 17:18:50.659052 1 controller.go:317] controller/awscluster "msg"="Reconciler error" "error"="failed delete reconcile for gc service: getting tagged resources: RequestError: send request failed\ncaused by: Post \"https://tagging.us-west-2.amazonaws.com/\": dial tcp 54.240.253.156:443: i/o timeout" "name"="e2e-aws-air-gapped-test-3568383" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
Unfortunately, as far as I can tell, there is no VPC Endpoint for the tagging service. I also searched the AWS console and couldn't find anything related to "tag" or "tagging". Attempting to create one fails:
Error: creating EC2 VPC Endpoint (com.amazonaws.us-west-2.tagging): InvalidServiceName: The Vpc Endpoint Service 'com.amazonaws.us-west-2.tagging' does not exist
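The log above shows the controller waiting roughly two minutes before the dial to the tagging API times out. As a rough illustration only (this is not CAPA code), a pre-flight probe along these lines could surface the problem sooner. The names `taggingEndpoint` and `reachable` are hypothetical, and the host pattern assumes the standard commercial partition, matching the host seen in the log.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// taggingEndpoint builds the public Resource Groups Tagging API address for a
// region, matching the host seen in the log. Assumption: standard "aws"
// partition; other partitions use different domain suffixes.
func taggingEndpoint(region string) string {
	return net.JoinHostPort("tagging."+region+".amazonaws.com", "443")
}

// reachable reports whether a TCP connection to addr can be opened within
// the given timeout.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	addr := taggingEndpoint("us-west-2")
	if !reachable(addr, 5*time.Second) {
		fmt.Printf("tagging API at %s is unreachable; garbage collection would time out\n", addr)
		return
	}
	fmt.Printf("tagging API at %s is reachable\n", addr)
}
```

Run from inside the private Subnet, this only checks TCP connectivity; it does not verify credentials or API permissions.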
Anything else you would like to add:
Here are the VPC Endpoints that we create in these private Subnet environments. The last entry, "com.amazonaws.us-west-2.tagging", is not a valid service.
resource "aws_vpc_endpoint" "vpc_endpoints" {
  for_each = toset([
    "com.amazonaws.us-west-2.ec2",
    "com.amazonaws.us-west-2.elasticloadbalancing",
    "com.amazonaws.us-west-2.autoscaling",
    "com.amazonaws.us-west-2.secretsmanager",
    "com.amazonaws.us-west-2.ssm",
    "com.amazonaws.us-west-2.ssmmessages",
    "com.amazonaws.us-west-2.ec2messages",
    "com.amazonaws.us-west-2.tagging",
  ])
  vpc_id              = aws_vpc.my_vpc.id
  service_name        = each.key
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.my_private.id]
  // the bastion machine will be accessing this endpoint
  subnet_ids          = [aws_subnet.my_public.id]
  private_dns_enabled = true
  tags                = var.tags
}
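Because a single invalid service name makes the whole endpoint set fail, it may help to check the wanted list against what the region actually offers before applying. The Go sketch below shows only the comparison step. The function name `splitByAvailability` and the sample data are hypothetical; in practice the `available` list would have to be fetched separately, for example from the EC2 `DescribeVpcEndpointServices` API.

```go
package main

import "fmt"

// splitByAvailability partitions wanted service names into those present in
// available and those that are not, preserving the order of wanted.
func splitByAvailability(wanted, available []string) (supported, missing []string) {
	known := make(map[string]bool, len(available))
	for _, s := range available {
		known[s] = true
	}
	for _, s := range wanted {
		if known[s] {
			supported = append(supported, s)
		} else {
			missing = append(missing, s)
		}
	}
	return supported, missing
}

func main() {
	// Hypothetical sample data for illustration only; a real list of
	// available services would come from ec2:DescribeVpcEndpointServices.
	available := []string{
		"com.amazonaws.us-west-2.ec2",
		"com.amazonaws.us-west-2.ssm",
	}
	wanted := []string{
		"com.amazonaws.us-west-2.ec2",
		"com.amazonaws.us-west-2.ssm",
		"com.amazonaws.us-west-2.tagging",
	}

	supported, missing := splitByAvailability(wanted, available)
	fmt.Println("create endpoints for:", supported)
	fmt.Println("no endpoint service for:", missing)
}
```

This only avoids the Terraform error; it does not make the tagging API reachable from the private Subnet.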
Environment:
- Cluster-api-provider-aws version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):
/assign
/triage accepted
https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html lists the AWS services that can be accessed by a VPC endpoint. If the resource tagging service is not on this list, what are our options?
(Also, for reference, this is the list of public endpoints for the tagging service: https://docs.aws.amazon.com/general/latest/gr/arg.html#argtapi)
I've asked AWS about this and I'll update when I hear back.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I've not had any time to work on this so:
/unassign /help /priority important-longterm
@richardcase: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted