cluster-api-provider-aws
Regression - unable to create worker pool without specifying subnet filters
/kind bug
What steps did you take and what happened:
I used the following template to create a cluster:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: test-capi-oz
  namespace: default
spec:
  region: us-east-2
  network:
    vpc:
      cidrBlock: 10.50.0.0/16
    subnets:
    - availabilityZone: us-east-2a
      cidrBlock: 10.50.0.0/20
      isPublic: true
      tags:
        test-capi-oz: us-east-2a
....
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: test-capi-oz-mp-0
  namespace: default
spec:
  clusterName: test-capi-oz
  failureDomains:
  - "us-east-2a"
  - "us-east-2b"
  - "us-east-2c"
  replicas: 3
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: test-capi-oz-mp-0
      clusterName: test-capi-oz
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachinePool
        name: test-capi-oz-mp-0
      version: v1.24.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachinePool
metadata:
  name: test-capi-oz-mp-0
  namespace: default
spec:
  availabilityZones:
  - us-east-2
  awsLaunchTemplate:
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: t3.large
    sshKeyName: oznt
  maxSize: 4
  minSize: 3
The master node is started, but no worker nodes are launched. The following error repeats in the logs:
E1214 21:34:52.883240 1 controller.go:317] controller/awsmachinepool "msg"="Reconciler error" "error"="failed to create AWSMachinePool: getting subnets for ASG: getting subnets for spec azs: getting subnets for availability zone us-east-2: no subnets found for supplied availability zone" "name"="test-capi-oz-mp-0" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSMachinePool"
What did you expect to happen:
I expected 3 worker nodes to launch across the specified failure domains. Instead, no worker nodes were started at all.
Anything else you would like to add:
I believe the core issue is that I didn't specify subnets. I used the default template generated by following these instructions:
export EXP_MACHINE_POOL=true
clusterctl init --infrastructure aws
clusterctl generate cluster my-cluster --kubernetes-version v1.24.0 --flavor machinepool > my-cluster.yaml
I believe the issue is caused by the removal of the fallback that read subnets from the cluster spec, taken from pkg/cloud/services/autoscaling/autoscalinggroup.go in https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3255/files (the block I suggest restoring below).
As a workaround, I added the following to my AWSCluster and AWSMachinePool specs:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: test-capi-oz
  namespace: default
spec:
  region: us-east-2
  network:
    vpc:
      cidrBlock: 10.50.0.0/16
    subnets:
    - availabilityZone: us-east-2a
      cidrBlock: 10.50.0.0/20
      isPublic: true
      tags:
        test-capi-oz: us-east-2a
    - availabilityZone: us-east-2a
      cidrBlock: 10.50.16.0/20
      tags:
        test-capi-oz: us-east-2a
    - availabilityZone: us-east-2b
      cidrBlock: 10.50.32.0/20
      isPublic: true
      tags:
        test-capi-oz: us-east-2b
    - availabilityZone: us-east-2b
      cidrBlock: 10.50.48.0/20
      tags:
        test-capi-oz: us-east-2b
    - availabilityZone: us-east-2c
      cidrBlock: 10.50.64.0/20
      isPublic: true
      tags:
        test-capi-oz: us-east-2c
    - availabilityZone: us-east-2c
      cidrBlock: 10.50.80.0/20
      tags:
        test-capi-oz: us-east-2c
  sshKeyName: oznt
and
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachinePool
metadata:
  name: test-capi-oz-mp-0
  namespace: default
spec:
  availabilityZones:
  - us-east-2
  awsLaunchTemplate:
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: t3.large
    sshKeyName: oznt
  maxSize: 4
  minSize: 1
  subnets:
  - id: "subnet-05697af9a2aed1d9e"
  #- filters: []
  - filters:
    - name: "test-capi-oz"
      values:
      - "us-east-2a"
      - "us-east-2b"
      - "us-east-2c"
Figuring this out required piecing things together from the code and the docs: https://cluster-api-aws.sigs.k8s.io/topics/failure-domains/control-planes.html?highlight=cidrBlo#using-failuredomain-in-network-object-of-awsmachine
I see a few ways to fix this issue:
First, better documentation explaining that subnets must be added to both the cluster spec and the AWSMachinePool; an explicit example like my templates above would help people following the same path. Second, adding subnets to the default template. Finally, consider restoring the following block:
if len(subnetIDs) == 0 {
	for _, subnet := range scope.InfraCluster.Subnets() {
		subnetIDs = append(subnetIDs, subnet.ID)
	}
}
before calling
subnetIDs, err := s.SubnetIDs(scope)
(Alternatively, this fallback could be added at the top of s.SubnetIDs.)
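To make the suggested fallback concrete, here is a small standalone Go sketch of the intended behaviour. The subnet type and the poolSubnetIDs helper are stand-ins invented for illustration only; they are not the real CAPA scope types or the actual s.SubnetIDs implementation, and the real change would live in autoscalinggroup.go:

package main

import "fmt"

// subnet is a stand-in for the provider's subnet spec; only the fields
// needed for the illustration are included.
type subnet struct {
	ID string
	AZ string
}

// poolSubnetIDs stands in for s.SubnetIDs(scope): it returns the subnet IDs
// explicitly selected by the AWSMachinePool, which may be an empty slice.
func poolSubnetIDs(poolSubnets []subnet) []string {
	ids := make([]string, 0, len(poolSubnets))
	for _, s := range poolSubnets {
		ids = append(ids, s.ID)
	}
	return ids
}

func main() {
	// Subnets defined on the AWSCluster spec.
	clusterSubnets := []subnet{
		{ID: "subnet-aaa", AZ: "us-east-2a"},
		{ID: "subnet-bbb", AZ: "us-east-2b"},
	}
	// Nothing configured on the AWSMachinePool, as in the default template.
	var poolSubnets []subnet

	subnetIDs := poolSubnetIDs(poolSubnets)
	// Suggested fallback: when the pool selects no subnets, reuse the
	// cluster's subnets instead of failing with "no subnets found".
	if len(subnetIDs) == 0 {
		for _, s := range clusterSubnets {
			subnetIDs = append(subnetIDs, s.ID)
		}
	}
	fmt.Println(subnetIDs) // [subnet-aaa subnet-bbb]
}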
I would be happy to contribute a PR with the above suggestions.
Environment:
- Cluster-api-provider-aws version: registry.k8s.io/cluster-api-aws/cluster-api-aws-controller:v1.5.2
- Kubernetes version (use kubectl version):
$ k version -o yaml
clientVersion:
  buildDate: "2022-11-25T08:23:01Z"
  compiler: gc
  gitCommit: 434bfd82814af038ad94d62ebe59b133fcb50506
  gitTreeState: archive
  gitVersion: v1.25.3
  goVersion: go1.19.2
  major: "1"
  minor: "25"
  platform: linux/amd64
kustomizeVersion: v4.5.7
serverVersion:
  buildDate: "2022-10-25T19:35:11Z"
  compiler: gc
  gitCommit: 434bfd82814af038ad94d62ebe59b133fcb50506
  gitTreeState: clean
  gitVersion: v1.25.3
  goVersion: go1.19.2
  major: "1"
  minor: "25"
  platform: linux/amd64
- OS (e.g. from /etc/os-release): gentoo
/triage accepted
@oz123 please go ahead with a PR for this if you are interested.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".