cluster-api-provider-aws
`clusterctl move` not compatible with `AWSMachinePools`
/kind bug
What steps did you take and what happened:
- Spin up a bootstrap cluster in Kind
- Create a new target cluster with at least one `AWSMachinePool` defined
- Wait for target cluster to be created and ready
- Perform a pivot of the cluster using `clusterctl move` so the target cluster is self-managing
- The following error will be reported:
```
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Creating objects in the target cluster
Error: [action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00a: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00a" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name, action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00b: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00b" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name, action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00c: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00c" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name]
```
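For reference, a minimal sketch of the repro flow, assuming cluster names, kubeconfig paths, and the workload-cluster manifest step that aren't in the report:

```sh
# Hypothetical repro sketch; names and paths are assumptions.
kind create cluster --name bootstrap                  # bootstrap cluster in Kind
clusterctl init --infrastructure aws                  # install CAPI + CAPA
# ... apply workload cluster manifests that include an AWSMachinePool ...
clusterctl init --kubeconfig target.kubeconfig --infrastructure aws
clusterctl move --to-kubeconfig target.kubeconfig     # pivot; fails as above
```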
What did you expect to happen:
All resources moved to the target cluster successfully.
Anything else you would like to add:
The `rootVolume.deviceName` field is not initially provided when first creating the cluster resources in the bootstrap cluster. Once the AWS Launch Template has been created, the details of the root volume are retrieved and the `deviceName` value is populated on the `AWSMachinePool` resource(s). When it comes to moving to the new cluster, the property remains populated and is then blocked by the admission webhook, preventing the move from completing.
This value only seems to be used during the initial setup of the Launch Template and, as far as I can see, is never referenced by anything else after that. Manually removing the `deviceName` property from the `AWSMachinePool` resources (see the patch sketch below) allows the move to be performed, but the value is never re-populated as it's only fetched when initially creating the Launch Template.
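For anyone hitting this, a hedged sketch of that manual removal using a JSON patch, assuming the pool names from the log above and the `default` namespace:

```sh
# Workaround sketch; pool names and namespace are assumptions from the log.
for pool in golem-def00a golem-def00b golem-def00c; do
  kubectl patch awsmachinepool "$pool" -n default --type=json \
    -p='[{"op": "remove", "path": "/spec/awsLaunchTemplate/rootVolume/deviceName"}]'
done
```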
Also discussed on Slack: https://kubernetes.slack.com/archives/CD6U2V71N/p1658902259480619
Environment:
- Cluster-api-provider-aws version: v1.4.1
- Cluster-api version: v1.1.5
- clusterctl version: v1.2.0
- Kubernetes version (use `kubectl version`): v1.21.1
- OS (e.g. from `/etc/os-release`): Ubuntu
Thanks for reporting this issue!
This is happening because the `deviceName` field under the `rootVolume` section is not allowed to be non-nil during creation, but it is set by the controllers. During `clusterctl move`, with that field set, creation fails.
The proper fix has to wait for the v1beta2 release, as it needs webhook/field changes. As a workaround, if users manually delete the `deviceName` before the move, it won't get re-added by the controllers and the move succeeds (see the snippet below for checking which pools have it set).
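A quick way to see which pools carry the field before moving (a sketch; assumes cluster-wide list permissions):

```sh
# Print each AWSMachinePool and its rootVolume deviceName (empty if unset).
kubectl get awsmachinepools -A -o \
  jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {.spec.awsLaunchTemplate.rootVolume.deviceName}{"\n"}{end}'
```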
/triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale