cluster-api-provider-aws
✨ Add AWSMachines to back the ec2 instances in AWSMachinePools
What type of PR is this? /kind feature
What this PR does / why we need it: Implements the MachinePool Machines Cluster API proposal
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4184
Special notes for your reviewer:
Checklist:
- [X] squashed commits
- [ ] includes documentation
- [x] adds unit tests
- [x] adds or updates e2e tests
Release note:
Added AWSMachines to back the EC2 instances in AWSMachinePools and AWSManagedMachinePools.
Hi @cnmcavoy. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
~Still missing new tests to cover the functionality, have been testing locally with tilt~
Edit: tests have been added.
/ok-to-test
/assign
@cnmcavoy is there a final review pending for this PR, or any other items?
@Ankitasw I think all the questions have been answered, it probably needs a new reviewer to give it a look over for a lgtm. It's ready to be merged if there are no more review items to address.
I can have a look into this, @cnmcavoy – do you want to resolve conflicts first?
> I can have a look into this, @cnmcavoy – do you want to resolve conflicts first?
Sure, I think I got them all sorted.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed

You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign richardcase for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
As an update: I'll try to test this major feature. PR looks mostly fine, I think. I'm only unsure about the owner reference in the non-happy case.
I got a working test environment, and the AWSMachine/Machine objects get created correctly 🍀.
First blocking issue I found: deletion by downscaling the ASG/AWSMachinePool leads to this permanent error rather than deleting the AWSMachine:
failureMessage: EC2 instance state "terminated" is unexpected
failureReason: UpdateError
instanceState: terminated
ready: false
The Machine/AWSMachine combo also can't be deleted due to:
capa_control… │ I0822 16:04:49.507270 1 awsmachine_controller.go:303] "Handling deleted AWSMachine"
capa_control… │ E0822 16:04:49.508449 1 awsmachine_controller.go:308] "unable to delete machine" err="failed to get raw userdata: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil"
Other than that, the Node object went away and the cluster reacted according to the node deletion. @cnmcavoy what happened when you tested this? And which other cases do you think should be tested? I guess cluster-autoscaler scaling of ASGs may be a good external trigger that CAPA should have no problem with – I can test that.
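To make the deletion failure above concrete, here is a minimal, self-contained Go sketch. It is not CAPA's code: the `Machine` struct, `rawUserData`, `deleteStrict`, and `deleteTolerant` are hypothetical stand-ins, and the sketch assumes that a machine created for a MachinePool has no bootstrap data secret of its own. It only illustrates why a delete path that insists on reading user data gets stuck, and one possible way a delete path could tolerate the missing secret.

```go
package main

import (
	"errors"
	"fmt"
)

// Machine is a hypothetical, stripped-down stand-in for a Cluster API Machine.
// DataSecretName is nil when no bootstrap secret is referenced (assumed here
// to be the case for machines that back a MachinePool instance).
type Machine struct {
	Name           string
	DataSecretName *string
}

// errNoBootstrapData mirrors the wording of the error seen in the log above.
var errNoBootstrapData = errors.New("linked Machine's bootstrap.dataSecretName is nil")

// rawUserData mimics a lookup that fails when no bootstrap secret is set.
func rawUserData(m Machine) (string, error) {
	if m.DataSecretName == nil {
		return "", errNoBootstrapData
	}
	return "userdata-from-" + *m.DataSecretName, nil
}

// deleteStrict requires user data before deleting, so it can never
// finish for a machine without a bootstrap secret.
func deleteStrict(m Machine) error {
	if _, err := rawUserData(m); err != nil {
		return fmt.Errorf("unable to delete machine: %w", err)
	}
	return nil
}

// deleteTolerant treats a missing bootstrap secret as "nothing to clean up"
// and only fails on other errors.
func deleteTolerant(m Machine) error {
	if _, err := rawUserData(m); err != nil && !errors.Is(err, errNoBootstrapData) {
		return fmt.Errorf("unable to delete machine: %w", err)
	}
	return nil
}

func main() {
	pooled := Machine{Name: "pool-machine-0"} // no bootstrap secret
	fmt.Println("strict:", deleteStrict(pooled))
	fmt.Println("tolerant:", deleteTolerant(pooled))
}
```

Whether skipping the user data lookup is actually safe in the real controller depends on what cleanup that user data is needed for, which this sketch does not model.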
@AndiDog thanks for taking a look. I think that's slightly concerning bc there should be test cases covering that sort of condition.
I suspect this PR has been left too long and the code has rotted. Indeed doesn't use ASGs anymore for autoscaling, so I can't really justify the time investment it would take to cut a new PR and start this fresh.