cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard
rework vpc cni envvar logic to drop the forced keys
What type of PR is this? /kind bug
when adding vpc cni env it forced envvars for aws-node but we now know due to a bug were never being applied. This PR drops them entirely and pushes all the onus for VPC CNI Environment Variables to the user and makes CAPA less opinionated since these env vars are being added/dropped/deprecated constantly.
In addition, this Fixes #3678 because we aren't looping/assigning the env a second time and the dedupe code above is already fixing the daemonset container env properly.
What this PR does / why we need it: Cleans up envvar assigning and makes CAPA less opinionated.
Special notes for your reviewer: I merged the env var tests into one which does the tests twice once with secondary cidr and once without.
Checklist:
- [ ] squashed commits
- [ ] includes documentation
- [ ] adds unit tests
- [ ] adds or updates e2e tests
@luthermonson: This issue is currently awaiting triage.
If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @luthermonson. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Sadly, according to the AWS documentation, it is not something that you can drop.
These fields just happened to not be used by too many people I assume, because you need custom networking for that and define SecondaryCidrBlock for it. ( Which I think is not something many people would be using ).
For more detail see here: https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html
after talking with @Skarlso and a solutions architect we have internally i'd like to propose we still move forward with this PR and document that if you use a Secondary CIDR Block that you know what you're doing and you also set your VpcCni.Env to what you need. this makes this code simpler, accounts for dedupes and opens up the API to be more user defined as the env vars change. per what the vpc cni helm chart is doing they'll continue to use env vars to feature gate and this way CAPA just opens it up to user defined env and only reconciles based on what it's told rather than trying to infer and automatically set defaults.
This is probably okay as these values were never really applied ever. And no one complained. 😂 I refee to others in this decision.
this way CAPA just opens it up to user defined env and only reconciles based on what it's told rather than trying to infer and automatically set defaults.
@luthermonson I don't have much idea of these env vars as I have not used it myself, but as you said CAPA is setting only the defaults and not all of them, don't you think it is good to have the current implementation after @Skarlso 's changes that sets only default values? Do you still face issues after the above fix?
It's not setting the defaults as one env var value is already deprecated. I don't think capa should have an opinion of which env vars are set and the user should decide per cluster. The only one I might be willing to concede on is the var to enable custom config but I don't like not giving the user the choice.
@luthermonson There is no choice. Without it, it just doesn't work properly and might cause subtle bugs.
If CAPA doesn't set it and the user forgets about it, it might cause inconsistent behaviour.
Just saying... it never worked. And the way it's written if you changed that topology key it overwrote your decision. I disagree that capa should have an opinion on these but I'll do whatever you want.
Are we saying let's keep the enable custom cni and drop the topology env var?
I'm not saying it worked. :D I'm saying that technically it's a requirement. :) Regardless of whether this worked or not in the past. :)
I'm fine with leaving them out if there is sufficient documentation explaining the need to define these values to the cni once extra networking is enabled.
@Ankitasw To answer your question. This final part was messing with the deduplication list done a couple of steps before this. So removing the whole environment editing makes sure that deduplication works and stays like that since we don't mess with the environment values any more.
lol fair, I'd prefer to add the docs, I don't think it's a stretch to the imagination that if you are looking to do custom cni you want complete control and that includes turning it off/on on demand. Your call tho
lol fair, I'd prefer to add the docs, I don't think it's a stretch to the imagination that if you are looking to do custom cni you want complete control and that includes turning it off/on on demand. Your call tho
I'm fine with this unless someone else finds any problems with it. :)
@Skarlso updated the documentation
/lgtm
/test pull-cluster-api-provider-aws-e2e-eks
/hold until tests
/lgtm
@sedefsavas @richardcase can we get this merged?
Even though this technically didn't work it's a change in behaviour, so we should do this as part of the bump v1beta2.
Whilst i check with @sedefsavas
/hold
@Skarlso i had to rebase this should be gtg again
/lgtm
This is ok with me, so from my side:
/approve
There is a hold on this, @sedefsavas are you happy with this change? Feel free to ping me
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: richardcase
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [richardcase]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/milestone v2.0.0
@richardcase @Skarlso i thinks folks are ok to merge this, shall we go ahead?
Yep I think it's good to go.
/test pull-cluster-api-provider-aws-e2e-blocking