cluster-api-provider-aws icon indicating copy to clipboard operation
cluster-api-provider-aws copied to clipboard

rework vpc cni envvar logic to drop the forced keys

Open luthermonson opened this issue 3 years ago • 25 comments
trafficstars

What type of PR is this? /kind bug

when adding vpc cni env it forced envvars for aws-node but we now know due to a bug were never being applied. This PR drops them entirely and pushes all the onus for VPC CNI Environment Variables to the user and makes CAPA less opinionated since these env vars are being added/dropped/deprecated constantly.

In addition, this Fixes #3678 because we aren't looping/assigning the env a second time and the dedupe code above is already fixing the daemonset container env properly.

What this PR does / why we need it: Cleans up envvar assigning and makes CAPA less opinionated.

Special notes for your reviewer: I merged the env var tests into one which does the tests twice once with secondary cidr and once without.

Checklist:

  • [ ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests
  • [ ] adds or updates e2e tests

luthermonson avatar Aug 20 '22 00:08 luthermonson

@luthermonson: This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 20 '22 00:08 k8s-ci-robot

Hi @luthermonson. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 20 '22 00:08 k8s-ci-robot

/ok-to-test

Ankitasw avatar Aug 22 '22 10:08 Ankitasw

Sadly, according to the AWS documentation, it is not something that you can drop.

These fields just happened to not be used by too many people I assume, because you need custom networking for that and define SecondaryCidrBlock for it. ( Which I think is not something many people would be using ).

For more detail see here: https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html

Skarlso avatar Aug 22 '22 10:08 Skarlso

after talking with @Skarlso and a solutions architect we have internally i'd like to propose we still move forward with this PR and document that if you use a Secondary CIDR Block that you know what you're doing and you also set your VpcCni.Env to what you need. this makes this code simpler, accounts for dedupes and opens up the API to be more user defined as the env vars change. per what the vpc cni helm chart is doing they'll continue to use env vars to feature gate and this way CAPA just opens it up to user defined env and only reconciles based on what it's told rather than trying to infer and automatically set defaults.

luthermonson avatar Aug 22 '22 21:08 luthermonson

This is probably okay as these values were never really applied ever. And no one complained. 😂 I refee to others in this decision.

Skarlso avatar Aug 23 '22 04:08 Skarlso

this way CAPA just opens it up to user defined env and only reconciles based on what it's told rather than trying to infer and automatically set defaults.

@luthermonson I don't have much idea of these env vars as I have not used it myself, but as you said CAPA is setting only the defaults and not all of them, don't you think it is good to have the current implementation after @Skarlso 's changes that sets only default values? Do you still face issues after the above fix?

Ankitasw avatar Aug 23 '22 09:08 Ankitasw

It's not setting the defaults as one env var value is already deprecated. I don't think capa should have an opinion of which env vars are set and the user should decide per cluster. The only one I might be willing to concede on is the var to enable custom config but I don't like not giving the user the choice.

luthermonson avatar Aug 23 '22 14:08 luthermonson

@luthermonson There is no choice. Without it, it just doesn't work properly and might cause subtle bugs.

If CAPA doesn't set it and the user forgets about it, it might cause inconsistent behaviour.

Skarlso avatar Aug 23 '22 14:08 Skarlso

Just saying... it never worked. And the way it's written if you changed that topology key it overwrote your decision. I disagree that capa should have an opinion on these but I'll do whatever you want.

Are we saying let's keep the enable custom cni and drop the topology env var?

luthermonson avatar Aug 23 '22 14:08 luthermonson

I'm not saying it worked. :D I'm saying that technically it's a requirement. :) Regardless of whether this worked or not in the past. :)

I'm fine with leaving them out if there is sufficient documentation explaining the need to define these values to the cni once extra networking is enabled.

Skarlso avatar Aug 23 '22 14:08 Skarlso

@Ankitasw To answer your question. This final part was messing with the deduplication list done a couple of steps before this. So removing the whole environment editing makes sure that deduplication works and stays like that since we don't mess with the environment values any more.

Skarlso avatar Aug 23 '22 14:08 Skarlso

lol fair, I'd prefer to add the docs, I don't think it's a stretch to the imagination that if you are looking to do custom cni you want complete control and that includes turning it off/on on demand. Your call tho

luthermonson avatar Aug 23 '22 14:08 luthermonson

lol fair, I'd prefer to add the docs, I don't think it's a stretch to the imagination that if you are looking to do custom cni you want complete control and that includes turning it off/on on demand. Your call tho

I'm fine with this unless someone else finds any problems with it. :)

Skarlso avatar Aug 23 '22 14:08 Skarlso

@Skarlso updated the documentation

luthermonson avatar Aug 23 '22 19:08 luthermonson

/lgtm

Skarlso avatar Aug 23 '22 21:08 Skarlso

/test pull-cluster-api-provider-aws-e2e-eks

Skarlso avatar Aug 23 '22 21:08 Skarlso

/hold until tests

Skarlso avatar Aug 23 '22 21:08 Skarlso

/lgtm

Skarlso avatar Aug 24 '22 14:08 Skarlso

@sedefsavas @richardcase can we get this merged?

luthermonson avatar Aug 25 '22 15:08 luthermonson

Even though this technically didn't work it's a change in behaviour, so we should do this as part of the bump v1beta2.

Whilst i check with @sedefsavas

/hold

richardcase avatar Sep 02 '22 10:09 richardcase

@Skarlso i had to rebase this should be gtg again

luthermonson avatar Sep 28 '22 05:09 luthermonson

/lgtm

Skarlso avatar Sep 28 '22 05:09 Skarlso

This is ok with me, so from my side:

/approve

There is a hold on this, @sedefsavas are you happy with this change? Feel free to ping me

richardcase avatar Oct 10 '22 15:10 richardcase

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Oct 10 '22 15:10 k8s-ci-robot

/milestone v2.0.0

richardcase avatar Oct 25 '22 14:10 richardcase

@richardcase @Skarlso i thinks folks are ok to merge this, shall we go ahead?

Ankitasw avatar Oct 28 '22 08:10 Ankitasw

Yep I think it's good to go.

Skarlso avatar Oct 28 '22 09:10 Skarlso

/test pull-cluster-api-provider-aws-e2e-blocking

Ankitasw avatar Oct 28 '22 09:10 Ankitasw