cluster-api-provider-aws
Add ipv6 support
TODOs
- [x] Add missing things like:
  - [x] Routing
  - [x] Subnet definitions
  - [x] ENI support
  - [x] ... TBD
- [x] Configure the aws-node vpc-cni addon to use the environment properties `ENABLE_IPv6` and `ENABLE_PREFIX_DELEGATION` set to `true` (see the sketch after this list)
- [x] Add documentation
- [x] Lots of validation
- [x] Add e2e tests
- [x] Add unit tests
- [x] Refactor
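For context, the vpc-cni checklist item boils down to setting those two environment variables on the aws-node DaemonSet in the workload cluster. A minimal client-go sketch of the idea, assuming the upstream default `kube-system/aws-node` DaemonSet and the `test.kubeconfig` used in the kubectl output below; this is illustrative, not the actual CAPA reconciler code:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// setAWSNodeIPv6Env switches the aws-node (vpc-cni) DaemonSet to IPv6 mode by
// setting ENABLE_IPv6 and ENABLE_PREFIX_DELEGATION on its container.
// Illustrative only; CAPA does the equivalent inside its awsnode service.
func setAWSNodeIPv6Env(ctx context.Context, client kubernetes.Interface) error {
	ds, err := client.AppsV1().DaemonSets("kube-system").Get(ctx, "aws-node", metav1.GetOptions{})
	if err != nil {
		return err
	}
	want := []corev1.EnvVar{
		{Name: "ENABLE_IPv6", Value: "true"},
		{Name: "ENABLE_PREFIX_DELEGATION", Value: "true"},
	}
	for i := range ds.Spec.Template.Spec.Containers {
		c := &ds.Spec.Template.Spec.Containers[i]
		if c.Name != "aws-node" {
			continue
		}
		for _, w := range want {
			c.Env = upsertEnv(c.Env, w)
		}
	}
	_, err = client.AppsV1().DaemonSets("kube-system").Update(ctx, ds, metav1.UpdateOptions{})
	return err
}

// upsertEnv replaces an existing variable with the same name or appends it.
func upsertEnv(env []corev1.EnvVar, v corev1.EnvVar) []corev1.EnvVar {
	for i := range env {
		if env[i].Name == v.Name {
			env[i] = v
			return env
		}
	}
	return append(env, v)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "test.kubeconfig")
	if err != nil {
		panic(err)
	}
	if err := setAWSNodeIPv6Env(context.Background(), kubernetes.NewForConfigOrDie(cfg)); err != nil {
		panic(err)
	}
	fmt.Println("aws-node switched to IPv6 mode")
}
```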
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Part 1 for https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2420.
We are aiming to do as little as possible per part so the PR doesn't get too bloated.
Joint PR with @nikimanoledaki
Special notes for your reviewer:
This will be Part 1 of IPv6 support. What does Part 1 contain, you might ask?
- create an IPv6 cluster
- refactor the code to offer both options without needing a lot of `if`s
- validations for things that IPv6 doesn't support and that CAPA won't support in the beginning
E2E test run for EKS cluster:
Ran 1 of 3 Specs in 1488.316 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 2 Skipped
Checklist:
- [x] squashed commits
- [x] includes documentation
- [x] adds unit tests
- [x] adds or updates e2e tests
@Skarlso: This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @Skarlso. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
/assign Skarlso /assign nikimanoledaki
@Skarlso: GitHub didn't allow me to assign the following users: nikimanoledaki.
Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign Skarlso /assign nikimanoledaki
/ok-to-test
As a note, it's probably not worth wasting run time on this yet, as I'm sure I messed up a bunch of tests 😅

Starting to get there. :)
Working subnet creation

Successfully created an IPv6 cluster:
I0617 07:01:35.730194 606 eks.go:34] "msg"="Reconciling EKS control plane" "cluster-name"="ipv6-cluster" "cluster-namespace"="default"
I0617 07:01:35.730243 606 roles.go:56] "msg"="Reconciling EKS Control Plane IAM Role"
I0617 07:01:35.730254 606 roles.go:60] "msg"="no eks control plane role specified, using default eks control plane role"
I0617 07:01:35.730262 606 roles.go:67] "msg"="using eks control plane role" "role-name"="eks-controlplane.cluster-api-provider-aws.sigs.k8s.io"
I0617 07:01:36.587591 606 roles.go:90] "msg"="Skipping, EKS control plane role policy assignment as role is unamanged"
I0617 07:01:36.587657 606 cluster.go:47] "msg"="Reconciling EKS cluster"
I0617 07:01:39.233322 606 cluster.go:413] "msg"="Created EKS cluster in AWS" "cluster-name"="default_ipv6-cluster-control-plane"
Now, testing pod access, machines, communication, etc... :D

First IPv6-enabled WORKING CAPA cluster, y'all
k get pods -A --kubeconfig test.kubeconfig -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system aws-node-rp54r 1/1 Running 1 29m 2a05:d014:16d:7202:dff9:604c:cead:f657 ip-10-0-97-100.eu-central-1.compute.internal <none> <none>
kube-system coredns-78666889b9-fsv9m 1/1 Running 0 37m 2a05:d014:16d:7202:560d:: ip-10-0-97-100.eu-central-1.compute.internal <none> <none>
kube-system coredns-78666889b9-r62dn 1/1 Running 0 37m 2a05:d014:16d:7202:560d::1 ip-10-0-97-100.eu-central-1.compute.internal <none> <none>
kube-system kube-proxy-tcc25 1/1 Running 0 29m 2a05:d014:16d:7202:dff9:604c:cead:f657 ip-10-0-97-100.eu-central-1.compute.internal <none> <none>
+10,073 −178
About 9000 lines of generated IAM mock. :D
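Most of that is standard MockGen output for the aws-sdk-go IAM client interface; a directive roughly like this generates it (destination and package names here are illustrative, not necessarily the repo's actual ones):

```go
// Illustrative go:generate directive; the actual destination/package in the
// repo may differ. iamiface.IAMAPI is the aws-sdk-go IAM client interface,
// which is why its generated mock alone runs to thousands of lines.
//go:generate mockgen -destination=iamapi_mock.go -package=mock_iamauth github.com/aws/aws-sdk-go/service/iam/iamiface IAMAPI
```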
Let's see if we didn't break anything so far.
/retest
/retest

Hmm... The fuzzing passes locally.
/retest
/retest
This fuzzy test thing is very :suspect:
--- FAIL: TestFuzzyConversion (37.63s)
--- FAIL: TestFuzzyConversion/for_AWSManagedControlPlane (37.63s)
--- PASS: TestFuzzyConversion/for_AWSManagedControlPlane/spoke-hub-spoke (37.62s)
--- FAIL: TestFuzzyConversion/for_AWSManagedControlPlane/hub-spoke-hub (0.01s)
FAIL
Passes locally.
I was looking at the wrong suite. 🤦
I ran it locally a couple of dozen times and it didn't fail. Even though it's clearly failing on the newly added values. 🤔
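For the record: a hub-spoke-hub failure with a passing spoke-hub-spoke usually means the fuzzer set one of the newly added hub-only fields and that value gets dropped when converting down to the older API version and back. The usual CAPI fix is the conversion-data annotation restore pattern in the spoke package; a rough sketch of that pattern (the `IPv6` field, the exact API versions, and the generated conversion function names are illustrative, not necessarily what this PR ends up with):

```go
// Inside the spoke API package. Field, version, and generated function
// names below are illustrative.
package v1alpha3

import (
	utilconversion "sigs.k8s.io/cluster-api/util/conversion"
	"sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts this (older, spoke) AWSManagedControlPlane to the hub version.
func (src *AWSManagedControlPlane) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1beta1.AWSManagedControlPlane)
	if err := Convert_v1alpha3_AWSManagedControlPlane_To_v1beta1_AWSManagedControlPlane(src, dst, nil); err != nil {
		return err
	}
	// Restore hub-only fields (e.g. the new IPv6 settings) that the older API
	// version cannot carry, from the annotation written by ConvertFrom.
	restored := &v1beta1.AWSManagedControlPlane{}
	if ok, err := utilconversion.UnmarshalData(src, restored); err != nil || !ok {
		return err
	}
	dst.Spec.NetworkSpec.VPC.IPv6 = restored.Spec.NetworkSpec.VPC.IPv6 // illustrative field name
	return nil
}

// ConvertFrom converts the hub version down to this spoke version, stashing the
// full hub object in an annotation so a hub-spoke-hub round trip is lossless.
func (dst *AWSManagedControlPlane) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1beta1.AWSManagedControlPlane)
	if err := Convert_v1beta1_AWSManagedControlPlane_To_v1alpha3_AWSManagedControlPlane(src, dst, nil); err != nil {
		return err
	}
	return utilconversion.MarshalData(src, dst)
}
```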
/retest
/retest
Nice!! Now that we know nothing existing is breaking (as far as tests go :D), the real work can begin: extending unit tests, adding e2e tests, and adding documentation and validation. :)
/retest
Things are still working after refactorings and small fixes:
clusterctl describe cluster ipv6-cluster
NAME READY SEVERITY REASON SINCE MESSAGE
Cluster/ipv6-cluster True 10m
├─ControlPlane - AWSManagedControlPlane/ipv6-cluster-control-plane True 10m
└─Workers
└─MachineDeployment/ipv6-cluster-md-0 True 79s
└─Machine/ipv6-cluster-md-0-64d8d795ff-58sm8 True 4m31s
Nope... Something is not working... CoreDNS isn't working.
Found it. Missing IAM permission.
ipv6 k get pods -A --kubeconfig test.kubeconfig -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system aws-node-9jhx8 1/1 Running 1 (118s ago) 3m57s 2600:1f13:8a4:5102:db80:66c3:be32:ab6d ip-10-0-95-89.us-west-2.compute.internal <none> <none>
kube-system coredns-657694c6f4-g7t2f 1/1 Running 0 12m 2600:1f13:8a4:5102:979a::1 ip-10-0-95-89.us-west-2.compute.internal <none> <none>
kube-system coredns-657694c6f4-rwx2m 1/1 Running 0 12m 2600:1f13:8a4:5102:979a:: ip-10-0-95-89.us-west-2.compute.internal <none> <none>
kube-system kube-proxy-pzcng 1/1 Running 0 3m57s 2600:1f13:8a4:5102:db80:66c3:be32:ab6d ip-10-0-95-89.us-west-2.compute.internal <none> <none>
Things are working 🎉
NOTE: I will be on vacation for two weeks so expect some downtime on this. :) I have not abandoned it, I'm just taking a bit of time off.
Things that will still be added:
- Documentation
- E2E test
- Validations
- Lots of manual testing for security groups and communications
Not much left now. :)
Nice, the new code skips updating the IPv6 environment properties on vpc-cni if they are already defined and set to true, so no update is issued to the resource.
I0625 16:22:27.906893 9 cni.go:182] "msg"="ipv6 environment properties already set, skipping update of vpc-cni addon"
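The check itself is roughly this shape, using the env var names from the TODO list above (a sketch of the idea, not the actual cni.go code):

```go
import corev1 "k8s.io/api/core/v1"

// ipv6EnvAlreadySet reports whether the aws-node container already carries the
// desired IPv6-related environment variables, so the reconciler can skip an
// unnecessary update of the vpc-cni DaemonSet. Sketch of the idea only.
func ipv6EnvAlreadySet(env []corev1.EnvVar) bool {
	want := map[string]string{
		"ENABLE_IPv6":              "true",
		"ENABLE_PREFIX_DELEGATION": "true",
	}
	matches := 0
	for _, e := range env {
		if v, ok := want[e.Name]; ok && e.Value == v {
			matches++
		}
	}
	return matches == len(want)
}
```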
First, a trivial test: pod-to-pod ping on the same node.
bash-4.2# ping6 2600:1f13:859:ec02:545c::1
PING 2600:1f13:859:ec02:545c::1(2600:1f13:859:ec02:545c::1) 56 data bytes
64 bytes from 2600:1f13:859:ec02:545c::1: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 2600:1f13:859:ec02:545c::1: icmp_seq=2 ttl=64 time=0.052 ms
64 bytes from 2600:1f13:859:ec02:545c::1: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 2600:1f13:859:ec02:545c::1: icmp_seq=4 ttl=64 time=0.042 ms
Niceee. 🟢
Let's create a couple of pods and see IP assignment with `inflate`.
Nice, looking good:
➜ ipv6 k get pods -A --kubeconfig test.kubeconfig -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default inflate-866ccdf4c8-7lm65 1/1 Running 0 22s 2600:1f13:859:ec02:545c::8 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-ghsmq 1/1 Running 0 22s 2600:1f13:859:ec02:545c::4 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-j7x2s 1/1 Running 0 22s 2600:1f13:859:ec02:545c::6 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-jsngz 0/1 Pending 0 22s <none> <none> <none> <none>
default inflate-866ccdf4c8-jvnhn 0/1 Pending 0 22s <none> <none> <none> <none>
default inflate-866ccdf4c8-m4xnv 1/1 Running 0 22s 2600:1f13:859:ec02:545c::7 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-mlllj 1/1 Running 0 93s 2600:1f13:859:ec02:545c::2 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-nw6dq 1/1 Running 0 22s 2600:1f13:859:ec02:545c::3 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
default inflate-866ccdf4c8-ptp6s 0/1 Pending 0 22s <none> <none> <none> <none>
default inflate-866ccdf4c8-wxk5n 1/1 Running 0 22s 2600:1f13:859:ec02:545c::5 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
kube-system aws-node-4jcdw 1/1 Running 1 (10m ago) 12m 2600:1f13:859:ec02:7fb:b410:fe63:20e2 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
kube-system coredns-657694c6f4-hdz92 1/1 Running 0 23m 2600:1f13:859:ec02:545c::1 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
kube-system coredns-657694c6f4-l9btx 1/1 Running 0 23m 2600:1f13:859:ec02:545c:: ip-10-0-93-63.us-west-2.compute.internal <none> <none>
kube-system kube-proxy-42kht 1/1 Running 0 12m 2600:1f13:859:ec02:7fb:b410:fe63:20e2 ip-10-0-93-63.us-west-2.compute.internal <none> <none>
Node-to-node communication is working. This is all private and internal. Now, let's try creating a service and putting nodes into a public subnet.
I just realised that EgressOnlyInternetGateway creation needs to be part of the network creation conditions when IPv6 is enabled, but only if there are private subnets in the picture. I guess it's fine ¯\_(ツ)_/¯ we just pre-create it no matter what, and we simply don't use it if it's not needed. It shouldn't eat any resources.
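For reference, pre-creating it comes down to a single EC2 call plus a default `::/0` route in the private route tables. Roughly this, with aws-sdk-go v1 and made-up resource IDs (a sketch of the idea, not the actual CAPA network reconciler):

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// ensureEgressOnlyGateway creates an egress-only internet gateway for the VPC
// and points the given (private) route table's default IPv6 route at it.
// Sketch only: a real reconciler would first check for an existing gateway
// and tag the new one for lifecycle management.
func ensureEgressOnlyGateway(svc *ec2.EC2, vpcID, routeTableID string) error {
	out, err := svc.CreateEgressOnlyInternetGateway(&ec2.CreateEgressOnlyInternetGatewayInput{
		VpcId: aws.String(vpcID),
	})
	if err != nil {
		return err
	}

	// Route all outbound IPv6 traffic from the private subnets through the
	// egress-only gateway; inbound connections remain blocked by design.
	_, err = svc.CreateRoute(&ec2.CreateRouteInput{
		RouteTableId:                aws.String(routeTableID),
		DestinationIpv6CidrBlock:    aws.String("::/0"),
		EgressOnlyInternetGatewayId: out.EgressOnlyInternetGateway.EgressOnlyInternetGatewayId,
	})
	return err
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	// Hypothetical IDs, for illustration only.
	if err := ensureEgressOnlyGateway(svc, "vpc-0123456789abcdef0", "rtb-0123456789abcdef0"); err != nil {
		panic(err)
	}
	fmt.Println("egress-only internet gateway wired up")
}
```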