cluster-api-provider-aws
Add flavour for using AWS VPC CNI
/kind feature
Based on my use case, I would like to use AWS-native networking with Kubernetes, the way EKS does.
However, cluster-api-provider-aws does not support it yet.
Can we consider adding support for amazon-vpc-cni-k8s?
Or would you accept a PR implementing this feature?
@Sn0rt I don't see any issues with adding support for amazon-vpc-cni-k8s.
At a cursory glance I believe it would require:
- Swapping out the Calico manifests in addons.yaml with the manifests for amazon-vpc-cni-k8s
- Updating the nodes.cluster-api-provider-aws.sigs.k8s.io policy as per the docs
- The user to override the kubelet --max-pods appropriately for each Machine* object they define to avoid overscheduling any individual Node (see the sketch after this list)
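To illustrate the --max-pods override, here is a minimal sketch assuming the current kubeadm bootstrap provider API (KubeadmConfigTemplate, which postdates this comment); the resource name is illustrative, and the value must come from the amazon-vpc-cni-k8s ENI/IP limits for the chosen instance type (17 corresponds to t3.medium):

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: aws-eni-md-0   # illustrative name
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            # must match the ENI/IP address limits of the instance type
            max-pods: "17"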
/cc @randomvariable cc'ing Naadir in case he has thoughts on how we can potentially scope down the IAM permissions needed vs the broad permissions listed in the docs.
:+1:
We're big VPC CNI users and would be happy to help out on this.
/cc @sethp-nr
Can do either of the following:
- Create a new policy and attach it to the node role manually (see the CLI sketch below)
- Add an option to the CloudFormation generation to do it automatically. Might be better not to modify the existing policies.
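For the first option, a rough sketch of the manual route using the AWS CLI; the policy name mirrors the one used later in this thread, and vpc-cni-policy.json is a placeholder for a local policy document:

# create a standalone policy from a local policy document
aws iam create-policy \
  --policy-name amazon-vpc-cni-k8s-IAM \
  --description "permissions required by amazon-vpc-cni-k8s" \
  --policy-document file://vpc-cni-policy.json

# attach it to the node role created by clusterawsadm
aws iam attach-role-policy \
  --role-name nodes.cluster-api-provider-aws.sigs.k8s.io \
  --policy-arn arn:aws:iam::<account-id>:policy/amazon-vpc-cni-k8s-IAM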
FWIW this works today by applying a custom policy to the control plane machines and worker nodes with the Machine's spec.providerSpec.value.iamInstanceProfile. It doesn't look like any of the ENI stuff is scoped to the CAPA tag(s), despite some evidence that we wanted to – @rudoi do you remember if we tried to get the CNI permissions to be scoped to just the CAPA machines?
I finished a POC.
1: create a cluster
Create the cluster and the control plane:
apiVersion: "cluster.k8s.io/v1alpha1"
kind: Cluster
metadata:
name: aws-eni
spec:
clusterNetwork:
services:
cidrBlocks: ["10.96.0.0/12"]
pods:
cidrBlocks: ["192.168.0.0/16"]
serviceDomain: "cluster.local"
providerSpec:
value:
apiVersion: "awsprovider/v1alpha1"
kind: "AWSClusterProviderSpec"
region: "us-east-2"
sshKeyName: "guohao"
2: create machine deployment
apiVersion: "cluster.k8s.io/v1alpha1"
kind: MachineDeployment
metadata:
name: aws-eni-machinedeployment
labels:
cluster.k8s.io/cluster-name: aws-eni
spec:
replicas: 1
selector:
matchLabels:
cluster.k8s.io/cluster-name: aws-eni
set: node
template:
metadata:
labels:
cluster.k8s.io/cluster-name: aws-eni
set: node
spec:
versions:
kubelet: v1.14.4
providerSpec:
value:
apiVersion: awsprovider/v1alpha1
kind: AWSMachineProviderSpec
instanceType: "t2.medium"
iamInstanceProfile: "nodes.cluster-api-provider-aws.sigs.k8s.io"
keyName: "guohao"
3: create the IAM policy
Create a policy that grants the ENI permissions needed on the nodes:
guohao@buffer ~ $ aws iam get-policy --policy-arn arn:aws:iam::179516646050:policy/amazon-vpc-cni-k8s-IAM
{
"Policy": {
"PolicyName": "amazon-vpc-cni-k8s-IAM",
"PolicyId": "ANPASTTAGUKRHOLMEGMU2",
"Arn": "arn:aws:iam::179516646050:policy/amazon-vpc-cni-k8s-IAM",
"Path": "/",
"DefaultVersionId": "v1",
"AttachmentCount": 1,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"Description": "the permission of aws eni",
"CreateDate": "2019-08-09T02:35:54Z",
"UpdateDate": "2019-08-09T02:35:54Z"
}
}
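For reference, the policy document behind amazon-vpc-cni-k8s-IAM would look roughly like the following; the action list is a sketch based on the permissions documented upstream for amazon-vpc-cni-k8s, so treat the upstream docs as authoritative:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AssignPrivateIpAddresses",
        "ec2:AttachNetworkInterface",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeInstances",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DetachNetworkInterface",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:UnassignPrivateIpAddresses"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:CreateTags",
      "Resource": "arn:aws:ec2:*:*:network-interface/*"
    }
  ]
}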
4: attach the amazon-vpc-cni-k8s-IAM policy to the nodes.cluster-api-provider-aws.sigs.k8s.io role, which is assigned to the worker nodes. Verify the attached policies:
guohao@buffer ~ $ aws iam list-attached-role-policies --role-name nodes.cluster-api-provider-aws.sigs.k8s.io
The output is as follows:
{
"AttachedPolicies": [
{
"PolicyName": "amazon-vpc-cni-k8s-IAM",
"PolicyArn": "arn:aws:iam::179516646050:policy/amazon-vpc-cni-k8s-IAM"
},
{
"PolicyName": "nodes.cluster-api-provider-aws.sigs.k8s.io",
"PolicyArn": "arn:aws:iam::179516646050:policy/nodes.cluster-api-provider-aws.sigs.k8s.io"
}
]
}
5: check the nodes of the cluster
Get the kubeconfig with clusterctl. The nodes show NotReady because no CNI has been applied yet:
guohao@buffer ~/workspace $ kubectl --kubeconfig kubeconfig get node
NAME STATUS ROLES AGE VERSION
ip-10-0-0-133.us-east-2.compute.internal NotReady master 18h v1.14.4
ip-10-0-0-172.us-east-2.compute.internal NotReady node 17h v1.14.4
6: apply the aws-node DaemonSet (amazon-vpc-cni-k8s)
guohao@buffer ~/workspace $ kubectl --kubeconfig kubeconfig apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/v1.5/aws-k8s-cni.yaml
clusterrole.rbac.authorization.k8s.io/aws-node created
serviceaccount/aws-node created
clusterrolebinding.rbac.authorization.k8s.io/aws-node created
daemonset.apps/aws-node created
customresourcedefinition.apiextensions.k8s.io/eniconfigs.crd.k8s.amazonaws.com created
7: check the pod status
You will find some pods stuck in ContainerCreating for a long time:
kube-system coredns-584795fc57-lnn5h 0/1 ContainerCreating 0 20h <none> ip-10-0-0-133.us-east-2.compute.internal <none> <none>
kube-system coredns-584795fc57-nmcsj 0/1 ContainerCreating 0 20h <none> ip-10-0-0-133.us-east-2.compute.internal <none> <none>
Delete the stuck pods and Kubernetes will recreate them.
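For example (pod names taken from the listing above):

kubectl --kubeconfig kubeconfig -n kube-system delete pod coredns-584795fc57-lnn5h coredns-584795fc57-nmcsj

The recreated pods then come up Running with VPC-assigned pod IPs: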
kube-system coredns-584795fc57-ztmbx 1/1 Running 0 22m 10.0.0.237 ip-10-0-0-172.us-east-2.compute.internal <none> <none>
8: the IP pool assigned to the EC2 instance
Get the status of the instance:
guohao@buffer ~ $ aws ec2 describe-instances --instance-id i-053b8794d7f90a110
{
"Reservations": [
{
....
"NetworkInterfaces": [
{
"Attachment": {
"AttachTime": "2019-08-08T09:06:33.000Z",
"AttachmentId": "eni-attach-09285aff116268f94",
"DeleteOnTermination": true,
"DeviceIndex": 0,
"Status": "attached"
},
"Description": "",
"Groups": [
{
"GroupName": "aws-eni-lb",
"GroupId": "sg-021eaefb3018d0551"
},
{
"GroupName": "aws-eni-node",
"GroupId": "sg-04cfe4c2052f87031"
}
],
"Ipv6Addresses": [],
"MacAddress": "02:8e:50:b0:02:8a",
"NetworkInterfaceId": "eni-03b66efcc616b8c86",
"OwnerId": "179516646050",
"PrivateDnsName": "ip-10-0-0-172.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.172",
"PrivateIpAddresses": [
{
"Primary": true,
"PrivateDnsName": "ip-10-0-0-172.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.172"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-232.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.232"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-170.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.170"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-237.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.237"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-205.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.205"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-222.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.222"
}
],
"SourceDestCheck": true,
"Status": "in-use",
"SubnetId": "subnet-0892c669597c0a9aa",
"VpcId": "vpc-0eadd8ecf99f5b4c6",
"InterfaceType": "interface"
},
{
"Attachment": {
"AttachTime": "2019-08-09T05:17:24.000Z",
"AttachmentId": "eni-attach-0591fb5b94cb67eb8",
"DeleteOnTermination": true,
"DeviceIndex": 1,
"Status": "attached"
},
"Description": "aws-K8S-i-053b8794d7f90a110",
"Groups": [
{
"GroupName": "aws-eni-lb",
"GroupId": "sg-021eaefb3018d0551"
},
{
"GroupName": "aws-eni-node",
"GroupId": "sg-04cfe4c2052f87031"
}
],
"Ipv6Addresses": [],
"MacAddress": "02:58:f9:8c:b5:3c",
"NetworkInterfaceId": "eni-0da443a1cf644f334",
"OwnerId": "179516646050",
"PrivateDnsName": "ip-10-0-0-56.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.56",
"PrivateIpAddresses": [
{
"Primary": true,
"PrivateDnsName": "ip-10-0-0-56.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.56"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-183.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.183"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-74.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.74"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-91.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.91"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-235.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.235"
},
{
"Primary": false,
"PrivateDnsName": "ip-10-0-0-236.us-east-2.compute.internal",
"PrivateIpAddress": "10.0.0.236"
}
],
"SourceDestCheck": true,
"Status": "in-use",
"SubnetId": "subnet-0892c669597c0a9aa",
"VpcId": "vpc-0eadd8ecf99f5b4c6",
"InterfaceType": "interface"
}
],
"RootDeviceName": "/dev/sda1",
"RootDeviceType": "ebs",
"SecurityGroups": [
{
"GroupName": "aws-eni-lb",
"GroupId": "sg-021eaefb3018d0551"
},
{
"GroupName": "aws-eni-node",
"GroupId": "sg-04cfe4c2052f87031"
}
],
"SourceDestCheck": true,
"Tags": [
{
"Key": "sigs.k8s.io/cluster-api-provider-aws/role",
"Value": "node"
},
{
"Key": "sigs.k8s.io/cluster-api-provider-aws/cluster/aws-eni",
"Value": "owned"
},
{
"Key": "Name",
"Value": "aws-eni-machinedeployment-5745b4948d-tg55f"
},
{
"Key": "kubernetes.io/cluster/aws-eni",
"Value": "owned"
}
],
...
]
}
Also check the logs of the aws-node DaemonSet pod:
guohao@buffer ~/workspace $ kubectl --kubeconfig kubeconfig logs aws-node-lc7ph -n kube-system
====== Starting amazon-k8s-agent ======
Checking if ipamd is serving
Waiting for ipamd health check
Ipamd is up and serving
Copying AWS CNI plugin and config
Node ready, watching ipamd health
FWIW this works today by applying a custom policy to the control plane machines and worker nodes with the Machine's
spec.providerSpec.value.iamInstanceProfile. It doesn't look like any of the ENI stuff is scoped to the CAPA tag(s), despite some evidence that we wanted to – @rudoi do you remember if we tried to get the CNI permissions to be scoped to just the CAPA machines?
Hi, are you still working on this feature?
/assign
Folks, just as a reminder, use /lifecycle active if you're actively working on something 😃
@Sn0rt It's working for us as-is, so we haven't touched it in quite a while. Feel free to pick this ticket up!
/lifecycle active
@sethp-nr
We should consider a cluster-level flag to indicate the current cluster's CNI solution.
The kubelet --max-pods value for amazon-vpc-cni-k8s depends on the instance type, which can be found here.
For example, I would set a cluster-level annotation as follows.
Or should it be a label? In my experience, annotations are used to configure things and labels are used to select them.
apiVersion: "cluster.k8s.io/v1alpha1"
kind: Cluster
metadata:
name: test1
annotation:
cluster.k8s.io/network-cni: amazon-vpc-cni-k8s // support amazon-vpc-cni-k8s, calico
spec:
clusterNetwork:
services:
cidrBlocks: ["10.96.0.0/12"]
pods:
cidrBlocks: ["192.168.0.0/16"]
serviceDomain: "cluster.local"
providerSpec:
value:
apiVersion: "awsprovider/v1alpha1"
kind: "AWSClusterProviderSpec"
region: "us-east-2"
sshKeyName: "guohao"
The CAPA controller could then set the kubelet parameters based on this cluster-level annotation.
What do you think?
/milestone v0.5.0
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/lifecycle frozen
Given https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1747 allows customisation of CNI rules, and clusterawsadm now allows customisation of policies, it should be easier to add a template flavour that uses the AWS VPC CNI.
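As a rough illustration of the clusterawsadm side, here is a sketch of a bootstrap configuration that attaches the AWS-managed AmazonEKS_CNI_Policy to the node role; this assumes the AWSIAMConfiguration API and its extraPolicyAttachments field, so check the clusterawsadm documentation for the exact schema:

apiVersion: bootstrap.aws.infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSIAMConfiguration
spec:
  nodes:
    extraPolicyAttachments:
      # AWS-managed policy carrying the VPC CNI permissions
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

It would then be applied with something like clusterawsadm bootstrap iam create-cloudformation-stack --config bootstrap-config.yaml.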
/help
@randomvariable: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
Given https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1747 allows customisation of CNI rules, and clusterawsadm now allows customisation of policies, it should be easier to add a template flavour that uses the AWS VPC CNI.
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think we can close this now, as the VPC CNI is automatically installed in EKS and is available as an EKS addon to pin a specific version.
This issue is not only about the EKS side, so reopening it to track adding a template for the AWS native CNI with unmanaged clusters.
/remove-lifecycle frozen
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/triage accepted
This is ultimately a documentation task: we should define how to install the CNI using ClusterResourceSet or AddonProviders (which will eventually deprecate ClusterResourceSet).
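For the AWS VPC CNI, a minimal ClusterResourceSet sketch, assuming the aws-k8s-cni.yaml manifests have been wrapped into a ConfigMap (names and the selector label are illustrative):

# kubectl create configmap aws-vpc-cni-manifests --from-file=aws-k8s-cni.yaml
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: aws-vpc-cni
  namespace: default
spec:
  strategy: ApplyOnce
  clusterSelector:
    matchLabels:
      cni: aws-vpc-cni   # applied to any Cluster carrying this label
  resources:
    - name: aws-vpc-cni-manifests
      kind: ConfigMap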
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale