eks: fails to create EKS nodegroup in cn-north-1
Describe the bug
Hi, folks,
I ran into a problem when using the AWS Python CDK to create an EKS cluster. Please find the information below.
My local environment:
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ cdk --version
2.67.0 (build b6f7f39)
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ python3 --version
Python 3.7.10
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ cat /proc/version
Linux version 5.10.144-127.601.amzn2.x86_64 (mockbuild@ip-10-0-44-229) (gcc10-gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1), GNU ld version 2.35-21.amzn2.0.1) #1 SMP Thu Sep 29 01:11:59 UTC 2022
Here is the core code:
node_role = iam.Role.from_role_arn(self, 'eks-node-role-arn-lookup', 'arn:aws-cn:iam::xxxxxxxxxxx:role/eks-node-role')
cluster.add_nodegroup_capacity(
    nodegroup_name,
    nodegroup_name=nodegroup_name,
    instance_types=[ec2.InstanceType(instance_type)],
    min_size=1,
    max_size=3,
    capacity_type=capacity_type,
    disk_size=disk_size,
    ami_type=ami_type,
    node_role=node_role
)
I created the node role manually and the CDK deploys successfully, but when I remove the node_role parameter, like this:
cluster.add_nodegroup_capacity(
    nodegroup_name,
    nodegroup_name=nodegroup_name,
    instance_types=[ec2.InstanceType(instance_type)],
    min_size=1,
    max_size=2,
    capacity_type=capacity_type,
    disk_size=disk_size,
    ami_type=ami_type
)
The following error message is thrown:
Resource handler returned message: "Following required service principals [ec2.amazonaws.com.cn] were not found in the trust relationships of nodeRole arn:aws-cn:iam::4123xxxxxxx:role/eks-cluster-stack-eksgitlabrunnerclusterNodegroupg-1EPH8PW36YZ3A (Service: Eks, Status Code: 400, Request ID: 6f4cc1b1-4fd2-4072-887c-abc6ddf60d58)" (RequestToken: 7c7be61d-a2a5-3e36-1a34-e6a54c71d72a, HandlerErrorCode: InvalidRequest)
But I believe the service principal [ec2.amazonaws.com.cn] is correct for the cn-north-1 region.
Could you please help check this problem?
Expected Behavior
When I do not specify the node role in the method, I expect the CDK to automatically create it.
Method doc : https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_eks/Cluster.html#aws_cdk.aws_eks.Cluster.add_nodegroup_capacity
Current Behavior
In the cn-north-1 region, the CDK fails to create the node role.
I checked the service principal in another EC2 role of mine; the [ec2.amazonaws.com.cn] configuration is correct there.
It seems that the CDK does not emit this principal.
Reproduction Steps
Using the CDK code above with node_role removed, creation fails in the cn-north-1 region.
Possible Solution
Manually create the node role and hard-code its ARN in the CDK code.
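As a sketch of that workaround (the statement shape below is my assumption, not taken verbatim from the thread), the manually created node role's trust policy must use the China-partition EC2 service principal:

```python
import json

# Hypothetical sketch: the trust policy a manually created EKS node role
# needs in the AWS China (aws-cn) partition. EC2's service principal there
# carries the .cn suffix that the error message above asks for.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com.cn"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The role's ARN (e.g. arn:aws-cn:iam::...:role/eks-node-role) is then passed to add_nodegroup_capacity via node_role, as in the snippet above.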
Additional Information/Context
No response
CDK CLI Version
2.67.0
Framework Version
No response
Node.js Version
v16.18.0
OS
Amazon Linux2
Language
Python
Language Version
3.7.10
Other information
No response
Hi,
Let me clarify this first.
- Does it happen only when you update your EKS deployment by removing your custom nodeRole?
- Are you having this error in cn-north-1?
I found the root cause here:
https://github.com/aws/aws-cdk/blob/3b7431b6ac27f8557c22a8959ae1ce431f6d2167/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L380
In China, this should be ec2.amazonaws.com.cn instead.
OK, I guess https://github.com/aws/aws-cdk/pull/22589 broke this.
The .cn service principal was removed in that PR, but it is actually required for the AWS China regions.
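The partition-dependent lookup can be sketched in plain Python (the helper below is illustrative only, not the CDK's actual code at the link above):

```python
def ec2_service_principal(region: str) -> str:
    """Illustrative: pick the EC2 service principal by region.

    China-partition regions (cn-north-1, cn-northwest-1) use the
    amazonaws.com.cn suffix; other regions use amazonaws.com.
    """
    if region.startswith("cn-"):
        return "ec2.amazonaws.com.cn"
    return "ec2.amazonaws.com"


print(ec2_service_principal("cn-north-1"))  # ec2.amazonaws.com.cn
print(ec2_service_principal("us-east-1"))   # ec2.amazonaws.com
```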
@pahud Hi Pahud, many thanks for your troubleshooting. I also hit a similar problem when creating the EKS cluster with the CDK.
I only create the EKS cluster, not the nodegroup or its role. This is my CDK code:
class EksClusterStack(Stack):
    def __init__(self, scope: Construct, identifier, **kwargs):
        super().__init__(scope, identifier, **kwargs)
        vpc = ec2.Vpc.from_lookup(
            self, "my-vpc", vpc_id=vpc_id
        )
        # eks cluster
        cluster = self.create_eks_cluster(vpc)
        """
        CfnOutput(self, "eks-cluster-arn-export", value=cluster.cluster_name, export_name="eks-cluster-name")
        """

    def create_eks_cluster(self, vpc):
        cluster = eks.Cluster(
            self,
            "eks-gitlab-runner-cluster",
            cluster_name=cluster_name,
            vpc=vpc,
            version=eks.KubernetesVersion.V1_24,
            default_capacity=0,
        )
        return cluster
I deployed the stack in cn-north-1, but the stack eventually rolled back.
Checking the CloudFormation stack error, the parent stack reported a nested-stack creation failure, and the nested stack's events show the following log:
Policy arn:aws-cn:iam::aws:policy/AmazonElasticContainerRegistryPublicReadOnly does not exist or is not attachable. (Service: AmazonIdentityManagement; Status Code: 404; Error Code: NoSuchEntity; Request ID: 99a1c2a5-992a-4e47-a0d0-357d9c73c70d; Proxy: null)
I am sure AmazonElasticContainerRegistryPublicReadOnly is an AWS managed policy that exists only in the global partition; I cannot find this IAM policy in the China regions.
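To illustrate why this fails (with a hypothetical helper, not a real CDK API): managed-policy ARNs are partition-scoped, so a policy name that resolves under arn:aws does not automatically exist under arn:aws-cn:

```python
def managed_policy_arn(partition: str, name: str) -> str:
    """Hypothetical helper: build an AWS managed policy ARN for a partition."""
    return f"arn:{partition}:iam::aws:policy/{name}"

# The same policy name yields a different ARN per partition; the aws-cn
# variant of AmazonElasticContainerRegistryPublicReadOnly does not exist,
# hence the NoSuchEntity error above.
print(managed_policy_arn("aws", "AmazonElasticContainerRegistryPublicReadOnly"))
print(managed_policy_arn("aws-cn", "AmazonElasticContainerRegistryPublicReadOnly"))
```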
Could you please help check whether it is the same root cause?
Yes, I can't even deploy this to cn-north-1:
import { App, Stack, StackProps,
  aws_eks as eks,
  aws_ec2 as ec2,
} from 'aws-cdk-lib';
import { KubectlV25Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v25';

const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
const cluster = new eks.Cluster(this, 'Cluster', {
  vpc,
  version: eks.KubernetesVersion.V1_25,
  kubectlLayer: new KubectlLayer(this, 'KubectlLayer'),
});
The error message is just as you described above:
Resource handler returned message: "Following required service principals [ec2.amazonaws.com.cn] were not found in the trust relationships of nodeRole arn:aws-cn:iam::4123xxxxxxx:role/eks-cluster-stack-eksgitlabrunnerclusterNodegroupg-1EPH8PW36YZ3A (Service: Eks, Status Code: 400, Request ID: 6f4cc1b1-4fd2-4072-887c-abc6ddf60d58)" (RequestToken: 7c7be61d-a2a5-3e36-1a34-e6a54c71d72a, HandlerErrorCode: InvalidRequest)
Looks like EKS expects the EC2 service principal name to be ec2.amazonaws.com.cn, but the CDK provides ec2.amazonaws.com. I am still working with internal teams to get this sorted out.
@Bruce-Lu674 I created https://github.com/aws/aws-cdk/issues/24743 for the missing AmazonElasticContainerRegistryPublicReadOnly bug FYR.
@pahud Many thanks for your help. By the way, is there an expected resolution time for this issue? I can use the old version (2.65) to create the EKS cluster, but I don't think staying on an old version is a long-term solution.
@Bruce-Lu674 The relevant team is working on it. I don't have an ETA at this moment, but I will update here when I see the issue fixed (hopefully very soon).
By the way, are you able to successfully deploy EKS with CDK 2.65 in cn-north-1?
@pahud Yes, I can deploy the EKS cluster via CDK v2.65 in cn-north-1.
Hi @Bruce-Lu674
Are you able to deploy the cluster AND a nodegroup with cdk v2.65.0 in cn-north-1 like this?
const cluster = new eks.Cluster(this, 'Cluster', {
  vpc,
  version: eks.KubernetesVersion.V1_24,
  defaultCapacity: 0,
  kubectlLayer,
});
const ng = cluster.addNodegroupCapacity('NG', {
  desiredSize: 2,
});
Hi Pahud @pahud, yes, I can create the EKS cluster via v2.65 and v2.66, but without the nodegroup resource, with code like this:
const cluster = new eks.Cluster(this, 'Cluster', {
  vpc,
  version: eks.KubernetesVersion.V1_24,
  defaultCapacity: 0,
  kubectlLayer,
});
Here is my python code:
vpc = ec2.Vpc.from_lookup(
    self, "my-vpc", vpc_id=vpc_id
)
# eks cluster
cluster = self.create_eks_cluster(vpc)

def create_eks_cluster(self, vpc):
    cluster = eks.Cluster(
        self,
        "eks-cluster",
        cluster_name=cluster_name,
        vpc=vpc,
        default_capacity=0,
        version=eks.KubernetesVersion.V1_24
    )
    return cluster
@Bruce-Lu674
Unfortunately I can't even deploy the cluster successfully. I'll keep digging for the root cause.
By the way, do you have an account on the cdk.dev Slack? Ping me there so we can discuss more details directly.
Hi
I am on CDK version 2.74.0, and this is still an issue. Any updates / ETA on a fix?
Thanks
@ItielOlenick
Looks like the EKS is expecting ec2 service principal name as ec2.amazonaws.com.cn but CDK is giving ec2.amazonaws.com. I am still working on this to get it sorted with internal teams.
We are still working with internal teams to fix this, but unfortunately there's no ETA at this moment. I'll share updates as they come.
EKS in China has two additional issues as well, and we probably need those fixed before we can deploy with the latest CDK:
- https://github.com/aws/aws-cdk/pull/25215
- https://github.com/aws/aws-cdk/issues/24358
I can confirm we can successfully deploy an EKS cluster in China regions with the escape hatches below:
import { KubectlV26Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v26';

const cluster = new eks.Cluster(scope, 'EksCluster', {
  vpc,
  version: eks.KubernetesVersion.V1_26,
  kubectlLayer: new KubectlLayer(scope, 'KubectlLayer'),
  defaultCapacity: 2,
});
// override the service principal for the default nodegroup
overrideServicePrincipal(cluster.defaultNodegroup?.role.node.defaultChild as iam.CfnRole);

const ng = cluster.addNodegroupCapacity('NG', {
  desiredSize: 2,
});
// override the service principal for the additional nodegroup
overrideServicePrincipal(ng.role.node.defaultChild as iam.CfnRole);

function overrideServicePrincipal(role: iam.CfnRole) {
  role.addPropertyOverride('AssumeRolePolicyDocument.Statement.0.Principal.Service', ['ec2.amazonaws.com', 'ec2.amazonaws.com.cn']);
}
% kubectl get no
NAME                                          STATUS   ROLES    AGE     VERSION
ip-10-0-140-206.cn-north-1.compute.internal   Ready    <none>   2m34s   v1.26.2-eks-a59e1f0
ip-10-0-141-57.cn-north-1.compute.internal    Ready    <none>   2m20s   v1.26.2-eks-a59e1f0
ip-10-0-174-210.cn-north-1.compute.internal   Ready    <none>   2m34s   v1.26.2-eks-a59e1f0
This is a temporary fix for this issue from CDK.
Hello @pahud ,
We are still encountering the error below when using the latest CDK version to create EKS and corresponding resources such as Helm charts. I tested cdk-2.65.0, which looks good; however, it's hard for us to use that version for other reasons. Is there an ETA or a workaround for this issue?
2023-05-19 14:12:02 UTC+0800 HandlerServiceRoleFCDC14AE CREATE_FAILED Policy arn:aws-cn:iam::aws:policy/AmazonElasticContainerRegistryPublicReadOnly does not exist or is not attachable. (Service: AmazonIdentityManagement; Status Code: 404; Error Code: NoSuchEntity; Request ID: 8a2723e1-3330-40e4-af9c-d45b6e6aa3b3; Proxy: null)
@justin007755 This bug should have been fixed in https://github.com/aws/aws-cdk/pull/25215
Please install the latest AWS CDK and let me know if it works for you.
I am able to deploy this to cn-north-1 with the nodegroup up and running. Hence resolving this issue.
import * as cdk from 'aws-cdk-lib/core';
import * as eks from 'aws-cdk-lib/aws-eks';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';
import { KubectlV31Layer } from '@aws-cdk/lambda-layer-kubectl-v31';
export class BjsEksStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create VPC for EKS cluster
    const vpc = new ec2.Vpc(this, 'EksVpc', {
      maxAzs: 2, // Minimal setup with 2 AZs
      natGateways: 1, // Cost-effective single NAT gateway
    });

    // Create EKS cluster with minimal configuration
    const cluster = new eks.Cluster(this, 'EksCluster', {
      version: eks.KubernetesVersion.V1_31,
      vpc,
      defaultCapacity: 0, // We'll add the managed node group separately
      endpointAccess: eks.EndpointAccess.PUBLIC_AND_PRIVATE,
      kubectlLayer: new KubectlV31Layer(this, 'kubectl'),
    });

    // Add managed node group
    cluster.addNodegroupCapacity('DefaultNodeGroup', {
      instanceTypes: [new ec2.InstanceType('t3.medium')],
      minSize: 1,
      maxSize: 3,
      desiredSize: 1,
      diskSize: 20, // GB
      amiType: eks.NodegroupAmiType.AL2_X86_64,
    });

    // Output cluster endpoint
    new cdk.CfnOutput(this, 'ClusterEndpoint', {
      value: cluster.clusterEndpoint,
      description: 'EKS Cluster Endpoint',
    });

    // Output cluster name
    new cdk.CfnOutput(this, 'ClusterName', {
      value: cluster.clusterName,
      description: 'EKS Cluster Name',
    });
  }
}
Confirmed that aws-eks-v2-alpha deploys in cn-north-1 as well.