(aws-eks): `aws-auth` ConfigMap is still being replaced
What is the problem?
#7981 was not fixed by #8447. The aws-auth ConfigMap is still replaced, rather than appended to, whenever the Cluster.awsAuth getter is invoked. This happens whenever you add a Fargate Profile to a cluster, which forces all Fargate Profiles to originate from the same instance of Cluster from @aws-cdk/aws-eks/cluster.ts.
In the AwsAuth constructor, the following manifest is added to the cluster:
new KubernetesResource(this, 'manifest', {
  cluster: props.cluster,
  manifest: [
    {
      apiVersion: 'v1',
      kind: 'ConfigMap',
      metadata: {
        name: 'aws-auth',
        namespace: 'kube-system',
      },
      data: {
        mapRoles: this.synthesizeMapRoles(),
        mapUsers: this.synthesizeMapUsers(),
        mapAccounts: this.synthesizeMapAccounts(),
      },
    },
  ],
});
This constructor is invoked whenever the Cluster.awsAuth getter is invoked, which happens on every call to Cluster.awsAuth.addRoleMapping, which in turn happens on every call to Cluster.addFargateProfile.
What this means is that if any software changes the mapRoles of the aws-auth ConfigMap without invoking Cluster.awsAuth.addRoleMapping on that particular instance of Cluster, those changes will be erased upon the next addition of a Fargate Profile.
In other words, upon each instantiation of Cluster where the Cluster.awsAuth getter is invoked, the ARNs of all PodExecutionRoles added by other means (manually with kubectl, by other CDK apps, by other IaC tools, by an L1 Construct like CfnFargateProfile, etc.) will be removed from the aws-auth ConfigMap. This causes the error below and prevents any new pods from spinning up.
fargate-scheduler Misconfigured Fargate Profile: fargate profile <PROFILE_NAME> blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.
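To make the failure mode concrete, here is a minimal in-memory sketch of the replace semantics described above. It is a simplified model, not the CDK or Kubernetes implementation: the `clusterAwsAuth` variable, the `applyAwsAuth` and `fargateMapping` helpers, and the role ARNs are all hypothetical names chosen for illustration.

```typescript
// One entry of the mapRoles list in the aws-auth ConfigMap.
interface RoleMapping {
  rolearn: string;
  username: string;
  groups: string[];
}

interface AwsAuthData {
  mapRoles: RoleMapping[];
}

// Stand-in for the aws-auth ConfigMap currently stored in the cluster (hypothetical).
let clusterAwsAuth: AwsAuthData = { mapRoles: [] };

// Replace semantics: the manifest's mapRoles overwrites whatever was stored before.
function applyAwsAuth(manifest: AwsAuthData): void {
  clusterAwsAuth = { mapRoles: manifest.mapRoles.slice() };
}

// Builds the mapping a Fargate pod execution role needs (values taken from this issue).
function fargateMapping(rolearn: string): RoleMapping {
  return {
    rolearn,
    username: 'system:node:{{SessionName}}',
    groups: ['system:bootstrappers', 'system:nodes', 'system:node-proxier'],
  };
}

// Hypothetical role ARNs.
const clusterAppRole = 'arn:aws:iam::111122223333:role/cluster-app-pod-execution';
const secondProfileRole = 'arn:aws:iam::111122223333:role/second-profile-pod-execution';
const externalRole = 'arn:aws:iam::111122223333:role/service-a-pod-execution';

// 1. The app that owns the Cluster deploys with one Fargate Profile.
applyAwsAuth({ mapRoles: [fargateMapping(clusterAppRole)] });

// 2. Another tool (kubectl, another CDK app, ...) adds its role out of band.
clusterAwsAuth.mapRoles.push(fargateMapping(externalRole));

// 3. The owning app adds a second profile and redeploys. Its manifest only
//    contains the roles that this Cluster instance knows about.
applyAwsAuth({
  mapRoles: [fargateMapping(clusterAppRole), fargateMapping(secondProfileRole)],
});

// The externally added role is no longer present after step 3.
const survivingArns = clusterAwsAuth.mapRoles.map((m) => m.rolearn);
```

After step 3, `survivingArns` holds only the two roles known to the owning app, which corresponds to the point at which the fargate-scheduler error above starts to appear for the external profile.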
Reproduction Steps
See #7981
What did you expect to happen?
See #7981
What actually happened?
See #7981
CDK CLI Version
1.85.0
Framework Version
1.85.0
Node.js Version
v14.17.1
OS
macOS
Language
TypeScript
Language Version
TypeScript (4.0.3)
Other information
EKS version
1.19
Fix suggestion
Instead of applying a manifest to the cluster, an AwsAuth instance should create the aws-auth manifest if and only if the ConfigMap doesn't already exist in the cluster.
When aws-auth already exists, the result should be a kubectl patch of the existing aws-auth ConfigMap.
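A rough sketch of that create-or-patch decision follows. It is only an illustration under stated assumptions: `planAwsAuthUpdate`, `Plan`, and `mapping` are hypothetical names, the role ARNs are made up, and the existing mapRoles value is assumed to be JSON-encoded so that `JSON.parse` can read it (a ConfigMap written by other tools may contain YAML, which would need a YAML parser instead). Because mapRoles is a single string value, the patch has to carry the merged list rather than just the new entries.

```typescript
interface RoleMapping {
  rolearn: string;
  username: string;
  groups: string[];
}

// Minimal shape of the aws-auth ConfigMap; mapRoles is one serialized string.
interface AwsAuthConfigMap {
  apiVersion: 'v1';
  kind: 'ConfigMap';
  metadata: { name: 'aws-auth'; namespace: 'kube-system' };
  data: { mapRoles: string };
}

// Either create the ConfigMap from scratch or patch the existing one.
type Plan =
  | { action: 'create'; manifest: AwsAuthConfigMap }
  | { action: 'patch'; patch: { data: { mapRoles: string } } };

function planAwsAuthUpdate(
  current: AwsAuthConfigMap | undefined,
  additions: RoleMapping[],
): Plan {
  if (current === undefined) {
    // ConfigMap is absent: create it with only the new mappings.
    return {
      action: 'create',
      manifest: {
        apiVersion: 'v1',
        kind: 'ConfigMap',
        metadata: { name: 'aws-auth', namespace: 'kube-system' },
        data: { mapRoles: JSON.stringify(additions) },
      },
    };
  }
  // Assumption: the stored mapRoles string is JSON.
  const stored = JSON.parse(current.data.mapRoles) as RoleMapping[];
  // Keep entries we are not updating, then append the new or updated ones.
  const kept = stored.filter(
    (s) => !additions.some((a) => a.rolearn === s.rolearn),
  );
  return {
    action: 'patch',
    patch: { data: { mapRoles: JSON.stringify(kept.concat(additions)) } },
  };
}

// Hypothetical helper and ARNs for the usage example.
function mapping(rolearn: string): RoleMapping {
  return {
    rolearn,
    username: 'system:node:{{SessionName}}',
    groups: ['system:bootstrappers', 'system:nodes', 'system:node-proxier'],
  };
}

const created = planAwsAuthUpdate(undefined, [
  mapping('arn:aws:iam::111122223333:role/service-a-pod-execution'),
]);
const stored = created.action === 'create' ? created.manifest : undefined;
const patched = planAwsAuthUpdate(stored, [
  mapping('arn:aws:iam::111122223333:role/service-b-pod-execution'),
]);
```

The resulting patch body could then be sent with something like `kubectl patch configmap aws-auth -n kube-system --type merge -p '<patch JSON>'`. Note that this is a read-modify-write cycle, so two writers patching at the same time could still overwrite each other.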
Hi, @IsaacLeeWebDev. Thanks for bringing this up. Indeed, if we want to preserve changes made outside the CDK app, the custom resource should issue a patch command instead of apply.
That said, I'm not sure this is the correct thing to do. The principle behind a CDK app (and CloudFormation, for that matter) is that it represents the full state of your infrastructure. Any out-of-band change should be disregarded in the next deploy of the app.
Leaving it up for discussion for the moment.
@otaviomacedo For what it's worth, for security and governance reasons, it is not viable for us to have a single tool or a single git repo that describes the desired state of all of our infrastructure.
It may also be worth noting that this bug potentially impacts those who use the approach recommended in #13153.
As a workaround, we're currently pushing up the role ARN of the PodExecutionRole defined in another CDK app as a CfnOutput, fetching that output where the cluster is defined, and then explicitly calling Cluster.awsAuth.addRoleMapping like so:
myCluster.awsAuth.addRoleMapping(
  iam.Role.fromRoleArn(this, `${somethingUnique}-arn-ref`, roleArnFromCfnOutput),
  {
    username: 'system:node:{{SessionName}}',
    groups: ['system:bootstrappers', 'system:nodes', 'system:node-proxier'],
  },
);
This is less than ideal because the CDK apps are still tightly coupled to one another... albeit, less so.
Chiming in that we are also facing the same issue following the recommended approach in #13153
I agree with @IsaacLeeWebDev that it's not viable to manage all of our desired infrastructure state in the one place. It would be ideal to manage our Fargate Cluster in one CDK app and to create a Fargate Profile and PodExecutionRole in a separate CDK app per microservice.
While I agree with the principle behind CDK (and CloudFormation), it is already possible to apply resources to existing clusters from other CDK apps. Does that use case not violate the same principle, or is there a difference here that I'm missing?
I would add that this entire issue caught me off guard: it wasn't clear to me from the documentation that Fargate profiles are subject to limitations that don't apply to other resources added to the cluster. While this issue remains open, perhaps a note in the EKS construct documentation would help make this behaviour clear?
Thanks @IsaacLeeWebDev for the overview on the issue and your workaround. I'll likely be looking at applying a similar workaround until this is resolved.
Perhaps an alternative approach that may be more in line with the principles above is to make changes to the AWS IAM Authenticator.
If patching the aws-auth ConfigMap from various sources is unacceptable, perhaps multiple aws-auth ConfigMaps could be added under different names and discovered via appropriate selectors/annotations. Each FargateProfile would then create its own ConfigMap for its mappings.
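As a purely illustrative sketch of that idea (the AWS IAM Authenticator is not claimed to support this), the authenticator side might collect every labelled ConfigMap and combine their role mappings. The `AuthFragment` type, the `AUTH_LABEL` label key, the `collectMapRoles` function, and the ConfigMap names and ARNs below are all hypothetical.

```typescript
interface RoleMapping {
  rolearn: string;
  username: string;
  groups: string[];
}

// Simplified view of one per-profile ConfigMap (hypothetical shape).
interface AuthFragment {
  configMapName: string;
  labels: { [key: string]: string };
  mapRoles: RoleMapping[];
}

// Hypothetical label used to mark ConfigMaps that carry auth mappings.
const AUTH_LABEL = 'example.com/aws-auth-fragment';

// Combine the mappings of all labelled ConfigMaps; the first entry for a
// given role ARN wins, and unlabelled ConfigMaps are ignored.
function collectMapRoles(configMaps: AuthFragment[]): RoleMapping[] {
  const merged: RoleMapping[] = [];
  for (const cm of configMaps) {
    if (cm.labels[AUTH_LABEL] !== 'true') continue;
    for (const m of cm.mapRoles) {
      if (!merged.some((x) => x.rolearn === m.rolearn)) merged.push(m);
    }
  }
  return merged;
}

const fargateGroups = ['system:bootstrappers', 'system:nodes', 'system:node-proxier'];

// Two Fargate profiles from separate apps plus one unrelated ConfigMap.
const fragments: AuthFragment[] = [
  {
    configMapName: 'aws-auth-service-a',
    labels: { [AUTH_LABEL]: 'true' },
    mapRoles: [{
      rolearn: 'arn:aws:iam::111122223333:role/service-a-pod-execution',
      username: 'system:node:{{SessionName}}',
      groups: fargateGroups,
    }],
  },
  {
    configMapName: 'aws-auth-service-b',
    labels: { [AUTH_LABEL]: 'true' },
    mapRoles: [{
      rolearn: 'arn:aws:iam::111122223333:role/service-b-pod-execution',
      username: 'system:node:{{SessionName}}',
      groups: fargateGroups,
    }],
  },
  { configMapName: 'unrelated-config', labels: {}, mapRoles: [] },
];

const effectiveRoles = collectMapRoles(fragments);
```

With this layout each app would own exactly one ConfigMap, so no deployment would need to know about, or overwrite, the mappings created by another.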
Keen to hear thoughts.
Hi @otaviomacedo
Have there been any updates or thoughts on this issue?
As noted here and in the related issues, the problem is that the aws-auth config map is a global singleton shared resource per cluster. This makes it different from regular workload manifests, which are normally scoped to a specific microservice. This is also why ICluster doesn't have the .awsAuth property, nor does it allow defining a fargate profile (which would require the config map).
In general, the CDK is not designed to support managing resources from both within the CDK app and external to it (for example, with kubectl). This scenario is different, though, because it presents a conflict between two CDK constructs, even within the same app: the Cluster resource and the CfnFargateProfile construct.
@jordangullen your suggestion around the AWS IAM authenticator seems interesting and sounds like the right path forward.
I'm going to mark this as a feature request because technically there is no unexpected behavior here. We are keeping track of it but currently we have no concrete plans to address it.
We use +1s on this issue to help prioritize our work, and are happy to re-evaluate the prioritization of this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.
If anyone is looking to submit a PR here, please contact us before starting any work, because there are a few important implementation details that need to be discussed.