pulumi-eks
Creating EKS cluster failing when providing credentials via pulumi config secrets instead of environment
Hello!
- Vote on this issue by adding a 👍 reaction
- To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)
Issue details
I'm writing a utility to create infrastructure for us via the Pulumi Automation API. I'm also using the AWS STS SDK to perform the assume-role call that acquires AWS credentials. I have a stack which creates a simple EKS cluster for our CI runners. When the AWS credentials are provided via aws:accessKey, aws:secretKey, and aws:token, creation of the EKS cluster fails at quite a late stage because Pulumi is unable to communicate with the EKS cluster API. Note that most of the related AWS objects, including the EKS cluster itself, are created successfully, so for most of the process the provided credentials are being used.
Because I can successfully update the stack when it is run manually (via pulumi up) with the credentials provided as environment variables, I tried altering my automation code to programmatically set environment variables rather than configuration secrets for the stack, and the update started completing.
My hunch is that something in our configuration causes pulumi-eks to talk to the Kubernetes API as well as the AWS API, and that the credentials are not picked up for that part of the process.
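For reference, here is a minimal sketch of the two approaches using the Automation API's LocalWorkspace. The stack name, working directory, and credential values are placeholders rather than anything from the actual utility; the config-secrets path is the one reported to fail, while passing envVars to the workspace is the workaround that completes.
import { LocalWorkspace } from "@pulumi/pulumi/automation";

async function deploy() {
    // Placeholder credentials standing in for the STS assume-role result.
    const creds = { accessKeyId: "...", secretAccessKey: "...", sessionToken: "..." };

    const stack = await LocalWorkspace.createOrSelectStack(
        { stackName: "ci-cluster", workDir: "./ci-cluster" }, // placeholder names
        {
            // Workaround that completes: provide credentials as environment
            // variables on the workspace that runs the stack.
            envVars: {
                AWS_ACCESS_KEY_ID: creds.accessKeyId,
                AWS_SECRET_ACCESS_KEY: creds.secretAccessKey,
                AWS_SESSION_TOKEN: creds.sessionToken,
            },
        },
    );

    // Reported to fail late in cluster creation: provide the same credentials
    // as stack config secrets instead.
    // await stack.setAllConfig({
    //     "aws:accessKey": { value: creds.accessKeyId, secret: true },
    //     "aws:secretKey": { value: creds.secretAccessKey, secret: true },
    //     "aws:token": { value: creds.sessionToken, secret: true },
    // });

    await stack.up({ onOutput: console.log });
}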
Steps to reproduce
Running the code below with credentials provided via pulumi config will fail. Providing the same credentials via environment variables will succeed.
/**
 * Creates necessary dependencies and then sets up an eks cluster for running ci jobs
 */
import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";
import * as awsx from "@pulumi/awsx";

/**
 * Create a VPC with the given name
 * @param name the name of the vpc
 * @returns
 */
const createVPC = (name: string): awsx.ec2.Vpc => {
    const vpc = new awsx.ec2.Vpc(name, {});
    return vpc;
};

/**
 * Set the name and vpc to use for an eks cluster
 */
interface ClusterOptions {
    name: string
    vpc: awsx.ec2.Vpc
}

/**
 * Create an EKS cluster with the provided options set
 * @param opts options for the eks cluster
 * @returns
 */
const createCluster = (opts: ClusterOptions): eks.Cluster => new eks.Cluster(opts.name, {
    vpcId: opts.vpc.id,
    publicSubnetIds: opts.vpc.publicSubnetIds,
    privateSubnetIds: opts.vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    nodeGroupOptions: {
        desiredCapacity: 3,
        minSize: 2,
        maxSize: 5,
        instanceType: "t3.xlarge",
        nodeRootVolumeSize: 100,
    },
    version: "1.21",
    useDefaultVpcCni: true,
    enabledClusterLogTypes: ["api", "audit", "controllerManager", "scheduler"],
    createOidcProvider: true,
});

const stackConfig = new pulumi.Config();
const awsConfig = new pulumi.Config("aws");
const baseName = `${stackConfig.require("subaccount")}-${awsConfig.require("region")}-ci-cluster`;

const vpc = createVPC(`${baseName}-vpc`);
const cluster = createCluster({ name: `${baseName}-eks`, vpc });

export const eksClusterName = cluster.eksCluster.id;
export const eksKubeconfig = cluster.kubeconfig;
export const oidcProviderArn = cluster.core.oidcProvider?.arn;
export const oidcProviderUrl = cluster.core.oidcProvider?.url;
Expected: The update succeeds
Actual: The update fails
@robotlovesyou Thanks for submitting this issue. A few questions:
- Would it be possible to provide us with the actual error output?
- Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the Automation API or not.)
@jkodroff
Would it be possible to provide us with the actual error output?
I'm not sure when I'll get a chance to break this again in order to capture the error output, especially since I appear to have found at least one new way to put a stack into a state where nothing works except manually deleting everything, which has eaten a lot of my time over the last day or so.
Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the automation API or not.)
It appeared that both the automation approach and pulumi up were failing, but there could have been external factors causing that (such as the temporary credentials having already expired), and I did not control for that before I found the workaround.
I think I'm seeing the same thing; my errors are:
kubernetes:core/v1:ConfigMap hosted-cp-workshop-nodeAccess creating error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
and
eks:index:VpcCni hosted-cp-workshop-vpc-cni creating error: Command failed: kubectl apply -f /var/folders/9r/by9bv60j729_wcsxd86fwv480000gn/T/tmp-11470MBwhtB8hf7cn.tmp
Note that we cheat and manually configure an AWS profile from the config-based credentials so that the token-retrieval command in the kubeconfig these resources use can fetch a token. Token retrieval does appear to work, but the token is unauthorized.
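For context, pulumi-eks exposes providerCredentialOpts (used in later comments on this issue) to point the generated kubeconfig's token retrieval at a specific profile or role rather than whatever the ambient environment provides. A minimal sketch, with a placeholder cluster name, reusing aws:profile from the stack config:
import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";

const awsConfig = new pulumi.Config("aws");

// "workshop" is a placeholder cluster name.
const cluster = new eks.Cluster("workshop", {
    // Tell pulumi-eks which AWS credentials the generated kubeconfig should
    // use when the Kubernetes provider retrieves a token for this cluster.
    providerCredentialOpts: {
        profileName: awsConfig.get("profile"),
    },
});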
@liamawhite @jkodroff From memory, these are the same as the errors I got.
I have also had this exact problem. I'm not using the Automation API. Is there any way to set the environment variables for a normal pulumi up run so that I can use the workaround?
Any update? I have the same issue trying to create an eks.Cluster with Fargate.
Also seeing this when running from GitHub Actions; locally it's working fine.
error: could not get server version from Kubernetes: the server has asked for the client to provide credentials
Is there any update on this?
Same issue is biting me as well.
I'm running into this running pulumi up from the context of a GitHub Action, which has two AWS profiles: default and cosm-sandbox.
My Pulumi.yaml for this stack has aws:profile set to cosm-sandbox. The cluster is created, and I've verified that the cosm-sandbox profile can access it, but pulumi up fails with the following:
cosm:eks$eks:index:Cluster$eks:index:VpcCni (sandboxblue-us-west-2-vpc-cni)
error: Command failed: kubectl apply -f /tmp/tmp-911MsliU5Df1Aln.tmp
error: You must be logged in to the server (the server has asked for the client to provide credentials)
Assuming the role on the command line locally, I can interact with the cluster just fine:
❯ aws --profile cosm-sandbox-pulumi eks update-kubeconfig --name sandboxblue-us-west-2 --profile cosm-sandbox-pulumi
Updated context arn:aws:eks:us-west-2:743399912270:cluster/sandboxblue-us-west-2 in /Users/jloosli/.kube/config
❯ k get ns
NAME STATUS AGE
default Active 4m37s
kube-node-lease Active 4m39s
kube-public Active 4m39s
kube-system Active 4m39s
❯ k get cm -n kube-system
NAME DATA AGE
coredns 1 4m51s
cp-vpc-resource-controller 0 4m46s
eks-certificates-controller 0 4m44s
extension-apiserver-authentication 6 4m57s
kube-proxy 1 4m51s
kube-proxy-config 1 4m51s
kube-root-ca.crt 1 4m47s
Notably, the aws-auth configmap is missing.
I've tested this behavior on pulumi-eks versions 0.41.2 and 1.0.1 with similar results.
So it seems the resolution was to switch providerCredentialOpts from using the named profile to a roleArn.
// Create cluster
this.cluster = new eks.Cluster(
    clusterName,
    {
        name: clusterName,
        vpcId: args.vpcId,
        subnetIds: args.subnetIds,
        skipDefaultNodeGroup: true,
        providerCredentialOpts: {
            // profileName: args.context.callerProfile,
            roleArn: `arn:aws:iam::${args.accountId}:role/pulumi-deploy-role`
        },
        enabledClusterLogTypes: [
            'api',
            'audit',
            'authenticator',
            'controllerManager',
            'scheduler'
        ],
        version: '1.24',
        roleMappings: this.roleMappings,
        userMappings: this.userMappings,
        tags: args.context.tags,
        instanceRoles: [this.nodeRole]
    },
    {
        parent: this
    }
);
In our case, because the deploy role ARN is the same in each of our accounts, this should work. I don't like that named profiles don't work, though.
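If hard-coding the account ID is the main objection, one possible variation (not taken from the comment above) is to look up the account ID from the active credentials and interpolate the role ARN; getCallerIdentityOutput comes from @pulumi/aws, and pulumi-deploy-role is simply the role name used in the snippet above:
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Look up the account ID of whatever credentials the AWS provider resolves.
const accountId = aws.getCallerIdentityOutput().accountId;

// Build the deploy role ARN without hard-coding the account ID.
// "pulumi-deploy-role" is the role name from the snippet above; adjust as needed.
const deployRoleArn = pulumi.interpolate`arn:aws:iam::${accountId}:role/pulumi-deploy-role`;

// deployRoleArn can then be passed to providerCredentialOpts.roleArn.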
I'm facing this issue because I use the AWS env vars to log in to the root account and use a bucket on that account as our Pulumi backend; all of our stacks refer to an aws:profile, and deployment is failing with respect to Kubernetes as a result.
If I run it locally and change the env vars, it suddenly starts working.
Having removed the AWS Env Vars and using a default profile resolved my problem for me.
Having the same problem:
I use environment variables set to the credentials for the account that hosts the S3 bucket where I store Pulumi state.
Then I provide specific AWS credentials for the deployment using AWS-specific config variables in the stack configuration file:
aws:accessKey, aws:region and aws:secretKey
This way I can have one Pulumi project provisioned to different accounts based on stack configuration settings.
Provisioning works fine until eks:index:VpcCni has to be created, at which point it raises the following error:
eks:index:VpcCni (cluster-vpc-cni):
error: Command failed: kubectl apply -f /tmp/tmp-42708k01S4dxhHP5U.tmp
error: You must be logged in to the server (the server has asked for the client to provide credentials)
Having removed the AWS Env Vars and using a default profile resolved my problem for me.
But then you have to share the same credentials for the S3 backend state and the deployment itself, true? My idea was to have all my Pulumi state files in a common S3 bucket in a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.
For me it was enough to do this:
eksCluster, err := eks.NewCluster(ctx, clusterName, &eks.ClusterArgs{
    ProviderCredentialOpts: eks.KubeconfigOptionsArgs{
        ProfileName: pulumi.StringPtr(awsProfile),
    },
    // ...remaining cluster arguments elided...
})
Having removed the AWS Env Vars and using a default profile resolved my problem for me.
But then you have to share the same credentials for the S3 backend state and the deployment itself, true? My idea was to have all my Pulumi state files in a common S3 bucket in a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.
Apologies for responding so late. We still use aws:profile; it's just that the management bucket uses the default AWS profile, which is still logged in using the host machine's AWS creds.
So with this, all of the state is still managed in the root account.