Creating EKS cluster failing when providing credentials via pulumi config secrets instead of environment

Open robotlovesyou opened this issue 3 years ago • 24 comments

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

I'm writing a utility to create infrastructure for us via the Pulumi Automation API. I'm also using the AWS STS SDK to perform the assume-role call to acquire AWS credentials. I have a stack which creates a simple EKS cluster for our CI runners. When the AWS credentials are provided via aws:accessKey, aws:secretKey, and aws:token, creation of the EKS cluster fails at quite a late stage because it cannot communicate with the EKS cluster API. Note that many of the related AWS objects, including the EKS cluster itself, are created successfully, so for most of the process the provided credentials are being used.

Because I can successfully update the stack when running pulumi up manually with the credentials provided as environment variables, I altered my automation code to set environment variables programmatically rather than configuration secrets for the stack, and the update started completing.
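
Roughly, the workaround looks like this in the automation code (a minimal sketch rather than our exact utility; the v3 STS client, role ARN, region, stack name, and work directory are all placeholders):

import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";
import { LocalWorkspace } from "@pulumi/pulumi/automation";

async function deploy() {
    // Placeholder assume-role call; the real utility derives the role ARN from our own config.
    const sts = new STSClient({ region: "eu-west-1" });
    const { Credentials } = await sts.send(new AssumeRoleCommand({
        RoleArn: "arn:aws:iam::123456789012:role/example-deploy-role",
        RoleSessionName: "ci-cluster-deploy",
    }));

    const stack = await LocalWorkspace.createOrSelectStack(
        { stackName: "ci-cluster", workDir: "./infra" },
        {
            // Pass the temporary credentials as environment variables (works)
            // instead of aws:accessKey / aws:secretKey / aws:token config secrets (fails).
            envVars: {
                AWS_ACCESS_KEY_ID: Credentials!.AccessKeyId!,
                AWS_SECRET_ACCESS_KEY: Credentials!.SecretAccessKey!,
                AWS_SESSION_TOKEN: Credentials!.SessionToken!,
            },
        },
    );

    await stack.up({ onOutput: console.log });
}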

My hunch is that something in our configuration causes pulumi-eks to use the Kubernetes API as well as the AWS API, and that there is an issue in that part of the process which prevents the credentials from being picked up.

Steps to reproduce

Running the code below with credentials provided via pulumi config will fail. Providing the same credentials via environment variables will succeed.

/**
 * Creates necessary dependencies and then sets up an EKS cluster for running CI jobs
 */

import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";
import * as awsx from "@pulumi/awsx";

/**
 * Create a VPC with the given name
 * @param name the name of the VPC
 * @returns the new VPC
 */
const createVPC = (name: string): awsx.ec2.Vpc => {
    const vpc = new awsx.ec2.Vpc(name, {});
    return vpc;
};

/**
 * Set the name and VPC to use for an EKS cluster
 */
interface ClusterOptions {
    name: string;
    vpc: awsx.ec2.Vpc;
}

/**
 * Create an EKS cluster with the provided options set
 * @param opts options for the EKS cluster
 * @returns the new cluster
 */
const createCluster = (opts: ClusterOptions): eks.Cluster => new eks.Cluster(opts.name, {
    vpcId: opts.vpc.id,
    publicSubnetIds: opts.vpc.publicSubnetIds,
    privateSubnetIds: opts.vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    nodeGroupOptions: {
        desiredCapacity: 3,
        minSize: 2,
        maxSize: 5,
        instanceType: "t3.xlarge",
        nodeRootVolumeSize: 100,
    },
    version: "1.21",
    useDefaultVpcCni: true,
    enabledClusterLogTypes: ["api", "audit", "controllerManager", "scheduler"],
    createOidcProvider: true,
});

const stackConfig = new pulumi.Config();
const awsConfig = new pulumi.Config("aws");

const baseName = `${stackConfig.require("subaccount")}-${awsConfig.require("region")}-ci-cluster`;

const vpc = createVPC(`${baseName}-vpc`);
const cluster = createCluster({ name: `${baseName}-eks`, vpc });

export const eksClusterName = cluster.eksCluster.id;
export const eksKubeconfig = cluster.kubeconfig;
export const oidcProviderArn = cluster.core.oidcProvider?.arn;
export const oidcProviderUrl = cluster.core.oidcProvider?.url;

Expected: The update succeeds
Actual: The update fails

robotlovesyou avatar Mar 01 '22 15:03 robotlovesyou

@robotlovesyou Thanks for submitting this issue. A few questions:

  1. Would it be possible to provide us with the actual error output?
  2. Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the automation API or not.)

jkodroff avatar Mar 01 '22 20:03 jkodroff

@jkodroff

Would it be possible to provide us with the actual error output?

I am not sure when I will get a chance to break this again in order to provide the error output, especially since I appear to have found at least one new way to put a stack into a state where nothing works except manually deleting everything, which is not good and has eaten a lot of my time over the last day or so.

Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the automation API or not.)

It appeared that both the automation approach and pulumi up were failing, but there could be external factors that caused that (such as the temporary credentials having already expired), and I did not control for those before I found the workaround.

robotlovesyou avatar Mar 03 '22 12:03 robotlovesyou

I think I'm seeing the same thing; my errors are:

kubernetes:core/v1:ConfigMap hosted-cp-workshop-nodeAccess creating error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials

and

eks:index:VpcCni hosted-cp-workshop-vpc-cni creating error: Command failed: kubectl apply -f /var/folders/9r/by9bv60j729_wcsxd86fwv480000gn/T/tmp-11470MBwhtB8hf7cn.tmp

Note that we cheat and manually configure an AWS profile using the config-based credentials so that the token-retrieval kubeconfig these resources use is able to fetch a token. Token retrieval does appear to work, but the token is unauthorized.
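
For reference, the "cheat" is roughly the following, done before the cluster resources are created (a hypothetical sketch, not our exact code; the profile name and credential plumbing are illustrative):

import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Illustrative helper: append the config-based credentials to ~/.aws/credentials
// as a named profile, so that the exec-based token retrieval in the generated
// kubeconfig has something it can resolve.
function writeAwsProfile(profile: string, accessKeyId: string, secretAccessKey: string, sessionToken?: string): void {
    const credentialsPath = path.join(os.homedir(), ".aws", "credentials");
    const lines = [
        "",
        `[${profile}]`,
        `aws_access_key_id = ${accessKeyId}`,
        `aws_secret_access_key = ${secretAccessKey}`,
        ...(sessionToken ? [`aws_session_token = ${sessionToken}`] : []),
        "",
    ];
    fs.appendFileSync(credentialsPath, lines.join("\n"));
}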

liamawhite avatar Mar 08 '22 06:03 liamawhite

@liamawhite @jkodroff From memory, these are the same as the errors I got.

robotlovesyou avatar Mar 15 '22 12:03 robotlovesyou

I have also had this exact problem. I'm not using the Automation API. Is there any way to set the environment variables in a normal pulumi up run so that I can use the workaround?

saborrie avatar Sep 05 '22 08:09 saborrie

Any update? I have the same issue trying to create an eks.Cluster with Fargate.

juanfbl9307 avatar Nov 28 '22 16:11 juanfbl9307

Also seeing this when running from GitHub Actions; locally it works fine.

    error: could not get server version from Kubernetes: the server has asked for the client to provide credentials

esomore avatar Jan 15 '23 23:01 esomore

Is there any update on this?

mchristen avatar Jan 31 '23 16:01 mchristen

Same issue is biting me as well.

I'm running into this when running pulumi up from the context of a GitHub Action, which has two AWS profiles: default and cosm-sandbox.

My Pulumi.yaml for this stack has aws:profile set to cosm-sandbox. The cluster is created, and I've verified that the cosm-sandbox profile can access it, but pulumi up fails with the following:

  cosm:eks$eks:index:Cluster$eks:index:VpcCni (sandboxblue-us-west-2-vpc-cni)
    error: Command failed: kubectl apply -f /tmp/tmp-911MsliU5Df1Aln.tmp
error: You must be logged in to the server (the server has asked for the client to provide credentials)

Assuming the role on the command line locally, I can interact with the cluster just fine:

❯ aws --profile cosm-sandbox-pulumi eks update-kubeconfig --name sandboxblue-us-west-2 --profile cosm-sandbox-pulumi
Updated context arn:aws:eks:us-west-2:743399912270:cluster/sandboxblue-us-west-2 in /Users/jloosli/.kube/config
❯ k get ns
NAME              STATUS   AGE
default           Active   4m37s
kube-node-lease   Active   4m39s
kube-public       Active   4m39s
kube-system       Active   4m39s
❯ k get cm -n kube-system
NAME                                 DATA   AGE
coredns                              1      4m51s
cp-vpc-resource-controller           0      4m46s
eks-certificates-controller          0      4m44s
extension-apiserver-authentication   6      4m57s
kube-proxy                           1      4m51s
kube-proxy-config                    1      4m51s
kube-root-ca.crt                     1      4m47s

Notably, the aws-auth configmap is missing.

I've tested this behavior on eks version 0.41.2 and 1.0.1 with similar results.

jamesloosli avatar Feb 14 '23 06:02 jamesloosli

So it seems the resolution was to switch providerCredentialOpts from using the named profile to a roleArn.

    // Create cluster
    this.cluster = new eks.Cluster(
      clusterName,
      {
        name: clusterName,
        vpcId: args.vpcId,
        subnetIds: args.subnetIds,
        skipDefaultNodeGroup: true,
        providerCredentialOpts: {
          // profileName: args.context.callerProfile,
          roleArn: `arn:aws:iam::${args.accountId}:role/pulumi-deploy-role`
        },
        enabledClusterLogTypes: [
          'api',
          'audit',
          'authenticator',
          'controllerManager',
          'scheduler'
        ],
        version: '1.24',
        roleMappings: this.roleMappings,
        userMappings: this.userMappings,
        tags: args.context.tags,
        instanceRoles: [this.nodeRole]
      },
      {
        parent: this
      }
    );

In our case, because the deploy role ARN is the same in each of our accounts, this should work. I don't like that named profiles don't work, though.

jamesloosli avatar Feb 14 '23 06:02 jamesloosli

I'm facing this issue as well. I use the AWS env vars to log in to the root account and use a bucket in that account as our Pulumi backend; all of our stacks refer to an aws:profile, and deployment is failing on the Kubernetes side as a result.

If I run it locally and change the env vars, it suddenly starts working.

KingNoosh avatar May 26 '23 11:05 KingNoosh

Removing the AWS env vars and using a default profile resolved the problem for me.

KingNoosh avatar May 30 '23 02:05 KingNoosh

Having the same problem: I use environment variables set to the credentials for the account that hosts the S3 bucket where I store Pulumi state. Then I provide specific AWS credentials for the deployment using AWS-specific config variables in the stack configuration file: aws:accessKey, aws:region, and aws:secretKey. This way I can have one Pulumi project provisioned to different accounts based on stack configuration settings.

The provisioning works well until eks:index:VpcCni has to be provisioned, at which point it raises the following error:

eks:index:VpcCni (cluster-vpc-cni):
    error: Command failed: kubectl apply -f /tmp/tmp-42708k01S4dxhHP5U.tmp
    error: You must be logged in to the server (the server has asked for the client to provide credentials)

amurillo-ncser avatar Jun 19 '23 19:06 amurillo-ncser

Removing the AWS env vars and using a default profile resolved the problem for me.

But then you have to share the same credentials for the S3 backend state and the deployment itself, right? My idea was to have all my Pulumi state files in a common S3 bucket in a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.

amurillo-ncser avatar Jun 23 '23 15:06 amurillo-ncser

For me it was enough to do this:

eksCluster, err := eks.NewCluster(ctx, clusterName, &eks.ClusterArgs{
	ProviderCredentialOpts: eks.KubeconfigOptionsArgs{
		ProfileName: pulumi.StringPtr(awsProfile),
	},
})
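
The TypeScript equivalent would presumably be something like this (untested sketch; the cluster name and profile are placeholders):

import * as eks from "@pulumi/eks";

// Placeholders: substitute your own cluster name and AWS profile.
const clusterName = "my-cluster";
const awsProfile = "my-profile";

// Point the cluster's embedded Kubernetes provider at the same named profile,
// so that kubeconfig token retrieval uses those credentials.
const cluster = new eks.Cluster(clusterName, {
    providerCredentialOpts: {
        profileName: awsProfile,
    },
});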

ghostsquad avatar Sep 27 '23 04:09 ghostsquad

Removing the AWS env vars and using a default profile resolved the problem for me.

But then you have to share the same credentials for the S3 backend state and the deployment itself, right? My idea was to have all my Pulumi state files in a common S3 bucket in a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.

Apologies for responding so late. We still use aws:profile; it's just that the management bucket uses the default AWS profile, which is still logged in using the host machine's AWS creds.

So with this, all of the state is still managed in the root account.

KingNoosh avatar Jan 05 '24 17:01 KingNoosh