terraform-aws-eks
dial tcp 127.0.0.1:80: connect: connection refused
Description
I know there are numerous issues (#817) related to this problem, but since v18.20.1 reintroduced the management of the aws-auth configmap, I thought we could discuss it in a new one because the old ones are closed.
The behavior is still very weird. I updated my module to use the configmap management feature and the first run went fine (I was using the aws_eks_cluster_auth data source). When I run the module with no change, I get no error in either plan or apply.
I then tried to update my cluster from v1.21 to v1.22, and plan and apply began to fail with the following well-known error:
null_resource.node_groups_asg_tags["m5a-xlarge-b-priv"]: Refreshing state... [id=7353592322772826167]
╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with kubernetes_config_map_v1_data.aws_auth[0],
│   on main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  428: resource "kubernetes_config_map_v1_data" "aws_auth" {
│
╵
I then moved to the exec plugin as recommended by the documentation and removed the old data source from state. I still got the same error.
Something I don't get: when setting the variable export KUBE_CONFIG_PATH=$PWD/kubeconfig as suggested in #817, things work as expected.
I'm sad to see things are still unusable (not related to this module but on the Kubernetes provider side). The load_config_file option was removed from the Kubernetes provider a while ago, and I don't see why this variable needs to be set or how it could be set beforehand.
Anyway, if someone managed to use the re-added configmap management feature, I'd be glad to know how to work around this and to help debug the issue.
PS: I'm using Terragrunt; not sure if the issue could be related, but it might be.
- [X] I have searched the open/closed issues and my issue is not listed.
Versions
- Module version [Required]:
Terraform v1.1.7
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.9.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/tls v3.3.0
Reproduce
Here is my provider block
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.id]
}
}
data "aws_eks_cluster" "cluster" {
name = aws_eks_cluster.this[0].id
}
I have the same issue, but when I work with the state as another AWS user, I get an error like:
Error: Unauthorized
with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
on .terraform/modules/eks.eks/main.tf line 411, in resource "kubernetes_config_map" "aws_auth":
411: resource "kubernetes_config_map" "aws_auth" {
Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name? I guess aws_eks_cluster.this[0].id would be "known after apply" because you're going to bump the EKS cluster version. That's why the data resource is indeterminate, and the kubernetes provider will fall back to the default 127.0.0.1:80.
not quite true - if the data source fails to find a result, it's a failure, not indeterminate.
@ArchiFleKs you shouldn't need the data source at all; does this still present the same issue?
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
You can't run these in TF Cloud though, because of the local exec.
This is just pointing to what the Kubernetes provider documentation specifies; the module doesn't have any influence over this aspect.
I can confirm that this snippet works as expected without the data source:
provider "kubernetes" {
host = aws_eks_cluster.this[0].endpoint
cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority.0.data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", aws_eks_cluster.this[0].id]
}
}
I know Hashi are hiring and have recently made some hires to offer more support for the Kubernetes and Helm providers, so hopefully some of these quirks get resolved soon! For now, we can just keep sharing what others have found to work for their setups.
Unfortunately, it doesn't seem to work with TF Cloud (it gets the Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused error), so I locked the module on v18.19, which still works.
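For anyone else pinning in the meantime, that looks roughly like the following sketch (the exact version constraint is illustrative):
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.19.0" # pinned until the aws-auth configmap handling works for this setup

  # ... the rest of the module inputs stay unchanged
}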
Apparently using the kubectl provider instead of the kubernetes provider (even completely removing the latter) made it work with Terraform Cloud:
provider "kubectl" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.cluster.token
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
but unfortunately this deleted the previously working aws-auth and it was not able to create a new one: Error: The configmap "aws-auth" does not exist
... :|
I just ran into this while debugging an issue during redeployment of a cluster. I'm not sure exactly how it happened, but we ended up in a state where the cluster had been destroyed, which caused Terraform to not be able to connect to the cluster (duh...) using the provider, and it thus defaulted to 127.0.0.1 when trying to touch the config map...
As mentioned, I'm not sure exactly how it ended up in that state, but it got so bad that I'd get this dial tcp 127.0.0.1:80: connect: connection refused
error on terraform plan
even with all references to the config map removed. Turns out there was still a reference to the config map in the state file, so removing that using terraform state rm module.eks.this.kubernetes_config_map_v1_data.aws_auth
allowed me to redeploy...
Maybe not applicable to most of you, but hopefully it's useful for someone in the future :D
Hey all, let me know if it's still worthwhile to leave this issue open. I don't think there is anything further we can do here in this module to help alleviate any of the issues shown; there seems to be some variability in terms of what works or does not work for folks. I might be biased, but I think the best place to look for improvements/resolution would be upstream with the other providers (Kubernetes, Helm, kubectl, etc.).
I'm also experiencing this; in the meantime, are there any workarounds?
I'm experiencing the same problem with the latest version. Initial creation of the cluster worked fine, but when trying to update any resources after creation I get the same error:
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  431: resource "kubernetes_config_map_v1_data" "aws_auth" {
│
Same as the example below, except I had multiple profiles on my machine and had to specify the profile: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L5-L15
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "terraformtest"]
}
}
Faced the same. I checked the state using terraform state list and found k8s-related entries there.
Then I removed them using terraform state rm module.eks.kubernetes_config_map.aws_auth[0], and that resolved the issue.
The previous suggestions didn't work for me (maybe I misunderstood something):
- export KUBE_CONFIG_PATH=$PWD/kubeconfig: this kubeconfig does not appear to exist in my current path...
- Deleting the data source: the latest version of this example and module does not use a data source, it just uses module.eks.cluster_id, but I still get this error.
I ended up deleting the aws_auth from the state, which allowed me to continue and resolve the connection-refused problem:
terraform state rm 'module.eks.kubernetes_config_map_v1_data.aws_auth[0]'
I don't know what the implications of rm'ing this state are; is it safe to keep removing it whenever we encounter this error?
A brand new cluster and TF state, EKS 1.22:
terraform {
required_version = ">= 1.1.8"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 4.9"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.10"
}
kubectl = {
source = "gavinbunney/kubectl"
version = ">= 1.13.1"
}
}
}
provider "aws" {
alias = "without_default_tags"
region = var.aws_region
assume_role {
role_arn = var.assume_role_arn
}
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
locals {
## strips 'aws-reserved/sso.amazonaws.com/' from the AWSReservedSSO Role ARN
aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim = replace(one(data.aws_iam_roles.AWSReservedSSO_AdministratorAccess_role.arns), "/[a-z]+-[a-z]+/([a-z]+(\\.[a-z]+)+)\\//", "")
aws_auth_roles = concat([
{
rolearn = data.aws_iam_role.terraform_role.arn
username = "terraform"
groups = ["system:masters"]
},
{
rolearn = local.aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim
username = "sre"
groups = ["system:masters"]
}
],
var.aws_auth_roles,
)
}
# aws-auth configmap
create_aws_auth_configmap = var.self_managed_node_groups != [] ? true : null
manage_aws_auth_configmap = true
aws_auth_roles = local.aws_auth_roles
aws_auth_users = var.aws_auth_users
aws_auth_accounts = var.aws_auth_accounts
leads to:
│ Error: Unauthorized
│
│   with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks.eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
│  414: resource "kubernetes_config_map" "aws_auth" {
Any ideas, @bryantbiggs? Thanks in advance.
@FernandoMiguel I'm seeing something similar in a configuration I'm working with. After some thought, I believe you'll need to add the assumed role to your configuration:
provider "aws" {
alias = "without_default_tags"
region = var.aws_region
assume_role {
role_arn = var.assume_role_arn
}
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id,"--role", var.assume_role_arn]
}
}
Sadly this isn't a solution for me. The configuration I'm working with uses dynamic credentials fed in.
Something along these lines...
provider "aws" {
access_key = <access_key>
secret_key = <secret_key>
token = <token>
region = <region>
}
This is useful when a temporary VM, container, or TFE instance is running the Terraform execution.
Going down this route, the provider is fed the connection information and it is used entirely within the provider context (no aws cli configuration was ever set up).
The problem is that none of that data is stored or carried over, so when the kubernetes provider tries to run the exec, it's going to default to the methods the aws cli uses (meaning a locally stored config in ~/.aws/config or ~/.aws/credentials). In my case that doesn't exist.
@FernandoMiguel it looks like you are presumably using a ~/.aws/config, so passing the assumed role and possibly the profile (if not using the default) should help move that forward. I cannot guarantee it will fix it, but that's the theory.
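A rough sketch of what that could look like, assuming the role ARN and profile name are available (both names here are illustrative; the kubernetes provider's exec block does accept an env map, as shown further down this thread):
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # get-token can assume the role directly via --role-arn
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role-arn", var.assume_role_arn]
    env = {
      AWS_PROFILE = "my-profile" # only needed when a non-default profile holds the base credentials
    }
  }
}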
No config and no aws creds hardcoded. Everything is assume role from a global var. This works on hundreds of our projects.
If you mean the cli exec, that's running from aws-vault exec --server
@FernandoMiguel Hmm, well that's interesting. I was able to get a solution working for me:
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws-iam-authenticator"
# This requires aws-iam-authenticator to be installed locally where Terraform is executed
args = ["token", "-i", module.eks.cluster_id]
}
}
This seemed to work for me, but I also had to expose my endpoint publicly for the first run; our network configuration was locked down too tightly for our remote execution server to hit the endpoint. That could be something else to make sure you check.
If you mean the cli exec, that's running from aws-vault exec --server
What I meant was: if credentials are being passed to the aws provider, I wouldn't necessarily expect them to be passed to the kubernetes provider. Some troubleshooting you could try is TF_LOG=debug terraform plan ... in order to get more information, if you haven't tried that. If you really want to test whether the kubernetes exec works, spin up a VM or container, pass the credentials, and see if that carries over.
If my guess is correct, then a way around it would be creating a ~/.aws/credentials file using a null resource and templating out a configuration that aws eks get-token can then reference.
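A minimal sketch of that idea, swapping the null resource for the hashicorp/local provider's local_sensitive_file for simplicity, and assuming the same credentials fed to the aws provider are available as variables (names are illustrative):
# Render a credentials file that `aws eks get-token` can pick up on hosts
# that have no AWS CLI configuration of their own.
resource "local_sensitive_file" "aws_credentials" {
  filename = pathexpand("~/.aws/credentials")
  content  = <<-EOT
    [default]
    aws_access_key_id     = ${var.access_key_id}
    aws_secret_access_key = ${var.secret_access_key}
    aws_session_token     = ${var.token}
  EOT
}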
My thought process is that the data being passed into the kubernetes provider contains no information about the AWS configuration, so I would expect it to fail if the instance running Terraform didn't have the aws cli configured.
A further thought: if the remote execution tool being used doesn't have a ~/.aws/config but is running inside an instance with an IAM role attached, then it would default to that IAM role, so it could still work as long as that IAM role can assume the required role.
@bryantbiggs I think my thought process above just reinforces your comment. I don't think there is anything in this module that can be done to fix this. I do have a suggestion of not completely removing the aws_auth_configmap_yaml output unless you have other solutions coming up. The reasoning is that I could see a use case where Terraform is run to provision a private cluster and may or may not be running on an instance that can reach that endpoint. If it isn't, the aws_auth_configmap_yaml output can be used in a completely separate process that can hit the private cluster endpoint. It all depends on how separation of duties comes into play (one person to provision, and maybe another to configure). It's just a thought.
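As a hedged sketch of that separation-of-duties idea, the (now deprecated) output could simply be written out as an artifact that a separate process, one with network access to the private endpoint, applies with kubectl (the filename is illustrative):
# Render the aws-auth manifest to a file; a separate pipeline step can then
# run `kubectl apply -f aws-auth.yaml` against the private cluster endpoint.
resource "local_file" "aws_auth" {
  filename = "${path.module}/aws-auth.yaml"
  content  = module.eks.aws_auth_configmap_yaml
}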
I would love to know what isn't working here. I spent a large chunk of this week trying every combo I could think of to get this to work, without success: different creds for the kube provider, different parallelism settings, recreating the code outside of the module so it would run after the EKS cluster module had finished, etc. I would always get either an authentication error, that the config map didn't exist, or that it couldn't be created. Very frustrating.
If we were to keep the now deprecated output, I can at least revert my internal PR and keep using that old and terrible null exec code to patch the config map.
The problem might be terraform-provider-kubernetes and not terraform-aws-eks, e.g. https://github.com/hashicorp/terraform-provider-kubernetes/issues/1479, ... more about the localhost connection refused. This one can really be difficult to catch.
@tanvp112 you are onto something there
we have this provider
notice the highlighted bit
that is not available until the cluster is up
so it is possible that this provider is getting initialised with the wrong endpoint
maybe even "localhost"
and ofc that explains why auth fails
explains why the 2nd apply works fine, cause now the endpoint is correct
So my issue was with authentication, and I believe this example clearly states the issue. The example states that you must set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. After a little more digging, those having issues with authentication could try something like this:
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
# This would set up the aws cli configuration if there is no config or credential file running on the host that would run the aws cli command
env = {
AWS_ACCESS_KEY_ID = var.access_key_id
AWS_SECRET_ACCESS_KEY = var.secret_access_key
AWS_SESSION_TOKEN = var.token
}
# This requires the awscli to be installed locally where Terraform is executed
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
I haven't gotten to try this myself, but it should work. AWS_SESSION_TOKEN would only be needed for an assumed-role process.
I honestly don't know what you are trying to do... AWS IAM auth can be done in many ways. Not everyone has a dedicated IAM account... we use assumed roles, for example.
When you assume a role you retrieve a temporary access key, secret key, and token. My code snippet is an example for when a user is running things in a jobbed-off process inside a container, where the container contains no context for AWS (no config or credentials file). That is my use case: my runs happen on an isolated instance that does not persist (Terraform Cloud follows this same structure, but does not have aws installed by default) and run in a CI/CD pipeline fashion, not on a local machine.
When the aws provider is used, the configuration information is passed into the provider, as in this example. (I'm keeping it simple; my setup actually uses dynamic credentials via HashiCorp Vault, but I don't want to introduce that complexity into this explanation.)
provider "aws" {
region = "us-east-1"
access_key = "<access key | passed via variable or some data query>"
secret_key = "<secret access key | passed via variable or some data query>"
token = "<session token | passed via variable or some data query>"
}
In this instance the AWS provider has all the information passed in, using the provider configuration method. On this run no local aws config file or environment variables exist, so it needs this to make any AWS connection.
All AWS resources are created successfully in this process, except the aws-auth configmap, when using the suggested example:
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
# This requires the awscli to be installed locally where Terraform is executed\
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
The reason this is failing is that the Kubernetes provider has no context for what you use for the aws command, because no config or environment variables are being used. Therefore this will fail.
- NOTE: This will also fail if the local AWS config you have loaded (via a config file or environment variables) does not use the same role the EKS cluster was created with. By default, the only identity with auth is the user or role that created the cluster, so if the local user cannot assume the role used with the above aws provider, the kubernetes commands will fail as well.
That is how the suggested route came to be:
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
# This would set up the aws cli configuration if there is no config or credential file running on the host that would run the aws cli command
env = {
AWS_ACCESS_KEY_ID = "<same access key passed to aws provider | passed via variable or some data query>"
AWS_SECRET_ACCESS_KEY = "<same secret access key passed to aws provider | passed via variable or some data query>"
AWS_SESSION_TOKEN = "<same session token passed to aws provider | passed via variable or some data query>"
}
}
# This requires the awscli to be installed locally where Terraform is executed\
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
In this provider block we purposely pass in the credentials/configuration the aws cli needs to successfully call aws eks get-token --cluster-name <cluster name>, because the kubernetes provider does not care what was passed in to the aws provider; there is no shared context when no local configuration file or environment variables are being leveraged.
@FernandoMiguel does it make sense now what I was trying to attain? This may not be your use case, but it is useful information for anyone trying to run this module using some external remote execution tool.
I'll add that this module does not contain the issue, but adding the above snippet to the documentation may help out those who purposely provide configuration to the aws provider instead of utilizing environment variables or local config files.
It does. I've been fighting issues using the kube provider for weeks with what seems like a race condition or a failure to initialise the endpoint/creds. Sadly, in our case, your snippet does not help since creds are already available via the metadata endpoint, but it's a good idea to always double-check whether CLI tools are using the expected creds.
I was having the same issue, but the solution that worked for me was to configure the kubernetes provider to use the role, something like this:
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role", "arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}" ]
}
}
Ohh that's an interesting option... Need to try that
I have the same issue, but like this:
Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused
when I set "manage_aws_auth_configmap = true" while deploying an EKS managed node group. Is there a known way to solve it?