assume_role does not work

Open chris1248 opened this issue 5 years ago • 22 comments

The docs on installing and configuring Atlantis are really super thin, and useless. As a result I try to configure assume_role, which then fails at runtime:

Problem

In my Pull Request on Bitbucket server I type this comment

atlantis plan -d sftp

and then I get back the following result:

running "/atlantis-data/bin/terraform0.12.16 plan -input=false -refresh -no-color -out \"/atlantis-data/repos/DevOps/terraform-prod/1/default/sftp/default.tfplan\"" in "/atlantis-data/repos/DevOps/terraform-prod/1/default/sftp": exit status 1
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

Error: The role "arn:aws:iam::<account>:role/myrole" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid

  on connections.tf line 1, in provider "aws":
   1: provider "aws" {

I am specifying my provider in terraform like this:

provider "aws" {
  region = "us-west-2"
  assume_role {
    role_arn     = "arn:aws:iam::<AccountID>:role/myrole"
  }
}

The docs (https://www.runatlantis.io/docs/provider-credentials.html#aws-specific-info) are pretty useless here and do not provide a clear path forward on how to configure this.

chris1248 avatar Nov 25 '19 19:11 chris1248

Hi, Sorry for the poor docs. It might help to ignore Atlantis and think about what you would need to do to be able to execute terraform yourself (wherever Atlantis is running). In the end, Atlantis just shells out to sh and runs terraform.

  • For example, say we're running on an EC2 instance
  • We'd look at the docs for the Terraform provider (https://www.terraform.io/docs/providers/aws/index.html#authentication)
  • Assuming we don't have static credentials, environment variables, or a shared credentials file, it's going to use the EC2 role (based on the docs)
  • Based on the EC2 role docs, we see that "Terraform will just ask the metadata API endpoint for credentials". So that means our EC2 instance will need to be given an IAM role that has permissions to assume the role that you've specified.
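
As a rough sketch of that last point, the permission could be granted in Terraform along these lines. The variable names and the policy name are hypothetical, and the sketch assumes the role attached to the Atlantis host already exists. Only the sts:AssumeRole action and the idea of targeting the role named in the provider's assume_role block come from this thread.

```hcl
# Hypothetical inputs: neither variable name appears in the thread.
variable "target_role_arn" {
  description = "ARN of the role named in the provider's assume_role block"
  type        = string
}

variable "atlantis_host_role_name" {
  description = "Name of the existing IAM role attached to the host that runs Atlantis"
  type        = string
}

# Allow the host role to call sts:AssumeRole on the target role only.
data "aws_iam_policy_document" "allow_assume_target" {
  statement {
    effect    = "Allow"
    actions   = ["sts:AssumeRole"]
    resources = [var.target_role_arn]
  }
}

# Inline policy on the host role; the policy name is an illustrative placeholder.
resource "aws_iam_role_policy" "atlantis_assume_target" {
  name   = "atlantis-assume-target-role"
  role   = var.atlantis_host_role_name
  policy = data.aws_iam_policy_document.allow_assume_target.json
}
```

The target role's trust policy also has to allow the host role (or its account) as a principal; both sides are needed for the assume call to succeed.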

Where are you running Atlantis? Can you ssh there and run terraform plan yourself? If not, what would it take to get that working based on Terraform's docs?

lkysow avatar Nov 26 '19 01:11 lkysow

Fyi, I use aws providers with assumed roles just fine with Atlantis running in a Fargate container with a task role that has permission to assume the needed roles.

grimm26 avatar Jan 24 '20 00:01 grimm26

@grimm26 were you able to enforce a specific session-name so that you can add it as a principal ?

mbolek avatar Jan 29 '20 09:01 mbolek

@mbolek I did not try that

grimm26 avatar Jan 29 '20 17:01 grimm26

@grimm26 would you mind sharing some info on how you use it? It seems I've got a somewhat similar case. My Atlantis is assuming a role in VPC A and then trying to assume a role in VPC B. I can configure a principal in VPC B, but it has to be specific, and it has a session name in the ARN, which seems to be generated by the Atlantis container.

mbolek avatar Jan 29 '20 17:01 mbolek

Atlantis runs in account A. It has a task role with a policy that allows the "sts:AssumeRole" action against a list of Resources that contains the ARN for the role that I want Atlantis to run under in Account B. Account B has that role defined; I attached the AdministratorAccess policy to it, and it trusts the Account A root. I assume you are trying to add a condition to make sure only the Account A Atlantis role can assume the Account B Atlantis role?
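
The Account B side of that setup might look roughly like the following in Terraform. This is a sketch under stated assumptions, not the actual configuration: the variable name and the role name are hypothetical, and it would be applied with credentials for Account B.

```hcl
# Hypothetical input: the account ID of Account A, where Atlantis runs.
variable "atlantis_account_id" {
  description = "Account ID of Account A"
  type        = string
}

# Trust policy: lets principals in Account A assume this role.
data "aws_iam_policy_document" "trust_account_a" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      # Trusting the account root, as described above. To narrow it, this could
      # instead be the ARN of the Atlantis task role in Account A.
      identifiers = ["arn:aws:iam::${var.atlantis_account_id}:root"]
    }
  }
}

# The role Atlantis assumes in Account B; the name is an illustrative placeholder.
resource "aws_iam_role" "atlantis_target" {
  name               = "atlantis-target-role"
  assume_role_policy = data.aws_iam_policy_document.trust_account_a.json
}

# AdministratorAccess is what the comment above describes; a narrower policy may suit other setups.
resource "aws_iam_role_policy_attachment" "admin" {
  role       = aws_iam_role.atlantis_target.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
```

When the trust policy names a role ARN (rather than an assumed-role session ARN) as the principal, the session name generated by the caller does not need to appear in it.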

grimm26 avatar Jan 29 '20 17:01 grimm26

Yes, exactly, but might end up with root :) thanks!

mbolek avatar Jan 29 '20 19:01 mbolek

We ran into some AssumeRole issues recently running Atlantis on EKS and leveraging custom workflows for the role assumption. We ended up tricking the AWS SDK into thinking it's not running in AWS by pointing the AWS_METADATA_URL environment variable at a fake endpoint:

export AWS_METADATA_URL="http://localhost/not/existent/url"

so it would properly prioritize ~/.aws/credentials.

See this comment for the AWS provider for more info: https://github.com/terraform-providers/terraform-provider-aws/issues/5018#issuecomment-549601523

sarcasticadmin avatar May 08 '20 01:05 sarcasticadmin

@grimm26 I'm having the exact same issue as @mbolek and keep going in circles looking at the different pieces of the configuration to get this to work...

Would you please be able to provide feedback on this configuration? Here is a quick rundown of the configuration I have (I've tried quite a few variations) :

Policy assigned to Atlantis Fargate container task execution role, in account 111111111111

...
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/AtlantisRole"
        }
...

Role Trust Policy in the Tenant account 222222222222 (where I want Atlantis to create resources) :

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/atlantis-ecs_task_execution"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Terraform provider block

provider "aws" {
  region = "ca-central-1"
  version = "~> 3.2"
  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/AtlantisRole"
  }
}

Terraform backend block

terraform {
  required_version = "0.13.0"
  backend "s3" {
    bucket   = "terraform-remote-state-222222222222"
    key      = "tf-atlantis-testing/terraform.state"
    region   = "ca-central-1"
    dynamodb_table = "terraform-remote-state-locking-222222222222"
    role_arn = "arn:aws:iam::222222222222:role/AtlantisRole"
  }
}

Any guidance on how you get this to work or any differences you see between our configs would be appreciated.

Thanks

Marcvd316 avatar Aug 14 '20 20:08 Marcvd316

@grimm26 I'm having the exact same issue as @mbolek and keep going in circles looking at the different pieces of the configuration to get this to work...

Would you please be able to provide feedback on this configuration? Here is a quick rundown of the configuration I have (I've tried quite a few variations) :

Policy assigned to Atlantis Fargate container task execution role, in account 111111111111

...
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/AtlantisRole"
        }
...

The above should be your ECS task role, not the task execution role. Maybe you have it right and just used the wrong nomenclature here. It is confusing :).

Role Trust Policy in the Tenant account 222222222222 (where I want Atlantis to create resources) :

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/atlantis-ecs_task_execution"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

ok, but again, you should be referencing the task role, the one that you feed into the task_role_arn argument in the aws_ecs_task_definition resource for the atlantis task.
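
For illustration, the two role arguments sit side by side in the task definition. This is a minimal sketch, not a complete Atlantis deployment: all variable names, the family name, and the sizing values are hypothetical.

```hcl
# Hypothetical inputs; none of these names come from the thread.
variable "execution_role_arn" {
  description = "Role ECS uses to start the task (pull the image, read secrets)"
  type        = string
}

variable "task_role_arn" {
  description = "Role the running Atlantis container gets its AWS permissions from"
  type        = string
}

variable "atlantis_image" {
  description = "Container image to run"
  type        = string
}

resource "aws_ecs_task_definition" "atlantis" {
  family                   = "atlantis"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "512"
  memory                   = "1024"

  # Used by ECS itself to launch the task.
  execution_role_arn = var.execution_role_arn

  # Used by Terraform inside the container. This is the role that needs
  # sts:AssumeRole and that the tenant account's trust policy should name.
  task_role_arn = var.task_role_arn

  container_definitions = jsonencode([
    {
      name      = "atlantis"
      image     = var.atlantis_image
      essential = true
    }
  ])
}
```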

The next two snippets below look fine, assuming that you called the role that you want Atlantis to assume AtlantisRole :). I think this is probably confusion between the task execution role (which holds the permissions needed to actually start the task, like launching the ECS container and reading secrets to feed into the container) and the task role, which the task gains its permissions from, like an EC2 instance profile.

Terraform provider block

provider "aws" {
  region = "ca-central-1"
  version = "~> 3.2"
  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/AtlantisRole"
  }
}

Terraform backend block

terraform {
  required_version = "0.13.0"
  backend "s3" {
    bucket   = "terraform-remote-state-222222222222"
    key      = "tf-atlantis-testing/terraform.state"
    region   = "ca-central-1"
    dynamodb_table = "terraform-remote-state-locking-222222222222"
    role_arn = "arn:aws:iam::222222222222:role/AtlantisRole"
  }
}

Any guidance on how you get this to work or any differences you see between our configs would be appreciated.

Thanks

grimm26 avatar Aug 14 '20 21:08 grimm26

Hi, @grimm26 and @mbolek I have a follow up question : Where do you keep the state files for the different AWS accounts? Are they all in buckets that exist in a single central account, or are they stored in a bucket in each account?

I think this is where my problem is, I'm trying to store the state files in each account that is managed by Atlantis, rather than a central bucket, which I know works properly.

Thanks

Marcvd316 avatar Aug 18 '20 14:08 Marcvd316

@Marcvd316 I use a state bucket per account. I actually find it simpler that way.

grimm26 avatar Aug 18 '20 14:08 grimm26

@Marcvd316 I use a state bucket per account. I actually find it simpler that way.

Thanks for the answer. But to clarify, are all the state buckets in a single account or do you have them in each of the accounts?

What I'm trying to do is keep each account's state bucket in the account itself, not a central account.

Marcvd316 avatar Aug 18 '20 19:08 Marcvd316

each of my accounts has its own terraform state bucket in its own account.

grimm26 avatar Aug 18 '20 19:08 grimm26

each of my accounts has its own terraform state bucket in its own account.

Then your setup sounds very similar to ours.... Could I ask you to post the same configurations I posted above?

  • terraform provider block (with versions)
  • terraform backend block
  • Fargate task role policy
  • Role Trust policy in the Tenant accounts
  • any other relevant info (S3 bucket policies?)

There must be a tiny difference that allows yours to work and doesn't for ours...

Thanks for your help!

Marcvd316 avatar Aug 18 '20 19:08 Marcvd316

each of my accounts has its own terraform state bucket in its own account.

@grimm26 How do you make something like that work? We use workspaces and as an example, my terraform block looks like this

terraform {
  backend "s3" {
    bucket         = "my-bucket-name"
    key            = "terraform/aws/blah..."
    region         = "us-west-2"
    dynamodb_table = "lock-table..."
    role_arn       = "arn:aws:iam::112233:role/terraform-role"
  }
}

But if I try to put an account ID as a variable in there, like

role_arn       = "arn:aws:iam::${local.account_id}:role/terraform-role"

Terraform complains with

Error: Variables not allowed

So how do you use different buckets per account without duplicating your main.tf file?

cep21 avatar Dec 21 '20 23:12 cep21

Backend configs do not support interpolation in terraform but the provider does.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs#assume-role
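
A minimal sketch of the provider side, assuming the account ID is passed in as an input variable (the variable name is hypothetical; the role name is the one from the question above):

```hcl
# Hypothetical input variable; could be set per environment via a .tfvars file.
variable "account_id" {
  description = "AWS account ID to operate in"
  type        = string
}

provider "aws" {
  region = "us-west-2"

  assume_role {
    # Interpolation is allowed here, unlike in the backend block.
    role_arn = "arn:aws:iam::${var.account_id}:role/terraform-role"
  }
}
```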

jamengual avatar Dec 21 '20 23:12 jamengual

Backend configs do not support interpolation in terraform but the provider does

I also do that, but it's still unclear how you can give each account a different s3 bucket since the backend is static and includes the bucket name.

cep21 avatar Dec 22 '20 00:12 cep21

I understand but this is not a function of atlantis, this is how terraform is used.

you will have to run TF something like this :

terraform init \
  -var-file=staging-us-west-2.tfvars \
  -backend-config=staging-backend-us-west-2.tfvars

and replicate that as a workflow in atlantis with a custom atlantis.yaml file.

similar to this :

workflows:
  production-us-east-2:
    plan:
      steps:
        - run: rm -rf .terraform
        - init:
            extra_args: [-backend-config=production-backend-us-east-2.tfvars]
        - plan:
            extra_args: [-var-file=production-us-east-2.tfvars]
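
The backend config file passed to init might contain something like the sketch below. The file name is the one used in the workflow; every value is a placeholder, and the argument names follow the backend blocks posted earlier in this thread. Newer Terraform releases may expect a different way of specifying the role, so check the S3 backend docs for your version.

```hcl
# production-backend-us-east-2.tfvars
# All values are illustrative placeholders, one file per account/region.
bucket         = "terraform-remote-state-222222222222"
dynamodb_table = "terraform-remote-state-locking-222222222222"
role_arn       = "arn:aws:iam::222222222222:role/AtlantisRole"
```

With this approach, the backend "s3" block in the shared main.tf would keep only the settings that are the same for every account and leave out the keys supplied here.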

jamengual avatar Dec 22 '20 00:12 jamengual

I don't use workspaces. Each AWS account that I have has its own git repository with the terraform backend hard coded.

grimm26 avatar Dec 22 '20 00:12 grimm26

atlantis plan

anujclari avatar Sep 16 '22 15:09 anujclari

I have role assumption working. It's not a limitation of Atlantis. Create a primary role that Atlantis assumes by default, then create secondary roles in each account that allow the primary role to assume them. You can then use the assume_role block's role_arn to assume each secondary role.

Amazon has this documented under role chaining

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html
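
As a sketch of what that can look like with more than one account, here is one aliased provider per secondary role. The account IDs, role name, aliases, and session name are all hypothetical, and the sketch assumes Atlantis is already running with the primary role's credentials.

```hcl
# One provider per tenant account, each assuming that account's secondary role.
provider "aws" {
  alias  = "account_b"
  region = "us-west-2"

  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/atlantis-secondary"
    session_name = "atlantis"
  }
}

provider "aws" {
  alias  = "account_c"
  region = "us-west-2"

  assume_role {
    role_arn     = "arn:aws:iam::333333333333:role/atlantis-secondary"
    session_name = "atlantis"
  }
}

# Quick check that each provider lands in the expected account.
data "aws_caller_identity" "account_b" {
  provider = aws.account_b
}

data "aws_caller_identity" "account_c" {
  provider = aws.account_c
}

output "assumed_account_ids" {
  value = [
    data.aws_caller_identity.account_b.account_id,
    data.aws_caller_identity.account_c.account_id,
  ]
}
```

Running a plan with only the data sources and the output is a low-risk way to confirm the chain works before adding real resources.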

nitrocode avatar Feb 10 '23 05:02 nitrocode