terraform-aws-control_tower_account_factory
creds.sh: An error occurred (ValidationError) when calling the AssumeRole operation: arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid
Terraform Version: >= 0.15.1 & Provider: >= 3.72, < 4.0.0
AFT Version: 1.3.3
(Can be found in the AFT Management Account in the SSM Parameter /aft/config/aft/version)
Terraform Version & Provider Versions N/A
terraform version
N/A
terraform providers
N/A
Bug Description
This bug is present as of 08/03/2022.
When running the aft-create-pipeline CodeBuild project, it fails with the error:
[Container] 2022/08/03 21:52:46 Running command ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt
Generating credentials for AWSAFTAdmin in aft-management account: XXXXXXXXXXX
An error occurred (ValidationError) when calling the AssumeRole operation: arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid
[Container] 2022/08/03 21:52:50 Command did not exit successfully ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt exit status 255
Note that the ARN for the AWSAFTAdmin role is missing the AWS partition and should read "arn:aws:iam::XXXXXXXXXXX:role/AWSAFTAdmin". This malformed ARN causes the ValidationError from the AssumeRole operation in the creds.sh script. The script builds the ARN from the ${AWS_PARTITION} environment variable rather than hard-coding "aws", and that variable is evidently empty in our build environment. Our Parameter Store was pointing to the "main" branch of the "aws-ia/terraform-aws-control_tower_account_factory" repository, using the repo URL and repo branch parameters, so our builds pick up the latest code on that branch. A commit on July 19th replaced "aws" with "${AWS_PARTITION}", and our builds started failing soon after that commit.
creds.sh is in terraform-aws-control_tower_account_factory/sources/scripts/creds.sh
A workaround for this issue is to fork the repo and replace references to the ${AWS_PARTITION} environment variable with "aws".
For example:
CREDENTIALS=$(aws sts assume-role --role-arn "arn:${AWS_PARTITION}:iam::${AFT_MGMT_ACCOUNT}:role/${AFT_MGMT_ROLE}" --role-session-name "${ROLE_SESSION_NAME}")
SHOULD BE CHANGED TO:
CREDENTIALS=$(aws sts assume-role --role-arn "arn:aws:iam::${AFT_MGMT_ACCOUNT}:role/${AFT_MGMT_ROLE}" --role-session-name "${ROLE_SESSION_NAME}")
There are other spots where ${AWS_PARTITION} should be changed as well. This is just a workaround until the environment variable is fixed.
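Instead of hard-coding "aws" in a fork, a minimal sketch could keep ${AWS_PARTITION} but give it a default using bash's `${VAR:-default}` expansion, which substitutes the default when the variable is unset or empty (the empty case is what produces `arn::iam::` in the log). `build_role_arn` is a hypothetical helper, not part of AFT, and the fallback assumes your accounts live in the commercial `aws` partition.

```shell
#!/usr/bin/env bash
# Sketch only: build_role_arn is a hypothetical helper, not part of AFT.
# Assumption: the fallback partition is the commercial "aws" partition.
build_role_arn() {
  local account_id="$1" role_name="$2"
  # ${AWS_PARTITION:-aws} expands to "aws" when AWS_PARTITION is unset OR
  # empty, which is the "arn::iam::" failure mode seen in the build log.
  printf 'arn:%s:iam::%s:role/%s\n' "${AWS_PARTITION:-aws}" "$account_id" "$role_name"
}

# Example use in a creds.sh-style script (not executed here):
# CREDENTIALS=$(aws sts assume-role \
#   --role-arn "$(build_role_arn "${AFT_MGMT_ACCOUNT}" "${AFT_MGMT_ROLE}")" \
#   --role-session-name "${ROLE_SESSION_NAME}")
```

A non-empty AWS_PARTITION (for example `aws-us-gov`) is still honored, so the same script keeps working outside the commercial partition.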
To Reproduce
Steps to reproduce the behavior:
- Check Parameter Store for /aft/config/aft-pipeline-code-source/repo-url and ensure its value is "https://github.com/aws-ia/terraform-aws-control_tower_account_factory.git"
- Check Parameter Store for /aft/config/aft-pipeline-code-source/repo-git-ref and ensure its value is "main"
- Go to CodeBuild > Build Projects > aft-create-pipeline > Start Build
- This should produce the error in the description above (as of 08/03/2022)
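To check the two parameters from the command line rather than the console, something along these lines could be used. It is a sketch that assumes AWS CLI credentials for the AFT management account; `get_param` and `show_aft_source` are hypothetical helper names, while `aws ssm get-parameter` with `--query` and `--output text` is the standard CLI call.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.
# Assumption: credentials for the AFT management account are already configured.
get_param() {
  # --query/--output text strip the JSON envelope and print only the value.
  aws ssm get-parameter --name "$1" --query 'Parameter.Value' --output text
}

show_aft_source() {
  echo "repo-url:     $(get_param /aft/config/aft-pipeline-code-source/repo-url)"
  echo "repo-git-ref: $(get_param /aft/config/aft-pipeline-code-source/repo-git-ref)"
}
```

Running `show_aft_source` and seeing a repo-git-ref of `main` means the pipeline builds from the tip of that branch rather than from a pinned release.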
Expected behavior
The aft-create-pipeline should run successfully.
Additional context
1. Go to CodeBuild > Build Projects > aft-create-pipeline > Build History
2. Click on the latest failed build
3. Check for the error listed in the description
Related Logs
[Container] 2022/08/03 21:52:46 Running command ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt
Generating credentials for AWSAFTAdmin in aft-management account: XXXXXXXXXXX
An error occurred (ValidationError) when calling the AssumeRole operation: arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid
[Container] 2022/08/03 21:52:50 Command did not exit successfully ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt exit status 255
@abhishek-sorenson there is a known bug with AFT version 1.3.5 and older which causes AFT components to use the latest code in the AFT source repository instead of the version of AFT that was deployed.
I recommend updating to the latest version of AFT, v1.6.2, which should fix your issue.
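To check whether a deployment falls in the affected range, the deployed version can be read from the /aft/config/aft/version SSM parameter (the one named in the issue template) and compared against 1.3.5. This is a sketch: `is_affected_aft_version` is a hypothetical helper, it assumes a plain dotted version string with no `v` prefix, and it relies on `sort -V` as provided by GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: is_affected_aft_version is a hypothetical helper.
# Returns success (0) when the deployed version is <= 1.3.5, the last
# version the maintainer describes as affected.
is_affected_aft_version() {
  local deployed="$1" last_affected="1.3.5"
  # sort -V compares dotted versions component by component (so 1.3.10 > 1.3.5).
  # If the smaller of the two is the deployed version, deployed <= last_affected.
  [ "$(printf '%s\n%s\n' "$deployed" "$last_affected" | sort -V | head -n 1)" = "$deployed" ]
}

# Example (requires credentials for the AFT management account; not executed here):
# v=$(aws ssm get-parameter --name /aft/config/aft/version --query Parameter.Value --output text)
# is_affected_aft_version "$v" && echo "AFT $v is in the affected range; consider upgrading"
```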
Is there guidance on the order in which to do things when updating? I started this process this morning after encountering this. I applied the updated Terraform for Account Factory, then kicked off the aft-invoke-customizations step function with:
{
"include": [
{
"type": "all"
}
]
}
which proceeded to fail. Was there an order of operations I missed somewhere?
@andrewkruse @snebhu3 I also couldn't get that fix to work.
What ended up working was hardcoding the AWS_PARTITION environment variable in the aft-create-pipeline CodeBuild project (this was the pipeline that was failing for me). I edited it via the Console by going to CodeBuild > Build Projects > aft-create-pipeline > Edit Environment.
Not sure if this will fix all scenarios, but it allowed me to successfully run the other components of the project to provision/customize an account.
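A variant of this workaround derives the partition instead of typing it in: the partition is always the second colon-separated field of an ARN, so it can be read from the build role's own identity. This sketch assumes the CodeBuild role runs in the same partition as the AFT management account; `partition_from_arn` is a hypothetical helper, while `aws sts get-caller-identity` is a standard CLI call. The maintainers later in this thread advise upgrading AFT rather than setting AWS_PARTITION by hand.

```shell
#!/usr/bin/env bash
# Sketch only: partition_from_arn is a hypothetical helper.
# ARN layout: arn:<partition>:<service>:<region>:<account-id>:<resource>
partition_from_arn() {
  # Field 2 of the colon-separated ARN is the partition.
  printf '%s\n' "$1" | cut -d: -f2
}

# Example in a build step (requires credentials; not executed here):
# AWS_PARTITION="${AWS_PARTITION:-$(partition_from_arn "$(aws sts get-caller-identity --query Arn --output text)")}"
# export AWS_PARTITION
```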
Yup, this is probably a better workaround than what I'm doing currently as it explicitly defines the AWS_PARTITION environment variable rather than replacing that bit in the ARN altogether. I think the ultimate solution is:
- Use the workaround to make your code work as is
- Upgrade to the latest AFT version (make sure you test a deployment first!)
- Pull down the latest code from the aws-ia/terraform-aws-control_tower_account_factory repository
- Re-test the AFT version with a test deployment
If this doesn't work, then please report a bug with the latest version. I will try this myself sometime soon and provide an update.
Hey Andrew,
I'm not sure which customizations you kicked off, but please check whether you're seeing the same error. The error in this thread relates to the aft-create-pipeline, and I believe the customizations step function calls global-customizations. Can you double-check the error you're getting? If it is the same, you just need to hardcode the environment variable as a workaround, as @gabrielibagon suggested. For a full fix, follow the steps in my prior post and see if it works.
It appears it was failing to schedule some of the pipelines because it exceeded the number allowed at once. Apparently my cap is set to 20, not 25. But after getting through all of them, it appears the CodeBuild projects have been updated to include the AWS_PARTITION variable, and the pipelines in CodePipeline are using a working CodeBuild project.
We've got a separate issue to address the aft-create-pipeline concurrency throttling mentioned here, https://github.com/aws-ia/terraform-aws-control_tower_account_factory/issues/223
We do not recommend hard-coding the AWS_PARTITION variable on the customization CodeBuild jobs, as the account specific customization pipelines should be updated as a side effect of the aft-invoke-customizations Step Function, which would resolve the missing AWS_PARTITION environment variable issue.
@balltrev, okay we will give this a try. We recently tried upgrading AFT to the latest version and used the most updated repository for AFT and we still encountered this error. Do we need to run the account specific customizations pipelines to resolve this?
@abhishek-sorenson After updating the AFT module via Terraform, I had to run aft-invoke-customizations for each individual account ID to make sure I didn't exceed the concurrency limit of 25. It ends up updating some CodePipeline pipelines and CodeBuild projects, and after that the subsequent pipeline runs should work normally.
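The per-account approach can be scripted. The sketch below starts one aft-invoke-customizations execution per account ID with `aws stepfunctions start-execution`, pausing between runs to stay under the pipeline concurrency limit. Assumptions: the `accounts`/`target_value` input shape follows the AFT documentation for re-invoking customizations (verify it against your AFT version), STATE_MACHINE_ARN must be set to your state machine's ARN, and the helper names, the DRY_RUN switch, and the PAUSE_SECONDS delay are all hypothetical. By default it only prints each input.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, DRY_RUN and PAUSE_SECONDS are hypothetical.
# Assumption: the "accounts"/"target_value" input shape matches your AFT version.
customizations_input() {
  # Step Functions input JSON targeting a single account ID.
  printf '{"include":[{"type":"accounts","target_value":["%s"]}]}' "$1"
}

invoke_per_account() {
  local account_id
  for account_id in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: dry run, print the input instead of calling AWS.
      echo "would start execution with: $(customizations_input "$account_id")"
    else
      aws stepfunctions start-execution \
        --state-machine-arn "${STATE_MACHINE_ARN:?set STATE_MACHINE_ARN}" \
        --input "$(customizations_input "$account_id")"
      # Space out executions; tune to your account's pipeline concurrency limit.
      sleep "${PAUSE_SECONDS:-60}"
    fi
  done
}
```

For example, `DRY_RUN=0 STATE_MACHINE_ARN=<your state machine ARN> invoke_per_account 111111111111 222222222222` would start two executions, one minute apart.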
@andrewkruse Okay got it, we will give that a try. Thanks!
I'm closing this issue as we've reached resolution on the original report - please track the pipeline creation concurrency issues via https://github.com/aws-ia/terraform-aws-control_tower_account_factory/issues/223
Hi,
This bug is still active, please do not close it.
What is the resolution for this bug?
Which bug are you referring to?
The pipeline creation concurrency issue has not been resolved but should be tracked via https://github.com/aws-ia/terraform-aws-control_tower_account_factory/issues/223 .
As mentioned above, the symptoms in this ticket stem from an improper partial upgrade of components due to a bug in AFT 1.3.5 and earlier; later versions are not exposed to this issue. Customers should not directly hardcode or configure the AWS_PARTITION environment variable, but should instead upgrade to the latest AFT version and re-invoke the customization pipelines to ensure components are properly upgraded.