terraform lock file committed on arm, linux amd deploy, init command throws error with tf 0.14
I ran into the following issue:
running "/atlantis/data/bin/terraform0.14.6 init -input=false -no-color -upgrade" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": exit status 1
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Finding hashicorp/aws versions matching "3.28.0"...
- Using hashicorp/aws v3.28.0 from the shared cache directory
Error: Failed to install provider from shared cache
Error while importing hashicorp/aws v3.28.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/aws 3.28.0 that doesn't match any of the
checksums recorded in the dependency lock file.
After looking up the error (https://www.terraform.io/docs/cli/commands/providers/lock.html), I think that -upgrade is the problem here, but I can't be sure.
changing the workflow to look like this:
+ "workflows": {
+ "default": {
+ "apply": {
+ "steps": [
+ "apply"
+ ]
+ },
+ "plan": {
+ "steps": [
+ {
+ "run": "terraform init -input=false -no-color"
+ },
+ "plan"
+ ]
+ }
+ },
I now get this:
exit status 1: running "terraform init -input=false -no-color" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production":
Error: Unsupported Terraform Core version
on main.tf line 15, in terraform:
15: required_version = "0.14.6"
This configuration does not support Terraform version 0.13.0. To proceed,
either choose another supported Terraform version or update this version
constraint. Version constraints are normally set for good reason, so updating
the constraint may lead to other errors or unexpected behavior.
despite having my .atlantis.yaml set as:
projects:
- name: production
  dir: ./deploy/environments/production
  terraform_version: 0.14.6
and main.tf with:
terraform {
  ...
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }
  required_version = "0.14.6"
}
I was able to fix the version issue by changing the workflow to terraform${ATLANTIS_TERRAFORM_VERSION} init -input=false -no-color. I think this needs to be better called out in the documentation: right now it makes it seem that simply using terraform in a custom workflow will do the right thing, but it won't.
If you use extra_args, it'll use the version you specify in your atlantis.yaml.
Referencing the binary directly in a custom run command doesn't work if you're using a non-default version. We can make this clearer in our docs.
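For anyone landing here later, a minimal sketch of the versioned-binary workaround described above. It assumes Atlantis exposes ATLANTIS_TERRAFORM_VERSION to custom run steps and names its binaries terraform<version>, as the log lines in this thread suggest. The helper name tf_bin is made up for illustration.

```shell
# Hypothetical helper: pick the terraform binary name for a custom run step.
# Assumption: Atlantis names versioned binaries "terraform<version>"
# (e.g. terraform0.14.6) and sets ATLANTIS_TERRAFORM_VERSION for run steps.
tf_bin() {
  if [ -n "${ATLANTIS_TERRAFORM_VERSION:-}" ]; then
    printf 'terraform%s\n' "$ATLANTIS_TERRAFORM_VERSION"
  else
    # Fall back to whatever "terraform" resolves to on PATH (the default version).
    printf 'terraform\n'
  fi
}

# Usage inside a run step (not executed here):
#   "$(tf_bin)" init -input=false -no-color
```

Writing terraform${ATLANTIS_TERRAFORM_VERSION} directly in the run step does the same thing. The function only adds a fallback for when the variable is unset.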
+1 on this issue as we encountered it with Terraform v0.14. Seems like the two workarounds are:
- Do not commit the .terraform.lock.hcl file to the repo. This will cause atlantis to always pull down the latest version of providers during init.
- Use a custom workflow that provides extra_args: ["-upgrade", "false"] to the init step. This will cause atlantis to respect the .terraform.lock.hcl file if it exists.
I am going with the custom workflow method for now.
Edit: This comment https://github.com/runatlantis/atlantis/issues/1408#issuecomment-1016657651 identified the issue/fix
Are extra args deduplicated? Such that if I specify an argument that is already a default (but with a different value), are they both passed to terraform? Or does last arg win?
They don't seem to get deduplicated. Adding extra_args: ["-upgrade", "false"] duplicates the -upgrade flag:
"/atlantis/bin/terraform0.14.7 init -input=false -no-color -upgrade -upgrade=false"
To follow up on this one: with the fix in https://github.com/runatlantis/atlantis/pull/1651 the -upgrade flag is deduped. However, atlantis will still fail with the error below (that is, when .terraform.lock.hcl is committed):
Error: Failed to install provider from shared cache
For what it's worth, I ran into the same issue, and it appears the root cause of the issue is that the terraform lock file was generated in OS X but Atlantis was running in linux_amd64?
Running the following line added extra checksums for the linux_amd64 version of the providers:
terraform providers lock -platform=linux_amd64
After committing and pushing this change to the lockfile, Atlantis is happy to use the cached version of the provider and runs without issues.
(I discovered this thanks to https://zenn.dev/shonansurvivors/scraps/7dd3ab1188c956 – I assume this is the same issue based on error messages and the step to fix it, even though I don't read Japanese 😄 )
Thanks @Pluies -- that was our issue. And it was the sole reason we were using custom workflows for all of our root modules ... and custom workflows don't work with the new streaming output in the Atlantis UI. So now, we can have our 🍰 and 😮💨 it too 😄
It's possible to generate the checksums for multiple architectures in a single go, such that lockfiles will work with old and new Macs as well as Intel/AMD and ARM (Graviton) instances. I added a script terraform_lockfile.sh to our repo like this:
#!/usr/bin/env bash
#
# Generates .terraform.lock.hcl file having hashes for each architecture we run on
# https://www.terraform.io/cli/commands/providers/lock
terraform providers lock -platform=darwin_arm64 -platform=darwin_amd64 -platform=linux_amd64 -platform=linux_arm64
Sounds like the workaround is to either:
- not commit the lock file, or
- if it is committed, lock it for the platform that Atlantis is deployed to and the platforms terraform workflows are run on locally (e.g. local M1 laptops)
Thanks for everyone investigating this and coming up with a solution that works.
It would be nice to create a new doc to mention how to commit this file properly.
Just chiming in - we're not vendoring/committing the lockfiles and we're still running into this.
Workaround is to delete the plugin cache dir or vendor/commit the lockfile with the platform atlantis is running on (+ any local envs etc)
Is there a regression on this workaround for v0.25.0?
Ref:
- https://github.com/runatlantis/atlantis/issues/3201
I tried upgrading (listing all changes to highlight that the issue seems related to v0.25.0):
| | Atlantis | Terraform | TF provider AWS | Result |
|---|---|---|---|---|
| from | v0.24.4 | v1.5.4 | ~> v4 | ✅ |
| to | v0.25.0 | v1.5.7 | ~> v5 | ❌ |
| revert | v0.24.4 | v1.5.7 | ~> v5 | ✅ |
I am using this in my atlantis.env snippet:
# Atlantis issues with TF 1.4+
# https://github.com/runatlantis/atlantis/issues/3201
TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true
# ...
Note: I run atlantis in a systemd unit on an EC2 instance (no container, no k8s configmaps or secrets) and everything works in v0.24.4.
I am considering finding a way to make sure the terraform lock files are committed. We run across windows/linux/mac and amd64/arm64 machines, so we're not committing lock files yet, but if anyone has some type of pre-commit check that helps validate the lock file, I'll make sure the lock files are added to resolve this issue instead.
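On the pre-commit question, one possible sketch: re-run terraform providers lock for every platform in use and fail if the lock file changes, which would mean hashes were missing. The platform list, the function names and the idea of wiring this into a pre-commit hook are all assumptions, not something confirmed in this thread. The check also needs network access to the provider registry.

```shell
#!/usr/bin/env bash
# Hypothetical lock file check. PLATFORMS is an assumed list; adjust it to
# the platforms Atlantis and local machines actually run on.
PLATFORMS="darwin_arm64 darwin_amd64 linux_amd64 linux_arm64"

# Turn "linux_amd64 darwin_arm64" into "-platform=linux_amd64 -platform=darwin_arm64".
platform_flags() {
  flags=""
  for p in "$@"; do
    flags="${flags:+$flags }-platform=$p"
  done
  printf '%s\n' "$flags"
}

# Re-lock the module in directory $1 and fail if .terraform.lock.hcl is
# missing or was changed by the re-lock (i.e. it lacked some hashes).
check_lockfile() {
  dir="$1"
  lock="$dir/.terraform.lock.hcl"
  if [ ! -f "$lock" ]; then
    echo "missing $lock" >&2
    return 1
  fi
  before="$(cksum < "$lock")"
  # Word splitting of the flags is intentional here.
  (cd "$dir" && terraform providers lock $(platform_flags $PLATFORMS)) || return 1
  after="$(cksum < "$lock")"
  if [ "$before" != "$after" ]; then
    echo "$lock was missing hashes; commit the updated file" >&2
    return 1
  fi
}

# Usage (not executed here):
#   check_lockfile deploy/environments/production
```

Comparing checksums before and after leaves it to terraform to decide what belongs in the lock file, so the script never has to parse the HCL itself.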
The only change log entry mentioning lock files for the v0.25.0 release seems to be:
- https://github.com/runatlantis/atlantis/pull/3473
I was storing the plugin-cache on an EBS volume and while doing provider upgrades, there would be issues with the versions in there.
So perhaps there's no regression and I just had to rm -rf the plugin-cache and force a new copy by running terraform init.
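A cautious sketch of that cleanup, for reference. It assumes the cache location is passed explicitly or comes from Terraform's TF_PLUGIN_CACHE_DIR variable. The function name is made up, and the real path depends on how Atlantis is configured.

```shell
# Hypothetical helper: empty the provider plugin cache so the next
# terraform init downloads fresh copies. The directory comes from the first
# argument, or from TF_PLUGIN_CACHE_DIR if no argument is given.
clear_plugin_cache() {
  cache_dir="${1:-${TF_PLUGIN_CACHE_DIR:-}}"
  # Refuse to run on an empty or missing path.
  if [ -z "$cache_dir" ] || [ ! -d "$cache_dir" ]; then
    echo "plugin cache directory not found: '$cache_dir'" >&2
    return 1
  fi
  # Remove the contents but keep the directory itself in place.
  find "$cache_dir" -mindepth 1 -delete
}

# Usage (not executed here):
#   clear_plugin_cache /path/to/plugin-cache
```

Keeping the directory avoids having to recreate it with the right ownership for the Atlantis user.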