atlantis terraform lock file committed on arm, linux amd deploy, init command throws error with tf 0.14

I ran into the following issue:

running "/atlantis/data/bin/terraform0.14.6 init -input=false -no-color -upgrade" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": exit status 1

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching "3.28.0"...
- Using hashicorp/aws v3.28.0 from the shared cache directory

Error: Failed to install provider from shared cache

Error while importing hashicorp/aws v3.28.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/aws 3.28.0 that doesn't match any of the
checksums recorded in the dependency lock file.

After looking up the error (https://www.terraform.io/docs/cli/commands/providers/lock.html), I think that -upgrade is the problem here, but I can't be sure.

Feb 12 '21 20:02 ghostsquad

After changing the workflow to look like this:

+      "workflows": {
+        "default": {
+          "apply": {
+            "steps": [
+              "apply"
+            ]
+          },
+          "plan": {
+            "steps": [
+              {
+                "run": "terraform init -input=false -no-color"
+              },
+              "plan"
+            ]
+          }
+        },

I now get this:

exit status 1: running "terraform init -input=false -no-color" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": 

Error: Unsupported Terraform Core version

  on main.tf line 15, in terraform:
  15:   required_version = "0.14.6"

This configuration does not support Terraform version 0.13.0. To proceed,
either choose another supported Terraform version or update this version
constraint. Version constraints are normally set for good reason, so updating
the constraint may lead to other errors or unexpected behavior.

despite having my .atlantis.yaml set as:

projects:
  - name: production
    dir: ./deploy/environments/production
    terraform_version: 0.14.6

and main.tf with:

terraform {
  ...

  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }

  required_version = "0.14.6"
}

Feb 12 '21 20:02 ghostsquad

I was able to fix the version issue by changing the workflow command to terraform${ATLANTIS_TERRAFORM_VERSION} init -input=false -no-color. I think this needs to be called out better in the documentation. Right now, it makes it seem that simply using terraform in a custom workflow will do the right thing, but it won't.
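
A minimal sketch of that workaround as a helper that a custom run step could call. It assumes Atlantis exports ATLANTIS_TERRAFORM_VERSION to the step and that the versioned binary (for example terraform0.14.6) is on PATH, as the fix above suggests. The function name tf_bin is hypothetical and not part of Atlantis.

```shell
#!/usr/bin/env bash
# Sketch only: pick the Terraform binary that matches the project's
# terraform_version instead of the server default.

# tf_bin is a hypothetical helper name, not part of Atlantis.
tf_bin() {
  # Assumption: Atlantis sets ATLANTIS_TERRAFORM_VERSION for run steps.
  if [ -n "${ATLANTIS_TERRAFORM_VERSION:-}" ]; then
    printf 'terraform%s\n' "$ATLANTIS_TERRAFORM_VERSION"
  else
    # Fall back to whatever "terraform" resolves to on PATH.
    printf 'terraform\n'
  fi
}

# Example use inside a custom run step:
#   "$(tf_bin)" init -input=false -no-color
```

Resolving the name in one place keeps every run step on the same binary, which is the behaviour the built-in steps already provide.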

Feb 12 '21 20:02 ghostsquad

If you use extra_args, it'll use the version you specify in your atlantis.yaml.

Referencing the terraform binary directly in a custom run command doesn't work if you're using a non-default version. We can make this clearer in our docs.

Feb 25 '21 04:02 nishkrishnan

+1 on this issue, as we encountered it with Terraform v0.14. It seems the two workarounds are:

1. Do not commit the .terraform.lock.hcl file to the repo. This will cause Atlantis to always pull down the latest version of providers during init.
2. Use a custom workflow that provides extra_args: ["-upgrade", "false"] to the init step. This will cause Atlantis to respect the .terraform.lock.hcl file if it exists.

I am going with the custom workflow method for now.

Edit: This comment https://github.com/runatlantis/atlantis/issues/1408#issuecomment-1016657651 identified the issue/fix

Mar 17 '21 16:03 bryankaraffa

If you use extra_args, it'll use the version you specify in your atlantis.yaml.

Referencing the terraform binary directly in a custom run command doesn't work if you're using a non-default version. We can make this clearer in our docs.

Are extra args deduplicated? Such that if I specify an argument that is already a default (but with a different value), are they both passed to terraform? Or does last arg win?

Mar 17 '21 20:03 ghostsquad

Are extra args deduplicated? Such that if I specify an argument that is already a default (but with a different value), are they both passed to terraform? Or does last arg win?

They don't seem to get deduplicated. Adding extra_args: ["-upgrade", "false"] duplicates the -upgrade flag:

"/atlantis/bin/terraform0.14.7 init -input=false -no-color -upgrade -upgrade=false"

Mar 19 '21 18:03 davidmontoyago

To follow up on this one: with the fix in https://github.com/runatlantis/atlantis/pull/1651 the -upgrade flag is deduped. However, Atlantis will still fail with the error below when the .terraform.lock.hcl is committed:

Error: Failed to install provider from shared cache

Jul 20 '21 16:07 davidmontoyago

For what it's worth, I ran into the same issue, and the root cause appears to be that the Terraform lock file was generated on OS X while Atlantis was running on linux_amd64.

Running the following line added extra checksums for the linux_amd64 version of the providers:

terraform providers lock -platform=linux_amd64

After committing and pushing this change to the lockfile, Atlantis is happy to use the cached version of the provider and runs without issues.

(I discovered this thanks to https://zenn.dev/shonansurvivors/scraps/7dd3ab1188c956 – I assume this is the same issue based on error messages and the step to fix it, even though I don't read Japanese 😄 )
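
To find the platform name to pass to -platform for the machine Atlantis runs on, the output of uname can be mapped to Terraform's os_arch naming. The sketch below is illustrative: tf_platform is a hypothetical helper, and it covers only the four platforms that come up in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: map `uname` output to Terraform's os_arch platform name.

# tf_platform is a hypothetical helper. It takes the kernel name and the
# machine type as arguments so it can be checked without calling uname.
tf_platform() {
  local os arch
  case "$1" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    arm64|aarch64) arch=arm64 ;;
    *) echo "unsupported arch: $2" >&2; return 1 ;;
  esac
  printf '%s_%s\n' "$os" "$arch"
}

# Example, run on the Atlantis host:
#   terraform providers lock -platform="$(tf_platform "$(uname -s)" "$(uname -m)")"
```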

Jan 19 '22 16:01 Pluies

Thanks @Pluies -- that was our issue. And it was the sole reason we were using custom workflows for all of our root modules ... and custom workflows don't work with the new streaming output in the Atlantis UI. So now, we can have our 🍰 and 😮‍💨 it too 😄

It's possible to generate the checksums for multiple architectures in one go, so that lockfiles work on old and new Macs and on Intel (amd64) and ARM (Graviton) instances. I added a script, terraform_lockfile.sh, to our repo like this:

#!/usr/bin/env bash
#
# Generates .terraform.lock.hcl file having hashes for each architecture we run on
# https://www.terraform.io/cli/commands/providers/lock

terraform providers lock -platform=darwin_arm64 -platform=darwin_amd64 -platform=linux_amd64 -platform=linux_arm64
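
If the platform list changes often, the flags can be built from a list instead of being written out by hand. This is a sketch under that assumption; platform_flags is a hypothetical helper name, and the platform names are the ones used in the script above.

```shell
#!/usr/bin/env bash
# Sketch only: build the -platform flags for `terraform providers lock`
# from a list of platform names.

# platform_flags is a hypothetical helper; it prints one flag per argument,
# separated by spaces.
platform_flags() {
  local flags=()
  local p
  for p in "$@"; do
    flags+=("-platform=$p")
  done
  printf '%s\n' "${flags[*]}"
}

# Example (the command substitution is left unquoted on purpose so that
# each flag becomes its own argument):
#   terraform providers lock $(platform_flags darwin_arm64 darwin_amd64 linux_amd64 linux_arm64)
```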

May 06 '22 18:05 tomharrisonjr

Sounds like the workaround is to either:

1. not commit the lock file, or
2. if it is committed, lock it for the platform that Atlantis is deployed to and the platforms Terraform workflows are run on locally (e.g. local M1 laptops).

Thanks for everyone investigating this and coming up with a solution that works.

It would be nice to create a new doc to mention how to commit this file properly.

Jan 16 '23 13:01 nitrocode

Just chiming in - we're not vendoring/committing the lockfiles and we're still running into this.

Workaround is to delete the plugin cache dir or vendor/commit the lockfile with the platform atlantis is running on (+ any local envs etc)

Mar 23 '23 15:03 cilindrox

Is there a regression in this workaround for v0.25.0?

Ref:

https://github.com/runatlantis/atlantis/issues/3201

I tried upgrading (listing all changes to highlight that the issue seems related to v0.25.0):

        Atlantis   Terraform  TF provider AWS  Result
from    `v0.24.4`  `v1.5.4`   `~> v4`          ✅
to      `v0.25.0`  `v1.5.7`   `~> v5`          ❌
revert  `v0.24.4`  `v1.5.7`   `~> v5`          ✅

I am using this snippet in my atlantis.env:

# Atlantis issues with TF 1.4+
# https://github.com/runatlantis/atlantis/issues/3201
TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true
# ...

Note: I run atlantis in a systemd unit on an EC2 instance, no container / no k8s configmaps or secrets and everything works in v0.24.4

I am considering finding a way to make sure the Terraform lock files are committed. We run across Windows/Linux/Mac and amd64/arm64 machines, so we're not committing lock files yet, but if anyone has some type of pre-commit check that helps validate the lock file, I'll make sure the lock files are added to resolve this issue instead.

The only changelog entry mentioning lock files for the v0.25.0 release seems to be:

https://github.com/runatlantis/atlantis/pull/3473

Sep 19 '23 04:09 vincentgna

I was storing the plugin-cache on an EBS volume, and during provider upgrades there would be issues with the versions in there.

So perhaps there's no regression and I just had to rm -rf the plugin-cache and force a new copy by running terraform init.
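
A guarded way to clear the cache might look like the sketch below. It assumes the cache path is known to the operator, for example through Terraform's TF_PLUGIN_CACHE_DIR variable. Whether that variable is set on a given Atlantis host is an assumption, and clear_plugin_cache is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: empty a Terraform plugin cache directory so the next
# `terraform init` downloads fresh provider copies.

# clear_plugin_cache is a hypothetical helper name.
clear_plugin_cache() {
  local dir="${1:-}"
  # Refuse empty or root paths so an unset variable cannot wipe the wrong place.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to clear: '$dir'" >&2
    return 1
  fi
  # Nothing to do if the directory does not exist.
  [ -d "$dir" ] || return 0
  # Remove the contents but keep the directory itself.
  find "$dir" -mindepth 1 -delete
}

# Example:
#   clear_plugin_cache "$TF_PLUGIN_CACHE_DIR"
```

Keeping the directory itself avoids having to recreate a mount point when the cache lives on a separate volume.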

Sep 26 '23 02:09 vincentgna
