terraform-provider-aws

[Bug]: Terraform apply command freezes during AWS provider initialization

Open law opened this issue 1 year ago • 63 comments

Terraform Core Version

1.5.7

AWS Provider Version

5.69.0

Affected Resource(s)

n/a

Expected Behavior

`terraform apply` should continue and ask me for confirmation before applying changes.

Actual Behavior

When running `terraform apply`, the process freezes during the initialization of the AWS provider. The command never completes and has to be terminated manually.

Relevant Error/Panic Output Snippet

2024-09-27T13:46:40.034-0600 [DEBUG] provider.terraform-provider-aws_v5.69.0_x5: assertion failed [arm_interval().contains(address)]: code fragment does not contain the given arm address
2024-09-27T13:46:40.034-0600 [DEBUG] provider.terraform-provider-aws_v5.69.0_x5: (CodeFragmentMetadata.cpp:48 instruction_extents_for_arm_address)

Terraform Configuration Files

https://gist.github.com/law/62b9c75214c18a015c37f16285a13ba4

Steps to Reproduce

  1. Run `TF_LOG=debug terraform apply` (or `terragrunt init`) with the provider version above
  2. Observe that the .terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 process hangs indefinitely
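A minimal Python sketch for running this reproduction unattended, assuming `terraform` is on `PATH`. It uses `terraform plan -input=false` instead of `apply` so there is no confirmation prompt, sends the debug log to a file through `TF_LOG_PATH`, and gives up after a timeout instead of requiring manual termination. The helper names (`debug_env`, `reproduce`) and the 180-second limit are illustrative choices, not part of Terraform.

```python
import os
import subprocess


def debug_env(base, log_path):
    """Return a copy of `base` with Terraform debug logging written to `log_path`."""
    env = dict(base)  # never mutate the caller's mapping
    env["TF_LOG"] = "debug"
    env["TF_LOG_PATH"] = log_path
    return env


def reproduce(log_path="tf-debug.log", timeout_s=180):
    """Run `terraform plan`; return its exit code, or None if it hung past the timeout."""
    try:
        result = subprocess.run(
            ["terraform", "plan", "-input=false"],
            env=debug_env(os.environ, log_path),
            timeout=timeout_s,
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the terraform process on timeout, but a hung
        # provider plugin that terraform spawned may keep running on its own.
        return None


if __name__ == "__main__":
    code = reproduce()
    print("timed out (likely the hang)" if code is None else f"exit code {code}")
```

If it times out, it is worth checking for leftover `terraform-provider-aws` processes afterwards.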

Debug Output

https://gist.github.com/law/5271d0e0cd052d438a194eb50c11da63

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

law, Sep 27 '24 20:09

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

github-actions[bot], Sep 27 '24 20:09

I forgot to add: reverting to provider 5.68.0 works like a champ.

law, Sep 27 '24 20:09

I have the same issue with v5.69.0. With v5.68.0, everything works.

Update: my chip is an Apple M1 Pro.

alexpodr, Sep 27 '24 22:09

What our team found is that only the x86 provider build misbehaves. Whether the Mac is Intel or ARM doesn't matter: if you use 5.69.0_x86 (under Rosetta on ARM), you will run into some kind of problem.
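To check which architecture a downloaded provider binary actually is (the failing case reported here is the amd64 build running under Rosetta 2), here is a small Python sketch that reads the first eight bytes of a thin 64-bit Mach-O file. It relies on the Mach-O header magic `MH_MAGIC_64` (0xFEEDFACF) and the CPU type values for x86_64 and arm64; universal ("fat") binaries and non-Mach-O files are simply reported as `unknown`. Running `file <path>` gives the same answer from the shell.

```python
import struct
import sys

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, stored little-endian on Intel and Apple silicon
CPU_TYPES = {
    0x01000007: "x86_64",  # CPU_TYPE_X86 | CPU_ARCH_ABI64
    0x0100000C: "arm64",   # CPU_TYPE_ARM | CPU_ARCH_ABI64
}


def macho_arch(header):
    """Map the first 8 bytes of a Mach-O file to an architecture name."""
    if len(header) < 8:
        return "unknown"
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "unknown"  # fat binary, 32-bit Mach-O, ELF, etc.
    return CPU_TYPES.get(cputype, "unknown")


def binary_arch(path):
    with open(path, "rb") as f:
        return macho_arch(f.read(8))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {binary_arch(path)}")
```

Passing it the path of the `darwin_amd64` provider binary under `.terraform/providers` should report `x86_64`, meaning that binary runs under Rosetta 2 on an Apple silicon Mac.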

elaigor, Sep 30 '24 08:09

I have the same issue with v5.69.0 (hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5) on an Apple M3 chip with Rosetta 2.

jotasixto, Sep 30 '24 14:09

Same issue on Apple M3

JnMik, Sep 30 '24 18:09

It seems to be an issue only with Apple M-series chips; v5.69.0 works fine on my Linux/amd64 machine.

Also, it works fine when I use the same version in CDKTF. Quite odd.

LozanoMatheus, Sep 30 '24 19:09

I spent 3 hours today trying to figure out what this was. I was convinced it was a provider problem, and here we are. Thanks to whoever reported this.

cailen, Sep 30 '24 21:09

Looks like these projects have had similar errors recently:

  • SST: https://github.com/sst/sst/issues/4422
  • Pulumi: https://github.com/pulumi/pulumi-aws/issues/4190

cailen, Oct 01 '24 11:10

It looks like the darwin_arm64 version is not being properly downloaded? [Screenshot 2024-10-01 at 7:28:07 AM]
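Terraform installs providers for the platform of the `terraform` binary itself, so an amd64 CLI on an Apple silicon Mac ends up with `darwin_amd64` provider builds. A minimal Python sketch, assuming the default `.terraform/providers` layout seen in the paths above (`<registry>/<namespace>/<type>/<version>/<os>_<arch>/`), lists which version/platform pairs are actually present:

```python
import sys
from pathlib import Path

DEFAULT_ROOT = ".terraform/providers/registry.terraform.io/hashicorp/aws"


def installed_platforms(root=DEFAULT_ROOT):
    """Return sorted (version, platform) pairs found under a provider directory."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        (version_dir.name, platform_dir.name)
        for version_dir in base.iterdir() if version_dir.is_dir()
        for platform_dir in version_dir.iterdir() if platform_dir.is_dir()
    )


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT
    for version, platform in installed_platforms(root):
        print(version, platform)
```

Seeing only `darwin_amd64` entries on an Apple silicon Mac is a hint that the Terraform CLI itself is an amd64 build.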

cailen, Oct 01 '24 11:10

This led me down quite the rabbit hole, wondering why I was only getting amd64 on an Apple M2. It turns out that (probably during one of the many upgrades from 0.11/0.12, versions that didn't support darwin_arm64) I may have set the architecture to amd64, so tfenv downloaded not only the old version but also the version we upgraded to (1.5.7) as amd64.

% export TFENV_ARCH=arm64

% tfenv install 1.5.7
Terraform v1.5.7 is already installed

% terraform version      
Terraform v1.5.7
on darwin_amd64
+ provider registry.terraform.io/datadog/datadog v3.44.1
+ provider registry.terraform.io/hashicorp/aws v5.68.0
+ provider registry.terraform.io/hashicorp/http v3.4.5
+ provider registry.terraform.io/hashicorp/null v3.2.3
+ provider registry.terraform.io/hashicorp/random v3.6.3
+ provider registry.terraform.io/hashicorp/time v0.12.1
+ provider registry.terraform.io/mongodb/mongodbatlas v1.20.0

Your version of Terraform is out of date! The latest version
is 1.9.5. You can update by downloading from https://www.terraform.io/downloads.html

% tfenv uninstall 1.5.7
Uninstall Terraform v1.5.7
Terraform v1.5.7 is successfully uninstalled

% terraform version    
version '1.5.7' is not installed (set by /repo/.terraform-version). Installing now as TFENV_AUTO_INSTALL==true
Installing Terraform v1.5.7
Downloading release tarball from https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_darwin_arm64.zip
######################################################################################################################## 100.0%
Downloading SHA hash file from https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_SHA256SUMS
Not instructed to use Local PGP (/opt/homebrew/Cellar/tfenv/3.0.0/use-{gpgv,gnupg}) & No keybase install found, skipping OpenPGP signature verification
Archive:  /var/folders/6w/zd8qsqzn1r7g4b00h3rk79v80000gp/T/tfenv_download.XXXXXX.9YGFyexnCK/terraform_1.5.7_darwin_arm64.zip
  inflating: /opt/homebrew/Cellar/tfenv/3.0.0/versions/1.5.7/terraform  
Installation of terraform v1.5.7 successful. To make this your default version, run 'tfenv use 1.5.7'
Terraform v1.5.7
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.68.0


Your version of Terraform is out of date! The latest version
is 1.9.5. You can update by downloading from https://www.terraform.io/downloads.html

terraform init -upgrade   

Initializing the backend...
Upgrading modules...
...

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 2.23.0, >= 3.35.0, >= 4.0.0, >= 4.10.0, ~> 5.0"...

Terraform has been successfully initialized!

And now a darwin arm64 copy is showing up:

[Screenshot 2024-10-01 at 7:43:09 AM]

When using the amd64 version, I've always had issues that required setting `GODEBUG=asyncpreemptoff=1`, as in the other projects' issues I linked.
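`GODEBUG` holds a comma-separated list of `name=value` settings read by the Go runtime, so setting it inline on the command line replaces anything already exported. A minimal Python sketch, assuming `terraform` is on `PATH` and that the variable is inherited by the provider plugin process Terraform launches (it has helped some users in this thread, which suggests it is), merges `asyncpreemptoff=1` into any existing value before running a plan. `with_godebug` is a hypothetical helper name.

```python
import os
import subprocess


def with_godebug(env, setting="asyncpreemptoff=1"):
    """Return a copy of env with `setting` merged into GODEBUG, replacing a same-named entry."""
    env = dict(env)
    name = setting.split("=", 1)[0]
    kept = [
        item for item in env.get("GODEBUG", "").split(",")
        if item and item.split("=", 1)[0] != name
    ]
    env["GODEBUG"] = ",".join(kept + [setting])
    return env


if __name__ == "__main__":
    subprocess.run(["terraform", "plan"], env=with_godebug(os.environ), check=False)
```

Merging rather than overwriting keeps any other `GODEBUG` settings a developer already relies on.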

cailen, Oct 01 '24 11:10

I’m using an Apple M3 with tfenv set to the amd64 architecture via the TFENV_ARCH environment variable. In the projects I work on with my team, not everyone uses an Apple M3; most are on Linux. To stay aligned with the rest of the team, I always configure my Terraform binary as amd64, so in my case I’m on an Apple M3 running under Rosetta 2. This issue only started with the latest version of the AWS provider (5.69.0). With version 5.68.0 everything works perfectly; I only get the error @law mentioned with 5.69.0.

@cailen I tried running the command with the environment variable you suggested (GODEBUG=asyncpreemptoff=1) along with TF_LOG="debug", as shown below:

TF_LOG="debug" GODEBUG=asyncpreemptoff=1 terraform plan

While this reduced the number of log lines, the issue persists and fails at the same point. Below is an excerpt from the logs:

...
2024-10-01T14:33:27.861+0200 [DEBUG] created provider logger: level=debug
2024-10-01T14:33:27.861+0200 [INFO]  provider: configuring client automatic mTLS
2024-10-01T14:33:27.869+0200 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 args=[.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5]
2024-10-01T14:33:27.876+0200 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 pid=51108
2024-10-01T14:33:27.876+0200 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5
2024-10-01T14:34:27.881+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 pid=51108 error="signal: killed"
...
2024-10-01T14:34:33.100+0200 [DEBUG] created provider logger: level=debug
2024-10-01T14:34:33.100+0200 [INFO]  provider: configuring client automatic mTLS
2024-10-01T14:34:33.108+0200 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 args=[.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4]
2024-10-01T14:34:33.113+0200 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 pid=52965
2024-10-01T14:34:33.113+0200 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4
2024-10-01T14:34:34.598+0200 [INFO]  provider.terraform-provider-template_v2.2.0_x4: configuring server automatic mTLS: timestamp=2024-10-01T14:34:34.598+0200
2024-10-01T14:34:34.638+0200 [DEBUG] provider.terraform-provider-template_v2.2.0_x4: plugin address: address=/var/folders/s4/sp8pl52s6ynbzwm57pq5gmkh0000gn/T/plugin458334256 network=unix timestamp=2024-10-01T14:34:34.638+0200
2024-10-01T14:34:34.638+0200 [DEBUG] provider: using plugin: version=5
2024-10-01T14:34:34.692+0200 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2024-10-01T14:34:34.692+0200 [DEBUG] No provider meta schema returned
2024-10-01T14:34:34.696+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 pid=52965
2024-10-01T14:34:34.696+0200 [DEBUG] provider: plugin exited
...
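In the excerpt above, the AWS plugin logs `waiting for RPC address` and is killed almost exactly one minute later, while the template provider reports its plugin address about 1.5 seconds after starting to wait. A minimal Python sketch that pairs those two log events per plugin path and reports the plugins that ended with `signal: killed`, assuming the `TF_LOG` line format shown above:

```python
import re
import sys
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"
# Matches core "provider:" lines such as
# "<ts> [DEBUG] provider: waiting for RPC address: path=<binary>"
EVENT = re.compile(
    r"^(\S+) \[\w+\]\s+provider: "
    r"(waiting for RPC address|plugin process exited): path=(\S+)(.*)$"
)


def killed_plugins(lines):
    """Return (path, seconds from 'waiting for RPC address' to exit) for killed plugins."""
    waiting_since = {}
    killed = []
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue
        stamp, event, path, rest = match.groups()
        when = datetime.strptime(stamp, TS_FORMAT)
        if event == "waiting for RPC address":
            waiting_since[path] = when
            continue
        started = waiting_since.pop(path, None)  # event is "plugin process exited"
        if started is not None and 'error="signal: killed"' in rest:
            killed.append((path, (when - started).total_seconds()))
    return killed


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as log:
        for path, seconds in killed_plugins(log):
            print(f"{seconds:6.1f}s  {path}")
```

For the excerpt above this yields roughly 60.0 s for the AWS provider. A one-minute gap is consistent with a plugin start timeout; hashicorp/go-plugin defaults `StartTimeout` to one minute, but whether Terraform relies on that default here is an assumption.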

jotasixto, Oct 01 '24 12:10

@jotasixto The binaries are all built from the same code. I would think you could use the native (darwin_arm64) copies to run anything locally on your computer, while other users use darwin_amd64 or whatever other flavor without issue. We do this where I work: some people are still on older Intel Macs, and we also run things via GitHub Actions on Linux amd64. I'm not at all discounting that it is broken, but you may be better off using the native version for your system unless there is no compatible one (as with very old versions of Terraform).

cailen, Oct 01 '24 13:10

@cailen Unfortunately, some of our legacy projects (which we are currently working on updating) use version 3 of the AWS provider, which doesn't have a compiled ARM version to download. Therefore, I am unable to work with them locally on my machine. This is why I also have the Terraform binary configured to use darwin_amd64.

jotasixto, Oct 01 '24 13:10

@jotasixto makes sense then! Are you sure they don't have ARM copies though? From 3.30.0 there are ARM versions for Darwin. https://releases.hashicorp.com/terraform-provider-aws/3.30.0/. Maybe you are stuck using a version less than 3.30.0, but it may be worth trying to upgrade to the latest 3.x version if you can. The ARM version of Terraform also runs a lot faster.
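Following the point above that darwin_arm64 builds of the AWS provider exist from 3.30.0 onward, a tiny Python sketch can flag which pinned versions in a legacy project could run natively on Apple silicon. The 3.30.0 threshold is taken from this comment and should be double-checked against the releases page; the check covers only the AWS provider, not others such as `template`.

```python
# Threshold taken from the comment above; verify on releases.hashicorp.com before relying on it.
FIRST_DARWIN_ARM64 = (3, 30, 0)


def parse_version(version):
    """Parse a plain 'X.Y.Z' version string (no pre-release suffix) into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_darwin_arm64_build(version):
    return parse_version(version) >= FIRST_DARWIN_ARM64


if __name__ == "__main__":
    for v in ["3.29.0", "3.30.0", "5.68.0"]:
        print(v, "darwin_arm64 available" if has_darwin_arm64_build(v) else "amd64 only")
```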

cailen, Oct 01 '24 13:10

@cailen I apologize for the confusion earlier. I was replying from my phone at the time and recalling from memory, as it had been a few months since I last worked on this. Now that I’ve had the chance to check again on my laptop, I can confirm that the problem was actually with other HashiCorp providers (template, local, and null) and not the AWS provider.

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/template v2.2.0 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/local v1.4.0 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/null v2.1.2 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

I should have verified this before responding.

However, this issue is causing a similar kind of blockage in legacy projects, as I can't remove the providers without following the proper migration process. Additionally, using the darwin_arm64 binary in my case is not a viable solution to work on the migration of these projects.

Thank you for your understanding, and I appreciate your suggestion!

jotasixto, Oct 01 '24 14:10

Since Terraform Cloud runners are all x64, I lock my architecture to that on my M-series Macs anyway; it helps with the lock file hashes as well. Maybe this is overkill, but until this issue it had been working well for me.
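An alternative to pinning the CLI to x64 for lock-file reasons: `terraform providers lock` can record checksums for several platforms at once, so `.terraform.lock.hcl` stays valid for x64 runners while developers on Apple silicon use native darwin_arm64 builds. A minimal Python sketch that builds and runs that command; the platform list is an assumption to adjust per team, and providers with no build for a listed platform (such as the older `template` release mentioned above) will likely make the command fail.

```python
import subprocess

# Assumed team platforms: x64 CI / Terraform Cloud runners plus both Mac architectures.
PLATFORMS = ["linux_amd64", "darwin_amd64", "darwin_arm64"]


def providers_lock_args(platforms):
    """Build `terraform providers lock` arguments with one -platform flag per platform."""
    return ["terraform", "providers", "lock"] + [f"-platform={p}" for p in platforms]


if __name__ == "__main__":
    # Run from the root module directory; this updates .terraform.lock.hcl in place.
    subprocess.run(providers_lock_args(PLATFORMS), check=True)
```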

claytonolley, Oct 01 '24 16:10

> Looks like these projects have had similar errors recently:

If I'm not mistaken, Pulumi has in the past used Terraform providers under the hood; I'm not sure whether it still does in cases like this. If so, that could be it. I'm not sure about SST 🤔

> It seems to be an issue only with Apple M chips, the version v5.69.0 works fine on my Linux/amd64.
>
> Also, it works fine when I use the same version in the CDKTF. Quite odd.

About this: I just double-checked, and that CDKTF project was using v5.59.0. I guess I was a bit too sleepy at the time 😅. I get the same issue with v5.69.0.

LozanoMatheus, Oct 01 '24 17:10

I just unplugged my laptop for the day and noticed how hot my MacBook was. It turns out the bad AWS provider processes from this morning were still running! [Screenshot 2024-10-01 at 8:15:36 PM]
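To find provider processes that outlived their Terraform run, here is a minimal Python sketch that parses `ps -A -o pid=,args=` output and extracts the version and platform directory from each `terraform-provider-aws` command line, assuming the `.terraform/providers/.../<os>_<arch>/` path layout seen in the logs in this thread. Leftovers can then be stopped with `kill <PID>`.

```python
import re
import subprocess

PROVIDER = re.compile(r"terraform-provider-aws_v(\d+\.\d+\.\d+)_x\d+")
PLATFORM = re.compile(r"/([a-z0-9]+_[a-z0-9]+)/terraform-provider-aws_")


def find_aws_providers(ps_output):
    """Return (pid, version, platform) for each AWS provider process in `ps` output."""
    found = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        version = PROVIDER.search(args)
        if version is None:
            continue
        platform = PLATFORM.search(args)
        found.append((int(pid), version.group(1), platform.group(1) if platform else "unknown"))
    return found


if __name__ == "__main__":
    ps = subprocess.run(["ps", "-A", "-o", "pid=,args="], capture_output=True, text=True, check=True)
    for pid, version, platform in find_aws_providers(ps.stdout):
        print(f"pid {pid}: terraform-provider-aws v{version} ({platform})")
```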

cailen, Oct 02 '24 00:10

I think I have the same problem with provider v5.70.0 and earlier. I'm on a Mac running macOS 15.0.1 (Apple M2 chip) with Terraform 0.14.11 and 1.9.7, and there is no way to run Terraform without problems: [5 screenshots]. I get this error all the time, and when I don't, the process hangs with ever-increasing CPU and memory consumption until I have to kill it. That leaves the state lock held in DynamoDB, making it impossible to deploy infrastructure to AWS, which is fundamental to my work.

mmadrono, Oct 04 '24 11:10

Same issue on a MacBook Pro M1, Terraform v1.5.7, provider v5.70.

Downgrading to v5.68 fixed the issue. I also want to emphasize the comment above about how hard the problematic v5.70 provider processes hit CPU and memory.

rmayore, Oct 06 '24 11:10

I find it crazy that this has not been triaged yet. What is going on, maintainers? @marcosentino @breathingdust @justinretzolk

cailen, Oct 07 '24 12:10

Just chiming in: back-pinning to v5.68 also seems to have worked for me.

MacBook Pro M3, Terraform v1.8.5, provider v5.70

And the 5.70 provider binary was indeed running in the background at 90%+ CPU usage.

mars64, Oct 07 '24 20:10

Could anyone who is experiencing this with v5.69.0 or v5.70.0 see if they have the same problem with v5.65.0 or v5.66.0? These are all versions compiled with Go 1.23 whereas v5.67.0 and v5.68.0 were compiled with Go 1.22. Thanks.
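To check the Go toolchain a particular provider binary was built with, `go version <binary>` (which requires a local Go installation) prints a line of the form `<path>: go1.X.Y`. A minimal Python sketch that runs it and parses the toolchain into a comparable tuple; `go_toolchain` is a hypothetical helper name.

```python
import re
import subprocess
import sys

GO_VERSION = re.compile(r":\s+go(\d+)\.(\d+)(?:\.(\d+))?")


def go_toolchain(line):
    """Parse '<path>: go1.23.1' into (1, 23, 1); return None if no Go version is present."""
    match = GO_VERSION.search(line)
    if match is None:
        return None
    return tuple(int(group) if group else 0 for group in match.groups())


if __name__ == "__main__":
    out = subprocess.run(["go", "version", sys.argv[1]], capture_output=True, text=True)
    toolchain = go_toolchain(out.stdout)
    print(toolchain, "(Go 1.23 or newer)" if toolchain and toolchain >= (1, 23) else "")
```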

ewbankkit, Oct 07 '24 20:10

In my case (M3, Terraform 1.8.5), provider v5.66 and v5.65 both appear to work as expected -- I can successfully initialize the provider during plan and apply, and the binary appears to exit as expected.
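Putting the two comments above together as a quick cross-check: the toolchain mapping is as stated by @ewbankkit, and the outcomes are the Apple silicon reports in this thread so far (anecdotal, and assuming they come from the same amd64-under-Rosetta setup that reproduces the hang). Go 1.23 builds that were nonetheless reported as working would suggest the Go minor version alone does not explain the hang.

```python
# Go toolchain per provider release, as stated in the comment above.
TOOLCHAIN = {
    "5.65.0": "1.23", "5.66.0": "1.23",
    "5.67.0": "1.22", "5.68.0": "1.22",
    "5.69.0": "1.23", "5.70.0": "1.23",
}
# Reported outcome on Apple silicon: True = hangs, False = works; 5.67.0 has no report yet.
REPORTED_HANG = {
    "5.65.0": False, "5.66.0": False, "5.68.0": False,
    "5.69.0": True, "5.70.0": True,
}


def go123_builds_without_hang(toolchain, reports):
    """Versions built with Go 1.23 that were nevertheless reported as working."""
    return sorted(v for v, go in toolchain.items() if go == "1.23" and reports.get(v) is False)


if __name__ == "__main__":
    print(go123_builds_without_hang(TOOLCHAIN, REPORTED_HANG))
```

Here that yields 5.65.0 and 5.66.0, so under these assumptions something other than the Go minor version alone (for example the Go patch release or a change in the provider itself) seems the more likely trigger.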

mars64, Oct 07 '24 21:10

I got the same result as mars64 (M3, Terraform 1.9.7, 1.8.5, 17.5) with both provider versions, and everything seems fine, but I found one interesting thing.

When I ran multiple projects in parallel (4x), I did get behaviour similar to v5.69.0 and v5.70.0: init and plan froze for a little while (~1 min), and sometimes failed. But no provider processes were left running once the commands finished.

My guess is that some sort of lock prevents Terraform from spawning too many new processes (maybe ~5), so because v5.69.0 and v5.70.0 keep the process running after the command finishes, Terraform can't run again, causing the freeze. I also saw that the zombie processes took a while to exit after running `kill <PID>`.

LozanoMatheus, Oct 07 '24 21:10

Also impacted me. Running on M1 MacBook. AWS provider 5.69 and 5.70 would never work and were taking 80%+ CPU constantly. Dropping back to 5.68 worked, but another workaround worked too - use the amd64 provider (5.70) and set GODEBUG=asyncpreemptoff=1.

Seems that 5.69+ is currently not good on Apple Silicon.

chmurray-cisco, Oct 08 '24 12:10

Same here. M3, Terraform 1.7.3: AWS provider 5.69.0 and 5.70.0 both hang during plan and apply (I noticed multiple terraform-provider-aws_v5.70.0_x5 processes still running), as described above. Downgrading to 5.68.0 works well for me. I get the same outcome @mars64 described with 5.65.0 and 5.66.0. Using the ARM version of Terraform is not an option for me either, for the same reasons @jotasixto mentioned: we depend on other providers that do not support darwin_arm64.

lirlirlirlir, Oct 08 '24 13:10

Same here on M1 with Terraform 1.9.2, though it doesn't seem related to the Terraform version. Both 5.69.0 and 5.70.0 of the AWS provider intermittently hang during the plan. Sometimes it returns a timeout, sometimes the entire shell hangs, and sometimes the plan runs successfully. Pinning 5.68.0 or lower does resolve the issue, but there is specific functionality in 5.69.0+ that I want to use.

StvnWthrsp, Oct 08 '24 13:10

Also confirming on an M2 machine with TF 1.9.5: using >=5.69.0 results in a timeout. Downgrading to 5.68.0 "resolves" the issue.

At the same time, on a different M1 machine with TF 1.9.7, 5.70.0 works just as expected.

dim13, Oct 08 '24 14:10