Misleading error log on 429 errors from registry

Open arnoldyahad opened this issue 9 months ago • 9 comments

Hey,

Recently we were throttled by the HashiCorp registry (confirmed by talking with their support), but it was impossible to tell from our error logs, even with debug logging enabled.

We are running terragrunt plan commands at high scale, and this is the error log we got:

13:14:42.342 INFO Terragrunt Cache server is listening on 127.0.0.1:38691
13:14:42.342 INFO Start Terragrunt Cache server
13:14:42.752 INFO Downloading Terraform configurations from git::ssh://[email protected]/<organization>/<repo>.git?ref=lambda_v1.1.37 into ./.terragrunt-cache/uCvmXUGu5jL2E_fX-K1egZAxmTs/bkwVI7uwl97tkFlbhBGmEdngoJk
13:14:43.657 INFO Caching terraform providers for ./.terragrunt-cache/uCvmXUGu5jL2E_fX-K1egZAxmTs/bkwVI7uwl97tkFlbhBGmEdngoJk/lambda
13:14:44.355 ERROR terraform invocation failed in ./.terragrunt-cache/uCvmXUGu5jL2E_fX-K1egZAxmTs/bkwVI7uwl97tkFlbhBGmEdngoJk/lambda
13:14:44.355 INFO Shutting down Terragrunt Cache server...
13:14:44.355 INFO Terragrunt Cache server stopped
13:14:44.355 ERROR 3 errors occurred:
* unable to cache provider: registry.terraform.io/hashicorp/external v2.3.4, err: not found provider download url
* unable to cache provider: registry.terraform.io/hashicorp/local v2.5.2, err: not found provider download url
* unable to cache provider: registry.terraform.io/hashicorp/null v3.2.3, err: not found provider download url

The error contains no information about rate limiting or the 429 status code, which makes the problem much harder to recognize and troubleshoot.

We saw this error message because of this code: https://github.com/gruntwork-io/terragrunt/blob/main/tf/cache/services/provider_cache.go#L218-L236

It doesn't say anything about a 429, and without HashiCorp support we wouldn't have understood what was happening.

Steps To Reproduce

Get a 429 from the HashiCorp registry (according to their support, around 3000 requests in 5 minutes is enough to trigger one).

Also, have Terragrunt use its provider caching mechanism:

export TERRAGRUNT_PROVIDER_CACHE=1

The error message is the same on 0.69.9 and 0.71.1.

Expected behavior

We want a clear message that says "429 rate limiting", just as Terraform would report if we used it directly.

Versions

  • Terragrunt version: 0.69.9 / 0.71.1, but you can also use the latest, as I see the code hasn't changed.
  • OpenTofu/Terraform version: 1.10.1
  • Environment details (Ubuntu 20.04, Windows 10, etc.): Ubuntu 20.04

arnoldyahad avatar Feb 25 '25 09:02 arnoldyahad

I definitely agree that the error message could be better.

Requesting community contributions, as you've already done some root cause analysis, and it shouldn't be too hard for someone from the community to submit a fix.

yhakbar avatar Feb 27 '25 21:02 yhakbar

If I switch to not using provider cache, I can successfully download the provider. How am I getting throttled, then?

grimm26 avatar Apr 24 '25 20:04 grimm26

Hi, I'm encountering the same error since upgrading from Terragrunt version 0.77.20 to version 0.77.22. I'm running Terragrunt in a GitHub Action with the provider cache enabled. Disabling the provider cache increases the execution time from ~15 minutes to more than one hour, which is not a viable option for my use case. This is the log:

$ terragrunt run-all plan --non-interactive --provider-cache --working-dir / --queue-exclude-dir /package-excluded --parallelism 16.
09:55:12.280 INFO   Terragrunt Cache server is listening on 127.0.0.1:34695
09:55:12.281 INFO   Start Terragrunt Cache server
09:55:12.281 WARN   [package-1] Using `terragrunt.hcl` as the root of Terragrunt configurations is an anti-pattern, and no longer recommended. In a future version of Terragrunt, this will result in an error. You are advised to use a differently named file like `root.hcl` instead. For more information, see https://terragrunt.gruntwork.io/docs/migrate/migrating-from-root-terragrunt-hcl
09:55:12.327 INFO   The stack at . will be processed in the following order for command plan:
Group 1
- Module ./package-1
- Module ./package-2
- Module ./package-3
09:55:12.483 INFO   [package-3] Downloading Terraform configurations from ../../../infrastructure/aws/modules/package-3 into ./package-3/.terragrunt-cache/x0y2mDHkVl_YWrxR4X4DOxEi2W8/0Crg5bhSe0-tVGKO1jqBTKWtLz8
09:55:12.705 INFO   [package-3] Caching terraform providers for ./package-3/.terragrunt-cache/x0y2mDHkVl_YWrxR4X4DOxEi2W8/0Crg5bhSe0-tVGKO1jqBTKWtLz8
09:55:12.713 INFO   [package-2] Downloading Terraform configurations from ../../../infrastructure/aws/modules/package-2 into ./package-2/.terragrunt-cache/mDwY25I5tlSFoLuoEhRm5IPj-CU/wqjnzHi4nj1rYZe7PfusmikcUA0
09:55:12.738 INFO   [package-1] Downloading Terraform configurations from ../../../infrastructure/aws/modules/empty into ./package-1/.terragrunt-cache/zXezC686oBq6SvexVuBiO7c9OvA/zYUClprxT3148oUC4lAKDoP8O1w
09:55:12.761 INFO   [package-2] Caching terraform providers for ./package-2/.terragrunt-cache/mDwY25I5tlSFoLuoEhRm5IPj-CU/wqjnzHi4nj1rYZe7PfusmikcUA0
09:55:12.788 INFO   [package-1] Caching terraform providers for ./package-1/.terragrunt-cache/zXezC686oBq6SvexVuBiO7c9OvA/zYUClprxT3148oUC4lAKDoP8O1w
09:55:13.070 ERROR  [package-3] terraform invocation failed in ./package-3/.terragrunt-cache/x0y2mDHkVl_YWrxR4X4DOxEi2W8/0Crg5bhSe0-tVGKO1jqBTKWtLz8
09:55:13.070 ERROR  [package-3] Module ./package-3 has finished with an error
09:55:13.110 ERROR  [package-2] terraform invocation failed in ./package-2/.terragrunt-cache/mDwY25I5tlSFoLuoEhRm5IPj-CU/wqjnzHi4nj1rYZe7PfusmikcUA0
09:55:13.110 ERROR  [package-2] Module ./package-2 has finished with an error
09:55:13.118 ERROR  [package-1] terraform invocation failed in ./package-1/.terragrunt-cache/zXezC686oBq6SvexVuBiO7c9OvA/zYUClprxT3148oUC4lAKDoP8O1w
09:55:13.118 ERROR  [package-1] Module ./package-1 has finished with an error
09:55:13.119 INFO   Shutting down Terragrunt Cache server...
09:55:13.119 INFO   Terragrunt Cache server stopped
09:55:13.119 ERROR  3 errors occurred:
* unable to cache provider: registry.terraform.io/hashicorp/aws v5.96.0, err: not found provider download url
* unable to cache provider: registry.terraform.io/hashicorp/aws v5.96.0, err: not found provider download url
* unable to cache provider: registry.terraform.io/hashicorp/aws v5.96.0, err: not found provider download url
09:55:13.119 ERROR  Unable to determine underlying exit code, so Terragrunt will exit with error code 1

Thanks for the help!

UPDATE: Switching from version 0.77.20 to 0.77.22 has no impact on the issue, but after rolling back the AWS Terraform provider to version 5.95.0, everything worked as expected.

Giaco9NN avatar Apr 28 '25 10:04 Giaco9NN

We eventually spoke with HashiCorp, and they confirmed that we were throttled on several occasions.

The problem is not whether a given provider gets throttled. The problem is that the log gives no indication, even at debug level, that the status code is 429. The message is misleading, because err: not found provider download url is not the actual issue.

arnoldyahad avatar Apr 28 '25 11:04 arnoldyahad

my issue turned out to be the cloudfront cache issue with the aws provider v5.96.0 :(. Bottom line is we need better logging in the provider cache server.

grimm26 avatar Apr 28 '25 14:04 grimm26

> my issue turned out to be the cloudfront cache issue with the aws provider v5.96.0 :(. Bottom line is we need better logging in the provider cache server.

We had the same issue. Now I can confirm I'm able to download the newer version of the provider even with the provider cache enabled

Giaco9NN avatar Apr 30 '25 09:04 Giaco9NN

Hi! I'm having the same error with the hashicorp/terraform-provider-archive provider. I understand this is not strictly related to Terragrunt, but I don't understand what happens when disabling the terragrunt provider cache. Basically, when I use the option --provider-cache I see the error registry.terraform.io/hashicorp/archive v2.7.1, err: not found provider download url. If I don't use that option, everything works as expected, slower, but it works. May I ask for a clarification? Thanks!
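
One way to see what the registry itself returns for a given provider version is to query it directly. The Go sketch below is a manual probe, not a reproduction of what the Terragrunt provider cache server does. It assumes the registry serves the provider registry protocol under `/v1/providers/` and that the platform is `linux`/`amd64`; `providerDownloadURL` and `probeStatus` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// providerDownloadURL builds the URL that returns download metadata for one
// provider version and platform.
// Assumption: the registry serves the provider protocol under /v1/providers/.
func providerDownloadURL(base, namespace, name, version, goos, arch string) string {
	return fmt.Sprintf("%s/v1/providers/%s/%s/%s/download/%s/%s",
		base, namespace, name, version, goos, arch)
}

// probeStatus issues one GET request and returns the HTTP status line,
// for example "200 OK" or "429 Too Many Requests".
func probeStatus(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Status, nil
}

func main() {
	// The version is written without the leading "v" shown in the log line.
	url := providerDownloadURL("https://registry.terraform.io",
		"hashicorp", "archive", "2.7.1", "linux", "amd64")

	client := &http.Client{Timeout: 10 * time.Second}
	status, err := probeStatus(client, url)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s -> %s\n", url, status)
}
```

A 429 here would point to throttling, while a 404 or another unexpected status for a version that is known to exist might point to a registry-side or CDN problem. Either way, the status line carries the detail that the "not found provider download url" message hides.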

Giaco9NN avatar May 20 '25 04:05 Giaco9NN

Any update on this issue? It doesn't happen consistently: I'm using terragrunt run --all apply --provider-cache, and sometimes it passes and sometimes it fails.

Terragrunt version: v0.78.4
Terraform version: 1.12.0

Are there any workarounds to cache providers across multiple modules (e.g. registry.terraform.io/hashicorp/aws v5.98.0) other than the Terragrunt Provider Cache Server?

Thanks!

moustafaatef74 avatar May 21 '25 11:05 moustafaatef74

Hey folks,

It looks like this is an issue related to the Terraform provider registry, not anything that the Terragrunt or Terraform binaries are doing. It seems to be a misconfiguration in their CDN caching strategy, and as such, we won't be able to give any updates on this issue.

If someone has evidence to the contrary, we can look into it.

EDIT: For what it's worth, nobody using OpenTofu is reporting this issue, so it might be worth exploring using the OpenTofu registry instead.

yhakbar avatar May 21 '25 12:05 yhakbar

Thank you for this issue! I've been encountering this with the kubernetes provider v2.38.0 which was released 2 days ago.

The speculation about a CDN misconfiguration for the release nudged me toward pinning to v2.37.1, and that resolved my error messages. Thanks @yhakbar for the comment that resolved my issue.

Terragrunt version: v0.76.6
Terraform version: v1.12.2

scottmuc-mo avatar Jul 23 '25 11:07 scottmuc-mo