Terraform "Invalid count argument" error does not say which module instance the error occurred in

Terraform Version

Terraform v1.3.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/archive v2.2.0
+ provider registry.terraform.io/hashicorp/aws v4.47.0
+ provider registry.terraform.io/hashicorp/random v3.4.3
+ provider registry.terraform.io/hashicorp/template v2.2.0

Terraform Configuration Files

resource "aws_iam_role_policy" "permissions_role_policy" {
  count  = length(var.permissions) > 0 && var.instance_profile_arn == null ? 1 : 0
  name   = "permissions"
  role   = aws_iam_role.role[0].name
  policy = data.aws_iam_policy_document.permissions.json

  lifecycle {
    ignore_changes = [name]
  }
}

See steps to reproduce

Debug Output

Nothing that looks relevant

Expected Behavior

The error message should indicate which instance of a module is causing the error, so that it is easier to resolve the issue.

Actual Behavior

I get an error like:

╷
│ Error: Invalid count argument
│ 
│   on ../modules/autoscaling_group/main.tf line 139, in resource "aws_iam_role_policy" "permissions_role_policy":
│  139:   count  = length(var.permissions) > 0 && var.instance_profile_arn == null ? 1 : 0
│ 
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first
│ apply only the resources that the count depends on.
╵

Note that it doesn't say anything about which instance of the autoscaling_group module caused this error, and there are several instances of this module in the root module, so determining which one caused it is difficult.

Steps to Reproduce

Create a module where a resource is conditionally created based on a variable, such as in the example configuration above.
Create multiple instances of the module in a root module.
Have one of the module instances pass in a variable whose value is interpolated from a resource that doesn't exist yet, so that the value used in the count expression is unknown.
Run terraform plan.
See the error.
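
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the configuration that produced the error: the module path modules/conditional, the module names "known" and "unknown", and the use of random_pet as a stand-in for a resource with an attribute that is only known after apply are all assumptions made for the example.

```hcl
# --- modules/conditional/main.tf (hypothetical child module) ---
variable "instance_profile_arn" {
  type    = string
  default = null
}

resource "random_pet" "conditional" {
  # This comparison is unknown during planning whenever
  # var.instance_profile_arn is itself unknown.
  count = var.instance_profile_arn == null ? 1 : 0
}

# --- main.tf (root module) ---
# Stand-in for any resource whose attributes are unknown until apply.
resource "random_pet" "seed" {}

# This instance plans fine: the variable keeps its known default of null.
module "known" {
  source = "./modules/conditional"
}

# This instance should trigger "Invalid count argument" on a fresh plan,
# because random_pet.seed.id is not known yet.
module "unknown" {
  source               = "./modules/conditional"
  instance_profile_arn = random_pet.seed.id
}
```

With a layout like this, the error points at the count line inside the shared module file, with nothing to distinguish module.known from module.unknown.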

Additional Context

This wouldn't be as much of an issue if Terraform could be smarter about converting unknown values to known values where possible.

For example, in this case even though var.permissions contains values that use interpolated values from unknown values, the length of var.permissions could be known at plan time, because the number of elements passed in is static, even if some of the individual elements don't have known values yet.

I recognize that is a more difficult problem to solve, but better diagnostics would make troubleshooting this kind of issue easier.

References

https://github.com/hashicorp/terraform/issues/30937

Dec 21 '22 18:12 tmccombs

Hi @tmccombs,

I think it should be possible in principle to annotate this diagnostic with an extra address, so that it would appear with the extra "with" clause that we use to describe the dynamic address of what's failing:

╷
│ Error: Invalid count argument
│ 
│   with module.something[0].aws_iam_role_policy.permissions_role_policy,
│   on ../modules/autoscaling_group/main.tf line 139, in resource "aws_iam_role_policy" "permissions_role_policy":
│  139:   count  = length(var.permissions) > 0 && var.instance_profile_arn == null ? 1 : 0
│ 
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first
│ apply only the resources that the count depends on.
╵

So far we've only been doing that for resource instance errors returned by providers, because they happened to be returned through a codepath that has to transform and annotate the diagnostics anyway. I don't think we currently have a mechanism for annotating non-provider-generated diagnostics in the same way, but I agree that it would be helpful to find a way to make that work. One minor quirk here is that since we've not yet calculated the set of instances for this resource, this address will need to be a resource address rather than a resource instance address as it would be in other contexts, but I don't think that should be a major problem.

With regard to why this error appeared in the first place, my first guess would've been that var.instance_profile_arn is the one that's unknown here, rather than var.permissions. The hashicorp/aws provider typically lets the server determine the ARN for an object, rather than generating it client-side in the provider. That has the unfortunate consequence that ARNs tend to be unknown during planning even for objects whose ARN syntax can be mechanically derived from information already known, as I believe is the case for EC2 instance profile ARNs.
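
If the unknown ARN is indeed the cause, one possible way to sidestep the error is to drive the count from a value that is known during planning instead of from the ARN itself. The sketch below assumes a new boolean variable, create_role, which is hypothetical and not part of the module shown in this issue. It also assumes that var.permissions has a known number of elements at plan time.

```hcl
# Hypothetical flag, not in the original module: the caller states
# explicitly whether the module should manage its own role, so the
# decision no longer depends on an ARN that is unknown until apply.
variable "create_role" {
  type    = bool
  default = true
}

resource "aws_iam_role_policy" "permissions_role_policy" {
  # Both operands are known during planning, provided the length of
  # var.permissions is known (assumption stated above).
  count  = length(var.permissions) > 0 && var.create_role ? 1 : 0
  name   = "permissions"
  role   = aws_iam_role.role[0].name
  policy = data.aws_iam_policy_document.permissions.json

  lifecycle {
    ignore_changes = [name]
  }
}
```

A caller that supplies its own instance profile would then set create_role = false alongside instance_profile_arn. The trade-off is that the two inputs can disagree, so the module may want a validation or precondition to catch that.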

However, that does raise a second issue with this diagnostic: it doesn't include the usual annotations about what types/values the symbols in the expression have. Assuming my guess above is right and it was the ARN that was unknown here, I'd expect the full diagnostic to also mention that:

╷
│ Error: Invalid count argument
│ 
│   with module.something[0].aws_iam_role_policy.permissions_role_policy,
│   on ../modules/autoscaling_group/main.tf line 139, in resource "aws_iam_role_policy" "permissions_role_policy":
│  139:   count  = length(var.permissions) > 0 && var.instance_profile_arn == null ? 1 : 0
│    ├───────────
│    │ var.instance_profile_arn is a string, known only after apply
│    │ var.permissions is a list of string with 2 elements
│ 
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first
│ apply only the resources that the count depends on.
╵

I think we should use this issue also to represent including the expression context information in the diagnostic so that the UI can render the value hints to be even more specific about which value was the problem, since I expect that the solution to both will be in the same part of the codebase.

Dec 21 '22 18:12 apparentlymart

Similar issue here. What led Terraform to be unable to resolve var.* references?! These are values passed in by the caller, so they shouldn't be awaiting resolution?! Or am I missing something...

upd:

This count fails when the configuration is used as a module.

resource "aws_iam_policy" "default" {
  count = module.this.enabled && var.policy_json != "" ? 1 : 0

  name_prefix = module.this.id
  policy      = var.policy_json

  tags = module.this.tags
}

upd2: Tracked this down. The policy_json variable was passed to the module with a value generated by templatefile(), i.e.:

module "some_module" {
  source = "git::[email protected]:some/module.git?ref=tags/v1.0.1"

  ...
  policy_json = templatefile("${path.module}/resources/policy.tpl.json", {
    key1 = value1,
    key2 = value2,
    key3 = aws_iam_role.default.arn
  })
  ...
  depends_on = [aws_iam_role.default]
}

For some reason Terraform refuses to resolve aws_iam_role.default.arn before evaluating module some_module.

p.s. Terraform 1.3.6

Mar 08 '23 14:03 imunhatep