terraform icon indicating copy to clipboard operation
terraform copied to clipboard

`enabled` parameter to avoid logical statements in `count`

Open romachalm opened this issue 6 years ago • 149 comments

Terraform v0.12.2

Use-cases

We often use count to disable a resource (above all in modules) from a boolean var, with statements like count = var.enabled ? 1 : 0 or count = var.enabled ? 1 : length([some list of resources or datasources])

Here are among others some code snippets from terraform-aws-vpc module :

resource "aws_subnet" "private" {
  count = var.create_vpc && length(var.private_subnets) > 0 ? length(var.private_subnets) : 0

  vpc_id            = local.vpc_id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = element(var.azs, count.index)

  tags = merge(
    {
      "Name" = format(
        "%s-${var.private_subnet_suffix}-%s",
        var.name,
        element(var.azs, count.index),
      )
    },
    var.tags,
    var.private_subnet_tags,
  )
}

resource "aws_network_acl" "public" {
  count = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 ? 1 : 0

  vpc_id     = element(concat(aws_vpc.this.*.id, [""]), 0)
  subnet_ids = aws_subnet.public.*.id

  tags = merge(
    {
      "Name" = format("%s-${var.public_subnet_suffix}", var.name)
    },
    var.tags,
    var.public_acl_tags,
  )
}

resource "aws_network_acl_rule" "public_inbound" {
  count = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 ? length(var.public_inbound_acl_rules) : 0

  network_acl_id = aws_network_acl.public[0].id

  egress      = false
  rule_number = var.public_inbound_acl_rules[count.index]["rule_number"]
  rule_action = var.public_inbound_acl_rules[count.index]["rule_action"]
  from_port   = var.public_inbound_acl_rules[count.index]["from_port"]
  to_port     = var.public_inbound_acl_rules[count.index]["to_port"]
  protocol    = var.public_inbound_acl_rules[count.index]["protocol"]
  cidr_block  = var.public_inbound_acl_rules[count.index]["cidr_block"]
}

I have also witnessed some int variable used as count = var.enabled * length([some list of resources or datasources]).

It makes the triggers more complex to understand, as count actually embeds both a boolean expression to enable the provisionning, together with the number of instances of the resource. It sounds like a twist of the initial expected count usage.

Attempted Solutions

I have thought initially about having count accepting booleans by automatically converting true->1 and false->0. I had a quick discussion with go-cty developer that explained to me the intentional decision not to convert bool to number to avoid confusions, and indeed, I am aligned that the code should stay readable by segregating the scopes. Therefore, I attempt this proposal below.

Proposal

Introducing a new parameter enabled in resources accepting bool values or logical statements to provision the resource (default) or not, independently from count value (which can be 0 by the way). It will increase the readibility of the code by avoiding sometines complex conditional operators ?:

The previous examples would become :

resource "aws_subnet" "private" {
  enabled = var.create_vpc
  count   = length(var.private_subnets)

  vpc_id            = local.vpc_id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = element(var.azs, count.index)

  tags = merge(
    {
      "Name" = format(
        "%s-${var.private_subnet_suffix}-%s",
        var.name,
        element(var.azs, count.index),
      )
    },
    var.tags,
    var.private_subnet_tags,
  )
}

resource "aws_network_acl" "public" {
  enabled = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0

  vpc_id     = element(concat(aws_vpc.this.*.id, [""]), 0)
  subnet_ids = aws_subnet.public.*.id

  tags = merge(
    {
      "Name" = format("%s-${var.public_subnet_suffix}", var.name)
    },
    var.tags,
    var.public_acl_tags,
  )
}

resource "aws_network_acl_rule" "public_inbound" {
  enabled = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 
  count   = length(var.public_inbound_acl_rules)

  network_acl_id = aws_network_acl.public[0].id

  egress      = false
  rule_number = var.public_inbound_acl_rules[count.index]["rule_number"]
  rule_action = var.public_inbound_acl_rules[count.index]["rule_action"]
  from_port   = var.public_inbound_acl_rules[count.index]["from_port"]
  to_port     = var.public_inbound_acl_rules[count.index]["to_port"]
  protocol    = var.public_inbound_acl_rules[count.index]["protocol"]
  cidr_block  = var.public_inbound_acl_rules[count.index]["cidr_block"]
}

Another use case I often see is to to provision some resources only for prod configurations (audit logs buckets, special routes or firewall rules):

enabled = var.env == "production" || var.env == "qa"

enabled shall preced the interpretation of count interpretation.

IMO, it will make the code more readable by explicitly describing what may prevent the provisionning of count resources. It will also allow to clearly distinguish bool from int variables.

References

I haven't found any ticket on that subject. Sorry if I have missed one.

romachalm avatar Jul 02 '19 08:07 romachalm

This is in my opinion such a huge problem with what appears a "simple" solution. Relying on count to disable a module has a lot of bad effects from how you retrieve the output to having warnings about unsupported "count" from submodules that define providers themselves. What is the technical/functional obstacle of having a simple boolean switch on modules and/or resources?

One can neatly separate module calls by using terraform workspace name and much more, avoiding tons of nasty hacks that are filling up the entire web when searching for "terraform re-usable modules" or any similar dry concepts.

PCatinean avatar Sep 25 '20 20:09 PCatinean

@PCatinean The argument was that there are no clear use-cases. I fully support yours and @romachalm and here's mine:

I deploy Landing Zone for 200 Products from my landing-zone module, which actually consists of different modules such as network, key-vault, rbac, des and so on. In some cases, deployment of a feature requires other inputs. e.g. to deploy disk encryption set des, I need first the Security team to deploy the HSM key into a vault that I deploy via key-vault of the landing-zone in the first place. Yes, I can deploy the des module extra outside the landing-zone module, but this means to add 200 instances of des module call with many variables passed into. This is compared to a deploy_des variable flag.

However, when using count, the id like .[0] is added to the resource path in Terraform graph, leading to potential issues in case there sometimes appeared a .[1]. Moreover, turning a "hard-deploy" module into "counted" module means you need to at least terraform state mv all resources so that they contain the .[0] even though you never intent to deploy more than 1 instance of this resource. This may be costly and risky operation, and it may discourage teams from even trying to make their code more flexible.

What do you think of this use-case?

Tbohunek avatar Feb 26 '21 15:02 Tbohunek

Adding additional context @schollii provided in #27936


[As recommended by @apparentlymart in https://github.com/hashicorp/hcl/issues/450, I'm moving this improvement request here.]

The notion of "nullness" or "existence" must be first class in a language, and the language should minimize how much "how" the user needs to express.

Currently in order to make resources and modules optional, we have to resort to using count = 0 or 1. Examples:

resource "provider_type" "name" {
  count = var.name_enabled ? 1 : 0
  ...
} 

The problem is that in current HCL the resource provider_type.name doesn't exist OR it is a list of one item, which is not only counter-intuitive (because a resource that exists is semantically very different from a list of 1 resource, and a resource that does not exist is fundamentally different from an empty list). Code which, prior to making the resource optional, was working, should have need modification to use list syntax instead. For example before making a resource optional,

resource "provider_type_1" "name1" {
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name
1.some_attribute ? "a" : "b"
}

Now make name1 optional, you have to change name2 to use list item 0:

resource "provider_type_1" "name1" {
  count = var.name1_enabled ? 1 : 0
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name1[0].some_attribute ? "a" : "b"
}

The name1[0] is actually anti-declarative: you are forcing the HCL code programmer to tell the HCL interpreter how to access the resource, whereas the fact that the resource is stored internally in a list of 1 item is an implementation detail that is of no concern to the user. Declarative implies to say what you want, and the program figures out how to do it. Here, it should be fine to write provider_type_1.name1.some_attribute ? "a" : "b" and HCL interpreter should know that obviously it is referring to the only resource provider_type_1.name1 that exists.

Similarly if I define an item as a list, it really is a list: a list of no items, 1 item, or N>1 items, all are lists and the code should reflect that. So if I write

resource "provider_type" "name" {
  count = var.number_of_name_items
  some_var = "abc"
} 

clearly the semantics of provider_type.name is that of a list, even if there is 1 item, or no items. And so all code that refers to provider_type.name should have to use list methods, even when there is only 1 or 0 item. This contrasts to the first example where count is being used as a boolean. Both uses are fundamentally different, and this should be reflected in the language.

Since enabling/disabling resources and modules is fundamentally different from creating and manipulating lists of items, it should have its own meta variable. I should be able to write

resource "provider_type_1" "name1" {
  enabled = var.name1_enabled
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name1.some_attribute ? "a" : "b"
}

In addition, the language should make it easy to handle the situation where code refers to a resource that has been disabled. It is probably reasonable for the above expression provider_type_1.name1.some_attribute ? "a" : "b" to fail if name1 disabled because for HCL interpreter, "if name1 exists then here is how you determine the value of provider_type_2.name2.some_other_var, but I (HCL) don't know what to do if provider_type_1.name1 does not exist". The failure is a good flag to raise to the user, so they have to think about what some_other_var should be if name1 does not exist. But in most cases, a reasonable default should be easy to write. One option would be to support a quarternary operator where the third option is used if any objects in the expression are nil. Eg

provider_type_1.name1.some_attribute ? "a" : "b" : "use_this_value_if_nil"

The third value (the 4th argument of the quaternary operator) would be used if the expression on the LHS (the 1st argument of quaternay operator) cannot be evaluated because some objects are nil. Syntax errors and calling an non-existent method on a non-nil object should still fail. An expression on LHS of ? like concat(a.b, c.d.e, f.g) would use the 3rd option if any of a, c, f, or c.d are nil.

The notion of "nullness" or "existence" should be first class in a language, and the language should minimize how much "how" the user needs to express.

jbardin avatar Mar 01 '21 14:03 jbardin

Thank you @jbardin for taking the time to write that up. The sheer length of the post forced me to also take some time to read it and understand it, and you summarized it very well.

How does it look now? Did the chances of this going through increase? It would help many Terraform users build better, nicer, cleaner and more flexible code, and adopt enabling "as they go" with no code modification effort (like adding [0] indices).

On the point of chained resources, I would say it doesn't require explicit handling. If parent resource is enabled=false, all anyhow dependent resources would also automatically enabled=false. E.g. network_security_group nsg is false, then all nsg_rules depending on nsg.id should also false. If it required explicit handling like ? : :, it would only lead to provisioning failures painful to debug.

How does that sound?

Tbohunek avatar Mar 22 '21 20:03 Tbohunek

Hi @Tbohunek, don't thank me, thank @schollii, who originally did that nice writeup as feature request in the hcl repository ;)

I agree it is a compelling feature, but there are many competing compelling features to consider for terraform, and we cannot prioritize them all. I cannot say when this might be considered, but it is not a minor feature and will take considerable planning to handle in a future version of terraform.

jbardin avatar Mar 22 '21 20:03 jbardin

Thanks @schollii ! :) @jbardin if count could co-exist with this enabled, the change should be transparent to current users. If count would change then obviously the sooner the better. More users stuck with count makes such change harder up to a point where it won't matter anymore.

Tbohunek avatar Mar 27 '21 08:03 Tbohunek

@jbardin This looks like a minor and backward-compatible effort though. While waiting for the big features to be prioritized, we keep struggling daily and produce hard-to-maintain unreadable code!

nikolay avatar Apr 10 '21 22:04 nikolay

Consider Meta-Argument condition or include_if so people stop using count= 0 or 1 or for_each empty or single item And content which allows passing a data structure instead of DSL

When choosing a Meta-Argument consider classes with existing resources.

locals {
   something = {
      prevent_destroy = true
    }
}
dynamic "lifecycle" {
    condition = can(var.ENVIRONMENT) # include_if = can(var.ENVIRONMENT)
    content = something  # as long as the map is what the resource expects in this block. i.e. its passing prevent_destroy
}

damon-atkins avatar Aug 07 '21 05:08 damon-atkins

@damon-atkins both condition and include_if are fine by me, but I'm not sure I agree with content. Can you provide an example?

I think count and enabled / condition / include_if would have to be mutually exclusive. Count means you want an array, possibly of 0 size, and the language should make it simple to handle the case where count is 0, a bit like one() in does.

Whereas enabled / condition is never an array, the resource is either there or null, and the DSL should make it easy to deal with null / disabled resource. Eg in an output that refers to resource that has enabled false, the output should be omitted without error instead of having to jump through hoops as we do now.

schollii avatar Aug 07 '21 13:08 schollii

I just ran into this when trying to create OpenStack resources, optionally using a keypair or creating one depending on a value. Unfortunately it is extremely ugly to make everything consider the keypair to be a list.

tculp avatar Dec 10 '21 18:12 tculp

content as show allow you to pass a data structure representing the HCL

locals {
   something = {
      prevent_destroy = true
      etc....
    }
}
dynamic "lifecycle" {
    include_if= can(var.ENVIRONMENT) # condition = can(var.ENVIRONMENT)
    content = something  # as long as the map is what the resource expects in this block. i.e. its passing prevent_destroy
}

vs

dynamic "lifecycle" {
    include_if = can(var.ENVIRONMENT) # condition = can(var.ENVIRONMENT)
    prevent_destroy = true
    etc.....
}

damon-atkins avatar Dec 13 '21 06:12 damon-atkins

@jbardin what is the next step in the process, for this idea to be approved in one shape or form.

damon-atkins avatar Jul 31 '22 01:07 damon-atkins

Hi all!

I think there are at least four remaining design questions to be solved before something like this could be implemented:

  • The usual reason given to motivate this addition is the desire to be able to disable something without having to change all of the references to it. Currently if I add count to aws_instance.foo then every other reference to it must become aws_instance.foo[count.index] where count.index is 0, or must be a carefully-guarded hard-coded reference to aws_instance.foo[0] to avoid trying to access element zero of an empty list when there are no instances.

    However, just adding enabled = true or enabled = false alone doesn't clearly solve that problem. We need to decide what type of value aws_instance.foo would be both when it's enabled and when it's disabled, and that decision will in turn decide what references to that value would look like.

    One idea someone raised in a similar discussion elsewhere was to say that aws_instance.foo would be null if enabled = false. If we do that without changing any other parts of the language then it suggests that aws_instance.foo[count.index].id would need to become something like aws_instance.foo != null ? aws_instance.foo.id : null, which is significantly more verbose than what it replaced.

    Any design which aims to solve this problem must describe the declaration syntax and what affect it has on references to the resource that is disabled.

  • I expect we will need to also figure out what the equivalent idiom to "chaining with for_each" would be, when we have a set of resources that need to always be enabled and disabled together. The most direct translation of count = length(aws_instance.foo) or for_each = aws_instance.foo (assuming that aws_instance.foo itself uses for_each) would be enabled = aws_instance.foo != null, which again isn't obviously better than what it replaced.

    These chaining patterns emerged because it's pretty rare for single resources to be enabled/disabled in isolation... often we need to reliably correlate instances of one resource with instances of another to ensure correct behavior, and the chaining patterns have been successful in achieving that. I think a successful design for boolean enabling ought to describe its equivalent of these chaining patterns.

  • Since we're now subject to the Terraform v1.0 Compatibility Promises, any design proposal must explain how the new feature interacts with those promises.

    The most recently-discussed design above calls for adding a new meta-argument called enabled to resource, data, and module blocks. That is a compatibility hazard for any existing resource type which has an argument named enabled or any module which has a variable "enabled" declaration. It isn't yet clear to me how we would make the transition to this new feature without potentially breaking existing modules and existing uses of providers.

  • Implementation-wise, we need to figure out how Terraform Core will track a possibly-disabled resource. Today we have the idea of an instance key which serves to differentiate an instance of a resource from the resource itself in any situation where there isn't exactly one instance of the resource.

    enabled leads to the situation where there can be zero or one instances of a resource but without an obvious tracking key to differentiate the instance from the resource. It's not clear to me yet that this is super important, but a design which aims to solve this problem will need to define how Terraform ought to track these conditional instances in both plan and state, what Terraform ought to do if an author switches from count to enabled or vice-versa, and just generally how on earth this should all work.

I'd ask you all to recognize that the count argument has been with us long enough that it's easy to take for granted all of the existing design patterns and supporting language features that make it usable in practice. It's not sufficient to just name the new argument, and instead it's important to understand the equivalents of the various other patterns module authors use in relation to count. We had a similar challenge when introducing for_each, and even despite a significant amount of research and revision of the design we still missed the rough edge that people seem to expect the [*] operator to work with resources that use for_each in the same way as it does for count, just to name one example of a situation where we failed to consider an existing commonly-used pattern.

We are still open to discussing this direction but in order to move the discussion forward we need to work together to answer the remaining design challenges. There's still plenty of design work left to do here before it would be time to move on to implementation, so I'd love to hear ideas for how to address these, and indeed any other questions we might identify along the way in the process of answering the ones I've identified above.

apparentlymart avatar Aug 16 '22 00:08 apparentlymart

@apparentlymart I think enabled = <condition> could work just like count = <condition> ? 1 : 0 and people would have to use one(aws_instance.foo[*]). Basically, enbaled would be syntactic sugar and nothing more and thus fully backward-compatible. I've also seen people writing for_each = <condition> ? [1] : [], which could also be replaced with enabled = <condition>. Lastly, HCL could be extended with the comprehension-like resource ... {} if <condition> instead of making this an attribute, but this could be a bit more complicated for the whole toolchain.

nikolay avatar Aug 16 '22 02:08 nikolay

The best sugar of it all would be to not have to add unpredictable .[0] into every reference of the counted resource, and conditions to outputs (i.e. output would default to null or [] respectively if the resources get deployed or not).

Tbohunek avatar Aug 16 '22 06:08 Tbohunek

@Tbohunek I wish one(<reference>[*]) could be sugared somehow. Talking about outputs - I wish they had for_each, count, and enabled, too!

nikolay avatar Aug 16 '22 07:08 nikolay

count has order issues, for_each does not, hence people have change code to use for_each.

include_if or enable could be the same as commenting out the section of code. e.g. this does not exist do not include it, or I do not want a setting to be set in this account/resource. Thats why I suggest name include_if

Sometime people create two tf files which are the same with parts of one deleted, because a flag like this does not exist.

count=0 and for_each=[] still create an empty item.

damon-atkins avatar Aug 16 '22 07:08 damon-atkins

To actually react on @apparentlymart's post

  • enabled = aws_instance.foo != null

How about enabled = aws_instance.foo.enabled?

For output so far I defaulted all of them into null because it doesn't really matter.

Tbohunek avatar Aug 16 '22 11:08 Tbohunek

The following could be the same, i.e. if include_if is false treat the following as if it is commented out.

dynamic "lifecycle" {
    include_if= can(var.DESTROY) # condition = can(var.DESTROY)    Evaluated to TRUE, included lines
    prevent_destroy = var.DESTROY
}
# dynamic "lifecycle" {
#    include_if= can(var.DESTROY) # condition = can(var.DESTROY)   Evaluated to FALSE,  exclude lines
#    prevent_destroy = var.DESTROY
# }

damon-atkins avatar Aug 16 '22 17:08 damon-atkins

We cannot add any new synthetic attributes to an expression like aws_instance.foo, because that's already defined to be an object of a type defined by the provider's schema. If we wanted to expose metadata about a resource block in addition to the actual data for that object then I expect we'd need to do it under a different namespace such as meta.aws_instance.foo, although I've not deeply considered the implications of doing that.

apparentlymart avatar Aug 16 '22 23:08 apparentlymart

Wouldn't the chosen attribute like .enabled become part of the schema for all objects? The caveat here is rather to make .enabled accessible on an object that doesn't actually exist.

Tbohunek avatar Aug 17 '22 05:08 Tbohunek

This comes back to my comment about defining what exactly aws_instance.foo evaluates to when we have a resource with enabled = true or enabled = false set. It's true that Terraform could add a synthetic additional attribute to the resource type's schema so that .enabled could be true in the case where it's enabled, but as you noted <null object>.enabled would fail with an error, so I don't think this design is quite right yet.

(Note that this is also in the vicinity of the other point I made about distinguishing the resource from the resource instance. For count and for_each we can unambigously say that aws_instance.foo refers to the resource as a whole while aws_instance.foo[0] or aws_instance.foo["example"] refer to instances of that resource; the problems with aws_instance.foo.enabled are a good example of the confusion that arises when there isn't some way to distinguish the resource as a whole from the instances of it.)

I also want to clarify that my intent with the previous comment wasn't to suggest that we can puzzle out all of those details individually over the course of many comments. Instead, I was imagining somebody writing a design proposal document that covers the whole problem space and proposed solutions all together, so that we can see how the different parts interact and make sure it all adds up to something that actually works as part of the broader language.

apparentlymart avatar Aug 17 '22 16:08 apparentlymart

In some languages (eg Python), null is an object ie it has a type and attributes just like any other (it is called None in Python but same thing). Could something like that work? So there would be a built-in resource that represents null and has enabled = false. Every other object would have enabled = true.

Also, let's not forget that the real goal is to not have to litter the code with [0] and code that checks for empty list. Groovy's approach to handling null should be considered : something like .? means null is returned if the object before it is null. In other words if object is null and the attribute referenced on it does not exist then null is returned.

schollii avatar Aug 17 '22 21:08 schollii

@apparentlymart or anyone why do we need to read anything back? Why just have the HCL parser ignoring a set of lines not enough to do the job?

On a side note we already have can() which can test something exists or not.

damon-atkins avatar Aug 18 '22 01:08 damon-atkins

Of course it depends on the task at hand but at minimum it's typical for a module to output something about the object it declared so that the caller can use it:

resource "aws_instance" "example" {
  count = var.enable_web_servers ? 5 : 0

  # ...
}

output "web_server_ip_addresses" {
  value = toset(aws_instance.example[*].private_ip)
}

My comment above was saying that a full proposal for adding any new mechanism for changing how many instances there are of a resource (even if it's just limited to zero or one instances) ought to also address the broader context of how the resulting resource would be used. If the design doesn't address that then it's probable that authors would need to keep using count because that's what the rest of the language and its common idioms are built around.

apparentlymart avatar Aug 18 '22 16:08 apparentlymart

Any reason why output can not also have include_if or something.

resource "aws_instance" "webserver" {
   include_if= can(env.MAX_WEB_SERVERS)
   count = env.MAX_WEB_SERVERS

  # ...
}

output "web_server_ip_addresses" {
  include_if= can(aws_instance.webserver[*].private_ip)
  value = toset(aws_instance.webserver[*].private_ip)
}

If include_if is false do not include the current block { } and children blocks

damon-atkins avatar Aug 19 '22 18:08 damon-atkins

@damon-atkins you can already do what you describe with one(), no?

output "web_server_ip_addresses" {
  value = one(aws_instance.webserver[*].private_ip)
}

will be null if the expression does not evaluate and will be the specified attribute of item[0] if there is a zeroth item.

I've never sat down and tried to see if one() might be a suitable workaround for the problem that this current github issue attempts to address. It is certainly not a solution, but it might be a workaround.

schollii avatar Aug 19 '22 23:08 schollii

@schollii what I am suggesting can be placed anywhere as part of a block

layerone {
   include_if =
   ... 
     layertwo {
        include_if =
        ... 
        layerN {
            include_if =
            ... 
} } }

damon-atkins avatar Aug 20 '22 01:08 damon-atkins

I think there is a much simpler answer, both for everyone here and for your team to implement @apparentlymart.

We need to decide what type of value aws_instance.foo would be both when it's enabled and when it's disabled, and that decision will in turn decide what references to that value would look like.

I don't believe that is necessary for the situation where something should or should not exist. I'm going to suggest a different answer below that will satisfy everything I personally want out of this situation, while also not requiring you to dance around a reference to a non-existent thing. I don't personally care what the meta control or attribute is named, so don't focus on the name of it.

I'll reuse your example. In doing so I'd like to point out that this is more readable, and doesn't try to do if/then logic within a parameter designed to provide a number of instances.

resource "aws_instance" "example" {
  only_if = var.enable_web_servers == true
  count   = 5 

  # ...
}

output "web_server_ip_addresses" {
  only_if = var.enable_web_servers == true
  value   = toset(aws_instance.example[*].private_ip)
}

In this example the only_if parameter (which, again, I don't care about the name of) informs the language parser whether or not to add the resource at all. My suggestion is that we don't attempt to track the existence or non-existence of this resource any more or any differently than we track a resource which has been removed from the file.

I expect we will need to also figure out what the equivalent idiom to "chaining with for_each" would be, when we have a set of resources that need to always be enabled and disabled together. The most direct translation of count = length(aws_instance.foo) or for_each = aws_instance.foo (assuming that aws_instance.foo itself uses for_each) would be enabled = aws_instance.foo != null, which again isn't obviously better than what it replaced.

I believe that it's perfectly valid to expect that the parameters used to control the creation of the resource can also be used to control the downstream resources (as shown above in the output). In short, the implementor is responsible for downstream impacts of a conditional. The language isn't responsible for resolving a count of instances that don't exist.

Example again:

resource "aws_security_group" "example" {
  only_if = var.enable_web_servers == true
  for_each = aws_instance.foo
...
}

These chaining patterns emerged because it's pretty rare for single resources to be enabled/disabled in isolation... often we need to reliably correlate instances of one resource with instances of another to ensure correct behavior, and the chaining patterns have been successful in achieving that. I think a successful design for boolean enabling ought to describe its equivalent of these chaining patterns.

I believe that a meta control which enables or disables a resource can also be used by the implementor to enable or disable the downstream resources. To be honest, we always ended using this parameter anyway because the logic to do nothing if there were no resources became funky multi-line for ... condition ? {} : { another for ... } and very hard to read. The above example would allow us to separate the condition from the loop data. Many tortured for loops will be forever released from their chains.

the problems with aws_instance.foo.enabled are a good example of the confusion that arises when there isn't some way to distinguish the resource as a whole from the instances of it.)

I don't think you need to track the lack of a resource any differently than you track the lack of any other resource. I am not the only one who is proposing this 😁

Yes, it is possible that sort of ghost resource could be added to the language to track non-existent things. I think that would be its own proposal, separate and distinct from what people want here: to avoid overloading count and for_each to do conditional logic.

EtA: yes, this is much the same as what @damon-atkins proposed with include_if but with everything spelled out. Sorry, I started writing this before he had posted that and I missed it.

jorhett avatar Aug 20 '22 19:08 jorhett

I'm not sure why everybody's so inclined for these long and not very meaningful include_if and only_if! Do we have include_if or only_if in the for expressions? No! So, why can't this just be if if you don't like enabled? I've seen tons of modules, already using enabled in parameter along with the logic count = var.enabled ? 1 : 0, so, using enabled would match what people already use and simply replace the above with enabled = var.enabled. But changing HCL to have an if would be the best as it will complement the for expressions, I think. It enhances HCL in a nice way allow it to support conditional resources, which would be a win for HCL itself, too!

nikolay avatar Aug 20 '22 20:08 nikolay