
Ability to have OpenTofu automatically import an existing object if it exists, or create it otherwise

Open ImIOImI opened this issue 1 year ago • 23 comments

OpenTofu Version

❯ tofu version
OpenTofu v1.7.0
on linux_amd64

Use Cases

There are numerous reasons why someone might want a data source that returns null instead of an error when the object it queries doesn't exist.

  • while migrating from legacy ClickOps patterns to IaC, you might not be certain of the current state and want to normalize it across environments. For example, a resource might exist in dev or prod but be absent in stage, and you want to use the same code for all three environments.
  • uncoupling dependent environments:
data "aws_eks_cluster" "doesnt_exist" {
 name = "doesnt-exist"
}

resource "helm_release" "test" {
 count = data.aws_eks_cluster.doesnt_exist.id == null ? 0 : 1
 ...
}

etc.

Attempted Solutions

data "aws_vpc" "doesnt_exist" {
  tags = {
    Name = "exists"
  }
}

resource "aws_vpc" "exists" {
  count = data.aws_vpc.doesnt_exist.id == null ? 0 : 1
  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}

Proposal

We should add a nullable attribute to data sources that allows them to return null instead of failing.

data "aws_vpc" "doesnt_exist" {
  tags = {
    Name = "exists"
  }
  nullable = true
}

resource "aws_vpc" "exists" {
  count = data.aws_vpc.doesnt_exist.id == null ? 0 : 1
  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}

References

No response

ImIOImI avatar Jul 01 '24 21:07 ImIOImI

Seems related to https://github.com/opentofu/opentofu/issues/1289, specifically the "ClickOps to IaC" migration use-case

RLRabinowitz avatar Jul 02 '24 13:07 RLRabinowitz

Also likely related to https://github.com/opentofu/opentofu/issues/1306

cam72cam avatar Jul 02 '24 14:07 cam72cam

This is a great suggestion @ImIOImI, I'd love to see this solved.

In the meantime, a possible workaround to postpone the evaluation of data sources to the apply phase is to use deployment triggers as shown in the following example:

data "aws_subnet" "subnet" {
  filter {
    name = "tag:Name"
    values = [
      "us-east-1a",
    ]
  }

  depends_on = [
    null_resource.initial_deployment_trigger,
  ]
}

resource "null_resource" "initial_deployment_trigger" {}

This will allow you to run e.g. terraform plan even if data sources are not available yet.

soerenmartius avatar Jul 04 '24 18:07 soerenmartius

You can actually do this already, by using an attribute reference for the name from the tags attribute of the vpc resource, instead of using the string value of the name. No null_resource or depends_on needed.

You also need the same count condition on the data source that is used on the resource.
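
One possible reading of this suggestion, as a minimal sketch rather than the commenter's own code: the data source takes its lookup value from the managed resource's attributes, so it is only read once that resource is known, and both blocks share one count condition. The variable create_vpc is a hypothetical switch introduced here for illustration; it does not appear anywhere in the thread.

```hcl
# Hypothetical toggle, not part of the original thread.
variable "create_vpc" {
  type    = bool
  default = true
}

resource "aws_vpc" "this" {
  count = var.create_vpc ? 1 : 0

  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}

data "aws_vpc" "this" {
  # Same count condition as the resource, so the lookup is skipped
  # whenever the VPC is not managed by this configuration.
  count = var.create_vpc ? 1 : 0

  tags = {
    # Attribute reference instead of the literal string "exists";
    # this makes the data source depend on the resource.
    Name = aws_vpc.this[0].tags["Name"]
  }
}
```

Note that the count condition here comes from an input variable, not from the data source itself, so this only covers cases where the caller already knows whether the object should exist.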

lorengordon avatar Jul 04 '24 18:07 lorengordon

> You can actually do this already, by using an attribute reference for the name from the tags attribute of the vpc resource, instead of using the string value of the name. No null_resource or depends_on needed.
>
> You also need the same count condition on the data source that is used on the resource.

Perhaps the example isn't great, but this wasn't meant to be scoped to just vpcs. I could have just as easily picked something like:

data "azurerm_kubernetes_cluster" "this" {
  name                = "my_cluster"
  resource_group_name = "my_resource"
}

resource "helm_release" "do_the_thing" {
  count = length(data.azurerm_kubernetes_cluster.this.name) > 0 ? 1 : 0
  chart = "a_special_chart"
  name  = "myAwesomeChart"
}

ImIOImI avatar Jul 04 '24 19:07 ImIOImI

When I'm back at a computer next week, I'll write up some example code so you can see what I mean. Nothing I'm suggesting is specific to vpcs.

lorengordon avatar Jul 04 '24 19:07 lorengordon

I would like to add that this suggestion comes with some challenges that I'm not sure we'd be able to tackle.

To implement this, we'd probably have to either:

  • Expand on the provider implementation API, allowing data sources to return nulls, and having the Tofu CLI handle this case correctly. This would require provider authors to change how the provider is set up in a way that's probably only going to be compatible with OpenTofu, which I believe is not very likely at the moment, and is something that's on a provider-by-provider basis
  • Have OpenTofu interpret specific errors from the provider as "Not Found", and return null in these cases. This might also be on a provider-by-provider basis, as each provider author is free to implement errors and exceptions as they see fit

RLRabinowitz avatar Jul 07 '24 09:07 RLRabinowitz

What about adding something to the lifecycle block that can catch any data source error and allow it to return null instead of an error? That way it could be applied to any data source without relying on developers to integrate it into their provider. As @ImIOImI said, this would be really useful to determine whether to create something if it doesn't exist.

To use his example:

data "aws_eks_cluster" "doesnt_exist" {
  name = "doesnt-exist"

  lifecycle {
    null_on_error = true
  }
}

Or something along those lines.

tgmatt avatar Aug 13 '24 01:08 tgmatt

@tgmatt the main problem here is that you'd be returning null in all error cases, such as if there's a network issue. This would make the code very unreliable.

ghost avatar Aug 13 '24 06:08 ghost

Admittedly I don't know the code but presumably there are different exceptions, or would they not be exposed to tofu? I suppose even then how would you know it's a legitimate error.

I don't see how this feature could ever be implemented without causing potential problems down the line. Perhaps a warning in the documentation could alleviate the risks, but I see this as a "killer" feature that would enable more elaborate automation and decision making in one's IaC.

For context, our example is such that we'd like to use the Netbox provider to call Netbox to see if a prefix is assigned to a particular VPC, then if not, reserve one.

tgmatt avatar Aug 13 '24 19:08 tgmatt

Beside the point of breaking backwards compatibility (where we still uphold the compatibility promise), I believe the right way to implement this would be to create typed errors in the provider protocol. However, with that we are tied to HashiCorp as we don't want to create an ecosystem split.

ghost avatar Aug 13 '24 19:08 ghost

> Beside the point of breaking backwards compatibility (where we still uphold the compatibility promise), I believe the right way to implement this would be to create typed errors in the provider protocol. However, with that we are tied to HashiCorp as we don't want to create an ecosystem split.

I appreciate where you're coming from, but has that horse not already bolted with such things as state file encryption, though? Does that not already make Tofu state fundamentally incompatible with Terraform?

tgmatt avatar Sep 26 '24 11:09 tgmatt

@tgmatt the provider protocol is a whole different part of the ecosystem and one that we can not change lightly. All of the other changes you mentioned are opt-in and don't impact compatibility with the core set of features that both OpenTofu and Terraform share.

cam72cam avatar Sep 26 '24 12:09 cam72cam

Thanks for the clarification @cam72cam. It's a shame because I feel like being so tightly tied to HashiCorp for some of these things limits potential for change that OpenTofu could (and imho should) bring. Perhaps I could bring this up for discussion in one of the weekly community meetings.

tgmatt avatar Sep 26 '24 12:09 tgmatt

As things currently stand, it's up to the designer of a provider to decide which situations return errors and which situations successfully return some sort of "absent" response.

For example, the hashicorp/http provider today has its http data source return a successful result (from OpenTofu's perspective) even if the HTTP server returns an unsuccessful status code, but I believe the provider does fail the request if the client cannot connect to the server at all.

The hashicorp/aws provider also offers an aws_vpcs data source which can return an empty set if there are no VPCs that match the query.

It is technically already possible today for a provider to make this behavior configurable by an argument. Having the provider make the decision allows for a finer-grained approach where the provider could potentially allow configuring different handling for different kinds of error, whereas today OpenTofu core can only react to whether the response includes error diagnostics or not.
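
As an illustration of a provider-level "absent" response, here is a minimal sketch using the http data source mentioned above. The URL is a hypothetical placeholder, and treating only status 200 as "present" is an assumption made for the example; status_code is the attribute the hashicorp/http provider exports for the response status.

```hcl
# Hypothetical endpoint; substitute a URL that reports whether the object exists.
data "http" "probe" {
  url = "https://example.com/api/clusters/my-cluster"
}

locals {
  # An unsuccessful status such as 404 is still a successful read from
  # OpenTofu's perspective, so the configuration can branch on it
  # without the plan failing.
  cluster_present = data.http.probe.status_code == 200
}

output "cluster_present" {
  value = local.cluster_present
}
```

The decision about what counts as "not found" stays with the provider and the configuration author here, which is the finer-grained handling described above.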


I think, though, that this request's framing as a feature for data sources is hiding a more sinister design problem that would need to be addressed:

OpenTofu is a desired state system where effectively the configuration is a program for deciding a desired state and then the execution engine compares the desired state with the current actual state to decide which actions to take.

This execution model doesn't really work properly if a particular object's desired state is defined in terms of its own current state, as the example in the proposal implies. We can explore this today using a working example using the plural aws_vpcs data source:

data "aws_vpcs" "maybe" {
  tags = {
    Name = "exists"
  }
}

resource "aws_vpc" "exists" {
  count = length(data.aws_vpcs.maybe.ids) == 0 ? 1 : 0

  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}

Today's OpenTofu would understand the above, restated into plain English, as "the VPC exists if it doesn't exist", which is an inherent contradiction. If you were to try this today you'd find that on the first plan/apply round OpenTofu would propose to create aws_vpc.exists as expected, but then on the next plan/apply round length(data.aws_vpcs.maybe.ids) == 0 would be false (there is now a VPC matching the criteria), so that resource now effectively has count = 0 and OpenTofu would propose to destroy it. This flip-flopping between zero and one instances would continue indefinitely because the configuration contradicts itself and so cannot possibly converge.

With all of that said, I wonder if this issue would be better framed as its root use-case -- which I read as something like "Ability to have OpenTofu automatically import an existing object if it exists, or create it otherwise" -- and not as a new feature for data resources.

Unless I'm missing something here, I don't think implementing the described feature exactly as described would actually meet the given use-cases. I think the use-cases are interesting and worth discussing more, but I don't think this particular solution to those use-cases is viable as currently stated.
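
For readers wondering what the reframed "import if it exists, or create it otherwise" use-case might look like in configuration, the following is an unverified sketch, not something proposed in this thread. It assumes that an import block with for_each (available since OpenTofu v1.7) accepts a collection taken from a data source whose result is known at plan time, and that at most one VPC matches the tag.

```hcl
data "aws_vpcs" "maybe" {
  tags = {
    Name = "exists"
  }
}

# Assumption: for_each here may be driven by a data source result.
# With zero matches no import is planned and the VPC is created;
# with one match the existing VPC is imported instead.
# More than one match would map several IDs to one address and fail.
import {
  for_each = toset(data.aws_vpcs.maybe.ids)
  to       = aws_vpc.exists
  id       = each.value
}

# No count on the resource: its desired state no longer depends on
# whether it currently exists, which avoids the flip-flop.
resource "aws_vpc" "exists" {
  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}
```

The design point is that the resource is declared unconditionally and only the import is conditional, so the desired state stays the same on every plan/apply round.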

apparentlymart avatar Oct 02 '24 02:10 apparentlymart

> With all of that said, I wonder if this issue would be better framed as its root use-case -- which I read as something like "Ability to have OpenTofu automatically import an existing object if it exists, or create it otherwise" -- and not as a new feature for data resources.

I think this is absolutely a fair solution that fulfills my desired intent. As you correctly point out, this is solving a symptom of a larger problem, which is that I might not know my desired state ahead of time (for whatever reason) when writing the code. Even if this issue were implemented as I described, I should still import the existing resources later. Therefore, cutting out that step would be more helpful than just allowing nullable resources.

ImIOImI avatar Oct 02 '24 13:10 ImIOImI

I found this thread after finding a 6 year old locked issue in terraform where multiple people are asking for the same feature. Apparently this is something people have wanted for some time.

The use case that I have is when using transit gateway peering across multiple accounts.

You cannot filter the data based on the name before it is accepted, so it doesn't have a name while it's pending, and it's not pending once it's available. Considering that there are multiple resources associated with the same transit gateway it's not enough to filter based on ID or availability state.

Being able to query two different results, and pick the one that has the proper return result would be ideal. I tried the following, but it obviously doesn't work.

Filtering based on both pending and available doesn't work as it returns too many results, and filtering based on name doesn't work while it's pending.

data "aws_ec2_transit_gateway_peering_attachment" "pending_peering_attachment" {
    provider = aws.peer

    filter {
        name   = "state"
        values = ["pendingAcceptance"]
    }

    filter {
        name  = "transit-gateway-id"
        values = [aws_ec2_transit_gateway_peering_attachment.local_attachment.peer_transit_gateway_id]
    }
}

data "aws_ec2_transit_gateway_peering_attachment" "named_peering_attachment" {
    provider = aws.peer

    filter {
        name   = "state"
        values = ["available"]
    }

    filter {
        name  = "transit-gateway-id"
        values = [aws_ec2_transit_gateway_peering_attachment.local_attachment.peer_transit_gateway_id]
    }

    filter {
        name  = "tag:Name"
        values = ["My-Peer-Gateway-Attachment"]
    }
}

locals {
    peering_attachment_id = try(data.aws_ec2_transit_gateway_peering_attachment.pending_peering_attachment.id, data.aws_ec2_transit_gateway_peering_attachment.named_peering_attachment.id)
}

Either a try, or a ternary along the lines of length(data) > 0 ? ..., would be acceptable.

If we could have ignore_empty = true even just to disable the error, allowing the result to continue this would be sufficient.

For now, I guess I'm stuck commenting out my code every time the state changes. I would be more than happy to write any proposed changes in if someone just points me to where the modifications can be made. This is a huge headache, and has me leaning towards not managing my network with terraform/tofu at all, and just doing it by hand.

I also came across escape blocks, which looks like it could be a potential solution to my use case - although it isn't a usable feature. https://github.com/opentofu/opentofu/blob/main/internal/configs/testdata/escaping-blocks/data/data-escaping-block.tf

alephpt avatar Oct 16 '24 17:10 alephpt

I don't know if this will help the OP's case, but maybe this is a viable solution for others.

I came across the _ escape block test when looking through the data block implementation code. It appears to be used to apply additional filtering when multiple results are found, which I assume is an error much like finding no results at all. According to my language server the feature is not implemented, but the code runs and executes just fine.

From what I can tell, the code below attempts to query based on "pendingAcceptance", and if that fails due to a state change it hits the escape block and attempts to filter based on "available" results with the given name. This seems to work for my case, where my data is either [pending, ""] or [available, "named"] but never both.

I don't know if there is any way to use this to toggle some value or return any default values, but this definitely solved my issue where data parameters change based on the state of the infrastructure.

data "aws_ec2_transit_gateway_peering_attachment" "peering_attachment" {
    provider = aws.peer

    filter {
        name   = "state"
        values = ["pendingAcceptance"]
    }

    filter {
        name  = "transit-gateway-id"
        values = [aws_ec2_transit_gateway_peering_attachment.local_attachment.peer_transit_gateway_id]
    }

    _ {
        filter {
            name   = "state"
            values = ["available"]
        }

        filter {
            name  = "tag:Name"
            values = ["My-Peer-Gateway-Attachment"]
        }
    }
}

alephpt avatar Oct 16 '24 18:10 alephpt

We briefly discussed this in the OpenTofu Community Slack, and I just want to leave a summary of what I said here because this will be easier to find in future.

The "escaping block" example in the previous comment ought to be exactly equivalent to the following:

data "aws_ec2_transit_gateway_peering_attachment" "peering_attachment" {
  provider = aws.peer

  filter {
    name   = "state"
    values = ["pendingAcceptance"]
  }

  filter {
    name   = "transit-gateway-id"
    values = [aws_ec2_transit_gateway_peering_attachment.local_attachment.peer_transit_gateway_id]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  filter {
    name  = "tag:Name"
    values = ["My-Peer-Gateway-Attachment"]
  }
}

That results in a configuration where there are two filter blocks with the same name, which is valid as far as OpenTofu is concerned and up to the provider to make sense of. In this case the underlying API has a similar shape to the provider's schema, albeit expressed in XML rather than in HCL, and so I would guess (but have not confirmed) that the provider is sending the API four Filter elements that correspond directly to these four filter blocks, and so in the end it's the EC2 API deciding what it means when two filters have the same Name.

Based on my knowledge of how filter blocks work, I can't imagine any way that using a filter block would cause OpenTofu to make two API requests and choose just the one that succeeded, or any other similar thing that might resemble what this issue is asking for, so I have to assume that something else is going on here, but it's not clear to me what that is and I don't currently have an AWS account to test with.

If the escaping block example is behaving differently than the flattened version I showed above, it would be helpful if someone could try reading both forms of this data source configuration with the TF_LOG_PROVIDER=trace environment variable set, hopefully causing the AWS provider to log exactly what query it sent to the EC2 API in each case, so that we can compare the escaping block version with the flattened version to see what the difference is, if any.

apparentlymart avatar Oct 18 '24 22:10 apparentlymart