Unknown values should not block successful planning

Open apparentlymart opened this issue 2 years ago • 33 comments

The idea of "unknown values" is a crucial part of how Terraform implements planning as a separate step from applying.

An unknown value is a placeholder for a value that Terraform (most often, a Terraform provider) cannot know until the apply step. Unknown values allow Terraform to still keep track of type information where possible, even if the exact values aren't known, and allow Terraform to be explicit in its proposed plan output about which values it can predict and which values it cannot.
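As a minimal illustration, the following sketch assumes the hashicorp/random provider, which is not part of the discussion here and is used only because its resources have attributes that are decided during apply. The resource and output names are hypothetical.

```hcl
terraform {
  required_providers {
    random = {
      source = "hashicorp/random"
    }
  }
}

# The "id" attribute of random_pet is generated by the provider during apply,
# so it is an unknown value of type string while planning the initial creation.
resource "random_pet" "name" {}

# This expression depends on an unknown value, so its result is also unknown
# during planning, although Terraform still knows that it is a string.
output "greeting" {
  value = "hello-${random_pet.name.id}"
}
```

When planning this configuration for the first time, Terraform renders both the `id` attribute and the output value as `(known after apply)` rather than guessing at them.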

Internally, Terraform performs checks to ensure that the final arguments for a resource instance at the apply step conform to the arguments previously shown in the plan: known values must remain exactly equal, while unknown values must be replaced by known values matching the unknown value's type constraint. Through this mechanism, Terraform aims to promise that the apply phase will use the same settings as were used during planning, or Terraform will return an error explaining that it could not.

(For a longer and deeper overview of what unknown values represent and how Terraform treats them, see my blog post Unknown Values: The Secret to Terraform Plan.)

The design goal for unknown values is that Terraform should always be able to produce some sort of plan, even if parts of it are not yet known, and then it's up to the user to review the plan and decide either to accept the risk that the unknown values might not be what's expected, or to apply changes from a smaller part of the configuration (e.g. using -target) in order to learn more final values and thus produce a plan with fewer unknowns.

However, Terraform currently falls short of that goal in a couple of different situations:

The Terraform language runtime does not allow an unknown value to be assigned to either of the two resource repetition meta-arguments, count and for_each.

In that situation, Terraform cannot even predict how many instances of a resource are being declared, and it isn't clear how exactly Terraform should explain that degenerate situation in a plan, so currently Terraform gives up and returns an error:

│ Error: Invalid for_each argument
│
│ ...
│
│ The "for_each" value depends on resource attributes that cannot
│ be determined until apply, so Terraform cannot predict how many
│ instances will be created. To work around this, use the -target
│ argument to first apply only the resources that the for_each
│ depends on.

│ Error: Invalid count argument
│
│ ...
│
│ The "count" value depends on resource attributes that cannot be
│ determined until apply, so Terraform cannot predict how many
│ instances will be created. To work around this, use the -target
│ argument to first apply only the resources that the count depends
│ on.

If any unknown values appear in a provider block for configuring a provider, Terraform will pass those unknown values to the provider's "Configure" function.

Although Terraform Core handles this in an arguably-reasonable way, we've never defined how exactly a provider ought to react to crucial arguments being unknown, and so existing providers tend to fail or behave strangely in that situation.

For example, some providers (due to quirks of the old Terraform SDK) end up treating an unknown value the same as an unset value, causing the provider to try to connect to somewhere weird like a port on localhost.

Providers built using the modern Provider Framework don't run into that particular malfunction, but it still isn't really clear what a provider ought to do when a crucial argument is unknown and so e.g. the AWS Cloud Control provider -- a flagship use of the new framework -- reacts to unknown provider arguments by returning an error, causing a similar effect as we see for count and for_each above.
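The first of these situations can be reproduced with a configuration along the following lines. This is only a sketch: it assumes the hashicorp/random provider (chosen for illustration, not taken from the discussion above), and all of the resource names are hypothetical.

```hcl
terraform {
  required_providers {
    random = {
      source = "hashicorp/random"
    }
  }
}

# "result" is chosen by the provider during apply, so it is unknown while
# planning the initial creation of this resource.
resource "random_integer" "instance_count" {
  min = 1
  max = 3
}

# count depends on an unknown value, so the first plan is expected to be
# rejected with "Invalid count argument".
resource "random_pet" "by_count" {
  count = random_integer.instance_count.result
}

resource "random_pet" "seed" {}

# The set contains an element that is unknown until apply, so the first plan
# is expected to be rejected with "Invalid for_each argument".
resource "random_pet" "by_key" {
  for_each = toset([random_pet.seed.id])
}
```

Following the advice in the error messages, running `terraform apply -target=random_integer.instance_count -target=random_pet.seed` first should make both values known, after which a normal untargeted plan can proceed.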

Although the underlying causes for the errors in these two cases are different, they both lead to a similar problem: planning is blocked entirely by the resulting error and the user has to manually puzzle out how to either change the configuration to avoid the unknown values appearing in "the wrong places", or alternatively puzzle out what exactly to pass to -target to select a suitable subset of the configuration to cause the problematic values to be known in a subsequent untargeted plan.

Terraform should ideally treat unknown values in these locations in a similar way as it does elsewhere: it should successfully produce a plan which describes what's certain and is explicit about what isn't known yet. The user can then review that plan and decide whether to proceed.

Ideally, in each situation where an unknown value appears, there should be some clear feedback on what unknown value source it was originally derived from, so that in situations where the user doesn't feel comfortable proceeding without further information they can more easily determine how to use -target (or some other similar capability yet to be designed) to deal with only a subset of resources at first and thus create a more complete subsequent plan.

This issue is intended as a statement of a problem to be solved and not as a particular proposed solution to that problem. However, there are some specific questions for us to consider on the path to designing a solution:

Is it acceptable for Terraform to produce a plan which can't even say how many instances of a particular resource will be created?

That's a line we've been loath to cross so far because the difference between a couple of instances and tens of instances can be quite an expensive bill, but the same could be said for other values that Terraform is okay with leaving unknown in the plan output, such as the "desired count" of an EC2 autoscaling group. Maybe it's okay as long as Terraform is explicit about it in the plan output?

A particularly "interesting" case to consider here is if some instances of a resource already exist and then subsequent changes to the configuration cause the count or for_each to become retroactively unknown. In that case, the final result of count or for_each could mean that there should be more instances of the resource (create), fewer instances of the resource (destroy), or no change to the number of instances (no-op). I personally would feel uncomfortable applying a plan that can't say for certain whether it will destroy existing objects.

Conversely, is it acceptable for Terraform to automatically produce a plan which explicitly covers only a subset of the configuration, leaving the user to run terraform apply again to pick up where it left off?

This was the essence of the earlier proposal #4149, which is now closed due to its age and decreasing relevance to modern Terraform. That proposal made the observation that, since we currently suggest folks work around unknown value errors by using -target, Terraform could effectively synthesize its own -target settings to carve out the maximum possible set of actions that can be taken without tripping over the two problematic situations above.

Should providers (probably with some help from the Plugin Framework) be permitted to return an entirely-unknown response to the UpgradeResourceState, ReadResource, ReadDataSource, and PlanResourceChange operations for situations where the provider isn't configured completely enough to even attempt these operations?

These are the four operations that Terraform needs to be able to ask a partially-configured provider to perform. If we allow a provider to signal that it isn't configured enough to even try at those, what should Terraform Core do in order to proceed with that incomplete or stale information?

We most frequently encounter large numbers of unknown values when planning the initial creation of a configuration, when nothing at all exists yet. That is definitely the most common scenario where these problems arise, but a provider can potentially return unknown values even as part of an in-place update if that is the best representation of the remote API's behavior -- for example, perhaps one of the output attributes is derived from an updated argument in a way that the provider cannot predict or simulate.

Do we need to take any extra care to deal with the situation where an unknown value cascades downstream from an updated or replaced resource instance?

For example, if I've used an attribute from a vendor-specific Kubernetes cluster resource to provide an API URL to the hashicorp/kubernetes provider and I then change the configuration of the cluster itself in a way that causes the API URL to change, how should Terraform and the Kubernetes provider react to the cluster URL being unknown even though there are existing objects bound to resources belonging to that provider which we will need to refresh and plan?

What sort of analysis would we need to implement in order to answer questions like "why is this value unknown?" and "what subset of actions could I take in order to make this value be known?"
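The Kubernetes scenario described above might look something like the following sketch. The `examplecloud_kubernetes_cluster` resource type, its arguments, and its `api_url` attribute are hypothetical stand-ins for a vendor-specific cluster resource; only the `host` argument of the hashicorp/kubernetes provider and its `kubernetes_namespace` resource type are real names. The `required_providers` block is omitted because the vendor provider is invented.

```hcl
# Hypothetical vendor-specific cluster resource. Assume that changing
# "version" causes its provider to plan "api_url" as unknown.
resource "examplecloud_kubernetes_cluster" "main" {
  name    = "example"
  version = "1.24"
}

provider "kubernetes" {
  # If the planned change to the cluster makes api_url unknown, this crucial
  # argument reaches the provider's configuration step as an unknown value.
  host = examplecloud_kubernetes_cluster.main.api_url
}

# An existing object bound to this resource still needs to be refreshed and
# planned through the provider configured above.
resource "kubernetes_namespace" "app" {
  metadata {
    name = "app"
  }
}
```

How the provider ought to behave when `host` is unknown is exactly the open question, so this sketch shows only the shape of the configuration and not any expected outcome.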

Apr 26 '22 21:04 apparentlymart
