
Resource Timeouts Support

Open bflad opened this issue 3 years ago • 4 comments

Module version

v0.9.0

Use-cases

Resource timeouts are a way for practitioners to override the acceptable amount of time for a data source read operation or resource operation such as create, delete, read, or update. They are considered a meta-schema detail of data sources and resources, or put another way, a feature of all data sources and resources similar to the lifecycle configuration block. While not officially part of the Terraform configuration language, such as a protected keyword, this feature has existed long enough through what eventually became terraform-plugin-sdk that many providers, and therefore practitioners, have come to rely on it. It is documented currently in the configuration language documentation.

To practitioners, resource timeouts are implemented as a configuration block in the root schema of the data source or resource, e.g.

data "example_thing" "example" {
  # ... other attributes/blocks

  timeouts {
    read = "5m"
  }
}

resource "example_thing" "example" {
  # ... other attributes/blocks

  timeouts {
    create = "60m"
    delete = "60m"
    read   = "5m"
    update = "60m"
  }
}

Beyond this conventional configuration detail for practitioners, all resource timeout handling is performed by terraform-plugin-sdk. Core does not perform any validation beyond typical configuration syntax checks.

terraform-plugin-sdk Resource Timeouts

The terraform-plugin-sdk implementation of resource timeouts is based on Go time.Duration values and is automatically applied to all data source and resource implementations. The SDK strictly enforces these timeouts, by forcibly canceling in-progress provider logic, only if the helper/schema.Resource type CreateContext, DeleteContext, ReadContext, and UpdateContext fields are used. Otherwise, it is up to provider developers to read and/or use these timeout values as they wish, such as with the resource.RetryContext() function or the resource.StateChangeConf type (and its WaitForStateContext() method).

Data source and resource create, delete, read, and update logic has access to the timeout data via the (helper/schema.ResourceData).Timeout(string) method. The SDK returns a default value of 20*time.Minute for each operation unless the provider developer has overridden it via the helper/schema.Resource type Timeouts field. Practitioners can override either of these values by specifying the timeouts configuration block in their resource configuration.

The timeouts configuration block supports create, delete, read, and update string attributes, where the values are parsed via the time.ParseDuration() function. This schema is not declared by provider developers, nor is it implemented as a typical schema definition (e.g. helper/schema.Schema). Terraform core does not currently perform configuration validation against the schema, so the SDK can misuse this detail. The configuration handling implementation relies on some older SDK logic for manually parsing the configuration data that comes across the protocol RPC before it reaches provider logic. It has some unintended behaviors such as allowing multiple configuration blocks. Eventually (skipping over a large chunk of SDK internals like the InstanceDiff/InstanceState types, Meta fields, etc.) the configuration data makes its way into the private state data during planning, which ResourceData can later use to fetch the values. The private state data is a JSON map, with a special e2bfb730-ecaa-11e6-8f88-34363bc7c4c0 key.
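To make the data flow above more concrete, here is a small sketch that parses timeouts block values with time.ParseDuration() and places them in a JSON map under the special key. The function name encodeTimeouts is hypothetical, and storing each duration as nanoseconds is an assumption made for illustration; the SDK's actual encoding should be checked against its source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// timeoutKey is the special private state key mentioned in the text.
const timeoutKey = "e2bfb730-ecaa-11e6-8f88-34363bc7c4c0"

// encodeTimeouts is a hypothetical helper. It parses the string attributes of
// a timeouts block and returns private state JSON holding the results.
// Assumption: durations are stored as nanoseconds under timeoutKey.
func encodeTimeouts(block map[string]string) ([]byte, error) {
	parsed := make(map[string]int64, len(block))
	for op, raw := range block {
		switch op {
		case "create", "delete", "read", "update":
			// Supported operation names.
		default:
			return nil, fmt.Errorf("unsupported timeouts attribute %q", op)
		}
		d, err := time.ParseDuration(raw)
		if err != nil {
			return nil, fmt.Errorf("parsing %s timeout: %w", op, err)
		}
		parsed[op] = d.Nanoseconds()
	}
	return json.Marshal(map[string]map[string]int64{timeoutKey: parsed})
}

func main() {
	private, err := encodeTimeouts(map[string]string{"create": "60m", "read": "5m"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(private))
}
```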

The resource timeouts handling in terraform-plugin-sdk is the only private state data handling that occurs. Provider developers have no other access or ability to manipulate private state data.

Attempted Solutions

Store resource timeouts via normal resource state storage, which is always shown in plans.

Proposal

Build resource timeouts functionality on top of #399. This code for the functionality should likely reside outside the terraform-plugin-framework repository, so it can be versioned and maintained separately. Core features include:

  • The ability to read configuration block details into the private state
  • The ability to fetch resource timeouts from the private state, or return a default of 20 minutes

It may be good to also provide and recommend the timeouts schema block, so there is no disconnect between the schema being used and the actual schema definition.

When implementing the functionality, it should be investigated whether to document the terraform-plugin-sdk e2bfb730-ecaa-11e6-8f88-34363bc7c4c0 key in case provider developers should reuse it. Providers would only need to care about the existing key should a plan be created on a terraform-plugin-sdk based provider, but applied on a terraform-plugin-framework based provider, which may not even be possible to do.

References

  • #399
  • #62

bflad avatar Jul 06 '22 14:07 bflad

I'm not sure if it's a material detail for the purpose of what you were describing here, but I just wanted to note that the special handling of timeouts today does live entirely inside SDKv2, and Terraform Core has no special awareness of it.

The trick is that SDKv2 inserts some extra stuff into the schema written by the provider developer as part of translating it to the form that the wire protocol expects: https://github.com/hashicorp/terraform-plugin-sdk/blob/62e2d2d909610f2178634a2b84f999726ded1166/helper/schema/core_schema.go#L312-L361

So these do end up being a real nested block with attributes in the schema by the time Terraform Core sees it, and Terraform Core doesn't need to do anything special.

The other half of the special handling lives inside the gnarly shim code that translates from protocol 5 assumptions to the SDK's assumptions that are built around how Terraform v0.11 behaved. I don't remember all of the details about that stuff but here's a pertinent part I found while quickly skimming the code: https://github.com/hashicorp/terraform-plugin-sdk/blob/62e2d2d909610f2178634a2b84f999726ded1166/helper/schema/grpc_provider.go#L830-L847

All of this weird machinery is there to preserve this idea that timeouts are special in SDKv2 even though Terraform Core has never really known anything about them. It's true that in Terraform v0.11 Terraform Core would just pass all this stuff through without checking it, because in those older Terraform versions the core runtime just treated the resource config and state largely as an opaque bag of key/value pairs (which is why the language's ability to work with that data in expressions was so limited), but when we introduced the real type system in Terraform v0.12 we needed to give Terraform Core just enough information about timeouts that it would pass them through in a similar enough way to what helper/schema expected, and then let SDKv2 do all of the weirdo stuff that needs to happen.

It isn't super clear to me why exactly SDKv2 needed to treat timeouts in this special way. I suspected it was because, at the time we designed it, there was no clear architectural boundary between Terraform Core and what we now call the SDK: it was just different parts of the same codebase tightly coupled together, and so it was perhaps easy to treat it in a special way that made it a little more convenient for provider developers to use it consistently.

I think one question I'd have is whether this really needs to be treated in this special way with hidden private data, or whether it would be sufficient to just track these values as normal attributes. They always need to surface as real attributes in order for Terraform Core to accept them, so I would assume that if you kept the data in "private" then that would be in addition to the values stored in the normal state, rather than instead of. Is there something else hidden away in private that isn't just a copy of something provided in the attribute values inside the timeouts block? :thinking:

(If these arguments are not already showing up in plans then I expect that can only be because SDKv2 is exempt from the plan and apply consistency checks in Terraform Core, causing those violations to appear as warnings in the hidden logs rather than visible errors. Since the new framework is not exempt from those checks, I would expect any attempt to hide the timeouts block from the plan diff will lead to Terraform Core rejecting the plan as invalid, because it doesn't match what the user configured.)

apparentlymart avatar Jul 07 '22 22:07 apparentlymart

> I think one question I'd have is whether this really needs to be treated in this special way with hidden private data, or whether it would be sufficient to just track these values as normal attributes.

It may not need to be handled as private state data, but there are some other potential considerations:

  • Unlike its predecessor, this framework offers functionality to retrieve and modify entire configuration (retrieve only), plan, and state schema-based data. Needing to include the timeouts data with "regular" plan and state data when using that functionality may introduce unwarranted complexity for provider developers.
  • Provider developers may not wish to have timeout differences show in plans, which if logistically possible, would preclude the timeouts data being part of a proposed new state, etc. and instead only using the schema and configuration parts of Terraform with private state for the actual data storage.

Part of this effort was intended to be re-examining the functionality with the new "correctness" constraints imposed outside terraform-plugin-sdk, but that has not been done yet. It may turn out that treating this functionality the same as any other regular configuration-plan-state data is actually a requirement.

bflad avatar Jul 07 '22 22:07 bflad

Just to mention since it popped into my brain for whatever reason: one particular quirk of terraform-plugin-sdk based timeouts functionality is noted in https://github.com/hashicorp/terraform-plugin-sdk/issues/963; it may not be possible to trigger timeout updates without a full apply cycle. That said, the ReadResource RPC only supplies the prior state and prior private state (no configuration data), so it might not be avoidable regardless of whether it is part of the regular state or private state.

I do not think this feature request or implementation should necessarily do or suggest anything different for the protocol layer which would require core changes. The main goal here is to mimic the existing support so we can migrate existing terraform-plugin-sdk based providers to this framework without requiring breaking changes.

bflad avatar Jul 07 '22 23:07 bflad

Oh yeah, that's another interesting quirk of the timeouts design I hadn't considered! I ended up here because of the comment you left over in #62, which amusingly caused me to start pondering exactly what technical differences might apply to a "behavioral field" as compared to a normal attribute, beyond the one I already mentioned of potentially being usable in a hypothetical "tombstone" syntax to modify the behavior during destroy.

Getting the current value written in the configuration regardless of what operation we're running is another good concrete technical requirement for that -- I suppose it's a generalization of the "tombstone" requirement to the planning and refreshing operations too -- though I understand that the main point of this issue was to see if we could do something with timeouts without adding something new to the protocol, so I won't belabor that point here and I will instead wait to see what you learn from this tighter-scoped research!

apparentlymart avatar Jul 08 '22 00:07 apparentlymart

Support for timeouts, which can be migrated without practitioner facing changes from the terraform-plugin-sdk timeouts support, is now available as an external Go module: github.com/hashicorp/terraform-plugin-framework-timeouts

Additional references can be found at:

bflad avatar Oct 04 '22 15:10 bflad

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Nov 04 '22 02:11 github-actions[bot]