terraform-plugin-sdk
terraform-plugin-sdk copied to clipboard
Returning an error still applies diff to state.
SDK version
present in v1.13.1, not tested in previous versions, still need to determine when the bug was introduced, or if it was intended behavior that wasn't communicated clearly
Relevant provider source code
func Update(d *schema.ResourceData, meta interface{}) error {
return fmt.Errorf("error")
}
Terraform Configuration Files
For create:
resource "foo_resource" "testing" {
foo = 123
}
For update:
resource "foo_resource" "testing" {
foo = 456
}
Debug Output
To be added.
Expected Behavior
On update, because an error is returned, the diff should not be applied to state, so foo
should still be 123
, as the error indicates that the diff was not used.
Actual Behavior
On update, foo
is set to 456
in state, even though the error was correctly returned, indicating that the diff was applied anyways.
Steps to Reproduce
-
terraform apply
- Modiy
foo
to456
in config file -
terraform apply
-
terraform show foo_resource.testing.foo
Workarounds
Modifying your code to this correctly discards the diff in the case of error, not applying it to state.
func Update(d *schema.ResourceData, meta interface{}) error {
if err != nil {
d.Partial(true)
return err
}
}
Impact
Most providers shouldn't notice this. The apply will be ended when the error is encountered, so no downstream resources are interpolating incorrect results on the run that errored. And before the next apply, refresh
should run, updating the information in state to match reality. This should therefore only impact resources that don't or can't update all their fields in Get
, or users that run with -refresh=false
, which is discouraged.
Just to be sure I get this sentence part: This should therefore only impact resources that don't or can't update all their fields in Get
I may be confused due to the Get
word and/or even due to some lack of knowledge/vocabulary so:
Does it means that it should impact resource fields that haven't any Set
call on it, in resource Read (or maybe Update) method ?
(it means to me that the field is just not read by the provider and it's value lives between the TF file and the state only)
hashicorp/terraform#20295 seems related to this. Still tracking down why this is happening and whether it's happening on purpose or not. There seems to be some ambiguity in the Terraform Resource Instance Change Lifecycle doc as to whether the constraints on what state can be returned from ApplyResourceChange
apply in the face of an error. If they do, I think we're going to run into trouble with:
Any attribute that had a known value in the planned new state must have an identical value in the new state.
I don't see how that could yield a correct state in the face of an error. Research is still ongoing though.
Just to be sure I get this sentence part: This should therefore only impact resources that don't or can't update all their fields in Get I may be confused due to the Get word and/or even due to some lack of knowledge/vocabulary so:
Does it means that it should impact resource fields that haven't any Set call on it, in resource Read (or maybe Update) method ? (it means to me that the field is just not read by the provider and it's value lives between the TF file and the state only)
Sorry, @treywelsh, this one's on me. I meant Read
about. Basically, if when you retrieve the state from the API (usually done in the Read
function associated with your resource: https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#Resource.ReadContext) then it should overwrite the values from the apply that failed. The thinking is this:
- The apply fails. The wrong state gets written. (The state that says the apply succeeded).
- The user runs apply again. The state gets refreshed (Read gets called) and the state values are updated based on what the API reports. This brings state back in line with reality before the plan is generated, meaning the state that pretended everything worked never actually had any user impact.
Number 2 holds up if and only if users aren't running with the -refresh=false
flag (which is not recommended) and if all the fields in your state get updated during Read. If some fields aren't available in the API response and can't be updated, then they'll continue to have the wrong values.
Just to add an obvious use case to the partial
mode, I have a custom resource that includes a password. During the update, when we detect a change in that password, we update it through a dedicated call in our API through our custom plugin. If that call fails, we need to keep the state as it is right now because that is what is live on the server.
That particular value cannot be refreshed in the Read
part because it's obviously not returned on the api (and if it were it would be a hash anyway).
So, without particular handling, we can end up with a wrong password in the statefile compared to the actual resource and it can lead to serious problems.
We've hit this issue in NSXT provider in several scenarios:
- Validation function fails for update of resource, after which the state is updated with the wrong attribute value (same one that caused validation to fail)
- Update function is non-existent, but state gets updated anyway
- Provider update function returns an error
Seems like can only use the workaround suggested above d.Partial(true)
for the third case, since in the first two as a provider we don't have a say. I was wondering if there could be another workaround?
Additional question - we don't see this behavior happening for all resources, some maintain correct state after seemingly same kind of error. It would help us a lot to get some understanding what could cause this difference.
Thanks in advance!