terraform-provider-dns

TF 0.12 Provider produced inconsistent result

Open tam116 opened this issue 5 years ago • 6 comments

Terraform Version

0.12.6

Affected Resource(s)

dns_a_record_set, dns_cname_record (probably others, but I didn't test)

Terraform Configuration Files

Configure the DNS Provider

provider "dns" {
  update {
    server        = ""
    key_name      = ""
    key_algorithm = "hmac-sha256"
    key_secret    = ""
  }
}

resource "dns_a_record_set" "foo" {
  zone      = ""
  name      = "tftest"
  addresses = [""]
  ttl       = 300
}

Debug Output

https://gist.github.com/tam116/26e65bee30e67569929beb1e9f45ab45

Expected Behavior

DNS records created and results returned.

Actual Behavior

DNS records are created successfully (i.e., nslookup shows the correct record), but an error message is printed stating: "Provider produced inconsistent result after apply: When applying changes to dns_a_record_set.foo, provider "dns" produced an unexpected new value for was present, but now absent. This is a bug in the provider, which should be reported in the provider's own issue tracker." The .tfstate file is left unmodified.

A second run of terraform apply results in the expected output and updates the .tfstate as expected. Updating or destroying existing records produces expected results.

Steps to Reproduce

  1. terraform apply

tam116, Aug 06 '19 17:08

Thanks for reporting this, @tam116!

This ungrammatical error arises if, for some reason, a provider responds to the request to perform a Create by returning a null value indicating that the object doesn't exist. The provider is implemented to try reading back what it just wrote immediately after it created it, so it seems possible that this could arise if there is some delay before the new DNS record is available for reading on the remote DNS server.

Is there something about your particular DNS server that might cause that to be true? For example, is it actually a cluster of servers and writing to one of them asynchronously replicates to the others, and so there might be some delay before the new record becomes available?

In some other providers there is logic to retry for a while until the new object seems to be consistently available, as a heuristic to try to work around this. Such an approach might be warranted here too, but if possible I'd like to understand more about the circumstances, since that technique would be appropriate only if the new records were to become consistently available within a couple minutes at most; we might need to consider a different approach if e.g. it could take an hour or more for the new record to become consistently available, due to the servers honoring TTLs on the records in spite of updates.

apparentlymart, Aug 06 '19 21:08

Thanks for looking into this so quickly @apparentlymart!

Yes, I checked with our network group about how DNS is set up. The DNS servers are a single primary with multiple secondaries, and DNS is load balanced across all of the servers. The secondaries don't honor TTL; they'll reply with the updated record as soon as they are notified by the master.

After checking with the network engineers, I retested against the master IP and the apply succeeded.

When I use the load balanced IP, which is what I did for the original report, running terraform apply; sleep 1; nslookup shows me the new record was added (terraform still reports an error). If I remove the sleep 1, then nslookup fails as well. So for me, changes are propagated in about 1 second.

I think your suggestion of adding a retry with a 60s timeout makes sense.

tam116, Aug 07 '19 14:08

Thanks for that extra context, @tam116!

The Terraform team at HashiCorp won't be able to work on this in the near future due to our focus being elsewhere, but we'd be happy to review a pull request if you or someone else has the time and motivation to implement it. Alternatively, if others would also like to see this implemented I'd encourage adding a :+1: upvote reaction to the original issue comment (not to this comment), which we use as one of the inputs to prioritize work for the Terraform team.

Unfortunately our existing retry patterns may be a bit too HTTP-centric for direct use here, but the low-level Retry function could work. The fact that future requests could go to either the primary or the secondaries makes this tricky, because the retry might coincidentally see a response from the master and conclude that the record has converged, while some other client downstream in Terraform (which is relying on that new record) might still see it as not yet present. However, even a simple retry would at least address the case where the resource is just created and not immediately used elsewhere in Terraform, so it would probably be worth doing regardless of the limitations.
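One way to reduce the chance of being fooled by a single lucky answer from the primary is to require several consecutive successful reads before treating the record as converged. The Go sketch below illustrates that idea only; `seenConsistently` and its parameters are hypothetical, and even a run of hits does not prove that every secondary has the record, it just makes a premature "converged" verdict less likely.

```go
package main

import (
	"fmt"
	"time"
)

// seenConsistently reports whether lookup returned true `need` times in a row
// within at most maxTries attempts, sleeping `interval` between attempts.
// lookup is a hypothetical stand-in for one read-back query that may land on
// any server behind the load balancer.
func seenConsistently(lookup func() bool, need, maxTries int, interval time.Duration) bool {
	streak := 0
	for i := 0; i < maxTries; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		if lookup() {
			streak++
			if streak >= need {
				return true
			}
		} else {
			streak = 0 // a miss means at least one server is still behind
		}
	}
	return false
}

func main() {
	// Simulated load balancer: early answers alternate between an up-to-date
	// primary (true) and a lagging secondary (false); later every server agrees.
	answers := []bool{true, false, true, false, true, true, true}
	i := 0
	lookup := func() bool {
		ok := i >= len(answers) || answers[i]
		i++
		return ok
	}

	ok := seenConsistently(lookup, 3, 10, 10*time.Millisecond)
	fmt.Println("converged:", ok, "queries:", i)
}
```

The cost is a few extra queries on every Create, so the streak length and interval would need tuning against how quickly the secondaries are notified.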

apparentlymart, Aug 07 '19 14:08

@tam116 Did you try setting the transport parameter to "tcp" inside the provider declaration (instead of the default, udp)?

In my case I had a similar error message, but when I checked the DNS server, the record had been created correctly. Inspecting the network packets with Wireshark during the apply, we saw that a first UDP request was sent to the DNS server to create the record, and a second UDP packet was then sent to ask whether the record exists. In my case the DNS server replied "unknown host" even though the record had been created correctly... We did not understand why (DNS propagation time...?), but switching to TCP solved my issue. Maybe TCP forces the exchange to be synchronous and to wait correctly for the DNS creation/propagation?

kdefives, Feb 14 '20 16:02

We are having the same issue. I tested with TCP transport but that didn't seem to have any effect.

TeroPihlaja, Feb 25 '20 09:02

> We are having the same issue. I tested with TCP transport but that didn't seem to have any effect.

Did you find a solution for this issue? I am facing the same problem :/

kdefives, Jun 10 '20 14:06