terraform-provider-digitalocean
Retry 50x replies from API
Hi there,
The DigitalOcean API has been flaky for DNS requests over the last few months. As the DO team mentioned in #203, the problem is intermittent.
However, API flakiness breaks our pipeline even when only a few calls return a 502 or 503, so I propose retrying 50x errors.
Terraform Version
Run terraform -v to show the version. If you are not running the latest version of Terraform, please upgrade because your issue may have already been fixed.
Terraform v0.11.11
+ provider.digitalocean v1.1.0
Affected Resource(s)
Please list the resources as a list, for example:
digitalocean_record
Terraform Configuration Files
// blindcut-vpn.com
resource "digitalocean_domain" "default" {
  name = "example.com"
}

// Monitoring
resource "digitalocean_record" "test" {
  domain = "${digitalocean_domain.default.name}"
  type   = "A"
  name   = "test"
  value  = "10.10.10.10"
}
Expected Behavior
Plan: x to add, x to change, x to destroy.
Do you want to perform these actions? Terraform will perform the actions described above. Only 'yes' will be accepted to approve.
Enter a value: yes
Actual Behavior
* digitalocean_record.test: 1 error(s) occurred:
* digitalocean_record.test: digitalocean_record.test: GET https://api.digitalocean.com/v2/domains/blindcut-vpn.com/records/xxxxxxxx: 500 Server was unable to give you a response
Steps to Reproduce
Please list the steps required to reproduce the issue, for example:
* Create about dns records and 1 domain
* Run terraform apply
References
This was raised in #203 but it was closed.
Appreciate the context, and apologies that this causes issues in a pipeline. I can see how that would be annoying and totally empathize with you there.
Think adding retry logic to godo is where we need to start. https://github.com/digitalocean/godo/issues/173 is tracking it there.
Thanks @eddiezane!
Yesterday I was having similar problems.
Running into this problem today when migrating a client's DNS over to DO
Has anyone else on this thread seen this creep up lately?
I've rerun Terraform about 10 times today and can't get a successful run. I've started to receive a new error in the last few attempts:
Error: Error retrieving domain: GET https://api.digitalocean.com/v2/domains/somedomain.com: 422 Invalid URL. Only valid hostname characters are allowed (a-z, A-Z, 0-9, ., _ and -).
@chasebolt apologies. Team identified an issue. That specific error is being resolved at the moment. Rolling restarts going out right now.
edit: https://status.digitalocean.com/incidents/szbtcxmzj5pd
I think it's important to retry API calls that fail for any reason, not just 5xx results. HTTP requests can fail for a variety of reasons. I need Terraform to be resilient to common issues:
- DNS lookup failure
- Connect timeout
- Connection rejected
- Connection reset
- Send timeout
- Recv timeout
- TLS negotiation failure
- Certificate validation failure
- No response from server
- Response not parsable as HTTP
- Unexpected redirect
- HTTP 404 Not Found & 405 Method Not Allowed responses (common with misconfigured proxies)
- HTTP 429 Too Many Requests (an alternative to 503s)
- HTTP 5xx response
How about changing the title to "Retry API calls" to account for these?
I've been getting a lot of TLS handshake errors over the last month. They seem to mainly relate to the tag API. I currently have to run the deploy 2-5 times for every update. Is this any closer?
Error: Error retrieving tag: Get https://api.digitalocean.com/v2/tags/node: net/http: TLS handshake timeout
on node-tag.tf line 2, in data "digitalocean_tag" "node":
2: data "digitalocean_tag" "node" {
Edit:
I'm not sure what is going on but when I 'sudo terraform apply' I have no issues with timeouts. I think I'm having a different issue. For other Mac users: it seems to be related to this issue: https://github.com/hashicorp/terraform/issues/15817
We've just released version 2.28.0 of this provider. It adds experimental support for automatically retrying requests that fail with connection errors, 429, or 500-level response codes. It can be enabled by setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration.
Please let us know if you have any feedback on this functionality. We will be looking to enable it by default in a future release.
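For anyone wanting to try this, here is a minimal sketch of what opting in through the provider configuration might look like. The http_retry_max argument name comes from the release note above; the retry count of 4, the version constraint, and the do_token variable name are assumptions chosen only for illustration.

```hcl
terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      # Assumption: any release from 2.28.0 onward, where retry support first appeared.
      version = ">= 2.28.0"
    }
  }
}

# Hypothetical variable holding the API token; name it however your setup requires.
variable "do_token" {
  type      = string
  sensitive = true
}

provider "digitalocean" {
  token = var.do_token

  # Retry requests that fail with connection errors, 429, or 500-level responses.
  # The value 4 is an arbitrary example, not a recommended setting.
  http_retry_max = 4
}
```

Setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable to the same value should have the equivalent effect without touching the configuration, which may be more convenient in a CI pipeline.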
With the recently released version 2.30.0 of this provider, we have enabled retrying requests that fail with connection errors, 429, or 500-level response codes by default. Setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration to 0 will disable this behavior.
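If you would rather handle retries yourself, opting out on 2.30.0 or later might look like the sketch below. Only the http_retry_max argument and the value 0 come from the announcement above; the do_token variable is a hypothetical name and is assumed to be declared elsewhere in your configuration.

```hcl
provider "digitalocean" {
  # Hypothetical variable; declare it elsewhere or supply the token another way.
  token = var.do_token

  # 0 turns off the automatic retries that are on by default since 2.30.0.
  http_retry_max = 0
}
```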