Retry 50x replies from API

Open ojongerius opened this issue 5 years ago • 9 comments

Hi there,

The DigitalOcean API has been flaky for DNS requests over the last few months. As the DO team mentioned in #203, the failures are intermittent.

However, even intermittent API failures break our pipeline: a few calls returning 502 or 503 are enough to fail a run. I propose retrying 50x errors.

Terraform Version

Run terraform -v to show the version. If you are not running the latest version of Terraform, please upgrade because your issue may have already been fixed.

Terraform v0.11.11
+ provider.digitalocean v1.1.0

Affected Resource(s)

Please list the resources as a list, for example:

digitalocean_record

Terraform Configuration Files

// blindcut-vpn.com
resource "digitalocean_domain" "default" {
  name = "example.com"
}

// Monitoring
resource "digitalocean_record" "test" {
  domain = "${digitalocean_domain.default.name}"
  type   = "A"
  name   = "test"
  value  = "10.10.10.10"
}

Expected Behavior

Plan: x to add, x to change, x to destroy.

Do you want to perform these actions? Terraform will perform the actions described above. Only 'yes' will be accepted to approve.

Enter a value: yes

Actual Behavior


* digitalocean_record.test: 1 error(s) occurred:

* digitalocean_record.test: digitalocean_record.test: GET https://api.digitalocean.com/v2/domains/blindcut-vpn.com/records/xxxxxxxx: 500 Server was unable to give you a response

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

* Create about dns records and 1 domain
* Run terraform apply

References

This was raised in #203 but it was closed.

ojongerius avatar May 14 '19 23:05 ojongerius

Appreciate the context, and apologies that this causes issues in a pipeline. I can see how that would be annoying and totally empathize with you there.

Think adding retry logic to godo is where we need to start. https://github.com/digitalocean/godo/issues/173 is tracking it there.

eddiezane avatar May 16 '19 00:05 eddiezane

Thanks @eddiezane!

ojongerius avatar May 16 '19 01:05 ojongerius

Yesterday I was having similar problems.

pjanuario avatar Jun 05 '19 14:06 pjanuario

Running into this problem today when migrating a client's DNS over to DO

chasebolt avatar Jan 28 '20 18:01 chasebolt

Has anyone else on this thread seen this creep up lately?

eddiezane avatar Jan 28 '20 18:01 eddiezane

I've rerun terraform about 10 times today and can't get a successful run. I've started to receive a new error in the last few attempts:

Error: Error retrieving domain: GET https://api.digitalocean.com/v2/domains/somedomain.com: 422 Invalid URL. Only valid hostname characters are allowed (a-z, A-Z, 0-9, ., _ and -).

chasebolt avatar Jan 28 '20 19:01 chasebolt

@chasebolt apologies. Team identified an issue. That specific error is being resolved at the moment. Rolling restarts going out right now.

edit: https://status.digitalocean.com/incidents/szbtcxmzj5pd

eddiezane avatar Jan 28 '20 20:01 eddiezane

I think it's important to retry API calls that fail for any reason, not just 5xx results. HTTP requests can fail for a variety of reasons. I need Terraform to be resilient to common issues:

  • DNS lookup failure
  • Connect timeout
  • Connection rejected
  • Connection reset
  • Send timeout
  • Recv timeout
  • TLS negotiation failure
  • Certificate validation failure
  • No response from server
  • Response not parsable as HTTP
  • Unexpected redirect
  • HTTP 404 Not Found & 405 Method Not Allowed responses (common with misconfigured proxies)
  • HTTP 429 Too Many Requests (an alternative to 503s)
  • HTTP 5xx response

How about changing the title to "Retry API calls" to account for these?

mleonhard avatar Feb 26 '20 21:02 mleonhard

I've been getting a lot of TLS handshake errors over the last month. They seem to mainly relate to the tag API. I currently have to run the deploy 2-5 times for every update. Is this any closer?

Error: Error retrieving tag: Get https://api.digitalocean.com/v2/tags/node: net/http: TLS handshake timeout

  on node-tag.tf line 2, in data "digitalocean_tag" "node":
   2: data "digitalocean_tag" "node" {

Edit:

I'm not sure what is going on but when I 'sudo terraform apply' I have no issues with timeouts. I think I'm having a different issue. For other Mac users: it seems to be related to this issue: https://github.com/hashicorp/terraform/issues/15817

troywilson avatar Jul 08 '20 01:07 troywilson

We've just released version 2.28.0 of this provider. It adds experimental support for automatically retrying requests that fail with connection errors, 429, or 500-level response codes. It can be enabled by setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration.

Please let us know if you have any feedback on this functionality. We will be looking to enable it by default in a future release.
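
For readers who want to try the opt-in described above, a provider configuration might look roughly like the following. This is a sketch, not an official example: the retry count of 4, the `do_token` variable name, and the version constraint are illustrative assumptions. Only the `http_retry_max` argument and the 2.28.0 release come from the comment itself.

```hcl
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      # Retry support is described above as arriving in 2.28.0.
      version = ">= 2.28.0"
    }
  }
}

# Hypothetical variable name for the API token.
variable "do_token" {
  type      = string
  sensitive = true
}

provider "digitalocean" {
  token = var.do_token

  # Illustrative value: upper bound on retries for requests that fail with
  # connection errors, 429, or 500-level response codes.
  http_retry_max = 4
}
```

Per the comment above, setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable is the alternative to the provider argument, which may suit CI pipelines where the configuration should stay unchanged.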

andrewsomething avatar Apr 21 '23 16:04 andrewsomething

With the recently released version 2.30.0 of this provider, we have enabled retrying requests that fail with connection errors, 429, or 500-level response codes by default. Setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration to 0 will disable this behavior.
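
As a sketch of the opt-out described above (assuming provider version 2.30.0 or later, and a hypothetical `do_token` variable declared elsewhere in the configuration), disabling the retries could look like this:

```hcl
provider "digitalocean" {
  # Hypothetical variable, assumed to be declared elsewhere.
  token = var.do_token

  # 0 turns off the automatic retries that 2.30.0 enables by default.
  http_retry_max = 0
}
```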

andrewsomething avatar Sep 11 '23 19:09 andrewsomething