dnscontrol icon indicating copy to clipboard operation
dnscontrol copied to clipboard

Porkbun does not work concurrently (`ppreview`)

Open Syntaxheld opened this issue 1 year ago • 15 comments

Describe the bug Testing the new concurrency feature (ppreview) on domains handled by the Porkbun provider it fails to handle those:

INFO#1: Zone "example.com" does not exist. Can not create because "porkbun" does not implement ZoneCreator

But the zone exists ... all works well when running the normal preview.

To Reproduce

  1. Have (multiple) domains in dnscontrol setup that use Porkbun
  2. dnscontrol ppreview --cmode all
  3. Multiple Zone "example.com" does not exist. Can not create because "porkbun" does not implement ZoneCreator issues.

Expected behavior For Porkbun to work concurrently.

DNS Provider

  • Porkbun

Additional context Tested with dnscontrol v4.11.0.

I've checked the "Concurrency Verified" column in https://docs.dnscontrol.org/getting-started/providers ... but I was not sure if the ❌ means "verified & failed" or "not verified".

At least in my testing, when I use dnscontrol ppreview --cmode all --domains example1.com,example2.com it still works (most of the time). It only starts throwing issues regularly when I test it with at least 3 different zones.

CC @imlonghao (maintainer of the porkbun provider)

Syntaxheld avatar Jun 07 '24 22:06 Syntaxheld

I was not sure if the ❌ means "verified & failed" or "not verified".

For Porkbun, it means "not verified" yet.

Zone "example.com" does not exist.

This error was caused by a missing error catch for JSON parsing, the real error message should be the follow, I will fill a PR for this.

failed parsing nameserver list from porkbun: invalid character '<' looking for beginning of value

https://github.com/StackExchange/dnscontrol/blob/32fe2753d81a681156dae5453f3b3724f30d70aa/providers/porkbun/api.go#L157

I'm not sure whether they have a hidden rate limit or something, in my test, when there are more than 3 domains in concurrently, an HTTP 503 error was thrown.

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>openresty</center>
</body>
</html>

I will contact the support for this, asking whether they have a rate limit or just the server crashed.

imlonghao avatar Jun 08 '24 01:06 imlonghao

I got a response from Porkbun support.

The rate limit is 1 request per second.

So the Porkbun provider is not supposed to work concurrently.

imlonghao avatar Jun 15 '24 03:06 imlonghao

1 per second? wow, that's pretty low.

It might be useful to implement a re-try on 429 (if porkbun sends a proper 429 when the rate limit is exceeded). Even without concurrency it is possible that we would exceed that limit.

tlimoncelli avatar Jun 18 '24 12:06 tlimoncelli

Any chance this maybe works now, since the split to a separate API host?

Quoting from https://github.com/StackExchange/dnscontrol/pull/3153#pullrequestreview-2363906014

No more AWSALB, nor the weird rate limit. I run the integration test and all tests success all at once, not hitting any rate limit compare to https://github.com/StackExchange/dnscontrol/pull/3019.

Syntaxheld avatar Feb 04 '25 17:02 Syntaxheld

I don't have a porkbun account, but if you do, you can test by adding the --cmode all flag. It will run all providers concurrently, even if they're set to providers.CanConcur: providers.Cannot().

tlimoncelli avatar Feb 04 '25 22:02 tlimoncelli

Looks like I still get rate limited by Porkbun with --cmode all.

jamesog avatar Feb 04 '25 22:02 jamesog

I thought you might. Did dnscontrol pause and retry or fail?

tlimoncelli avatar Feb 04 '25 22:02 tlimoncelli

CONCURRENTLY gathering 13 zone(s)
SERIALLY gathering 0 zone(s)
Waiting for concurrent gathering(s) to complete...WARNING: Rate limiting.. waiting for 10 second(s)
WARNING: Rate limiting.. waiting for 10 second(s)
WARNING: Rate limiting.. waiting for 10 second(s)
DONE

With some, but not all domains outputting

INFO#1: Domain "jamesog.dev" provider porkbun Error: failed parsing url forwarding record list from porkbun: invalid character '<' looking for beginning of value

(They return HTML when they 429.)

jamesog avatar Feb 04 '25 22:02 jamesog

Sounds like the provider needs to handle 429s to make this work.

tlimoncelli avatar Feb 05 '25 17:02 tlimoncelli

I've played about with the retry logic a bit, including adding explicit support for 429, but the Porkbun API still has pretty aggressive restrictions.

Even if I add longish sleeps between each retry, I intermittently (and non-deterministically, AFAICT) get either 503 or 400 when running in concurrent mode. In both cases it returns HTML, despite their API docs saying:

Any HTTP response code other than 200 is an error. Errors will also be indicated in the "status" element of the returned JSON. An HTTP response of 403 means that additional authentication information is required, such as a two factor code.

I think for now Porkbun will have to be left out of concurrency :-(

(The provider could still do with handling non-JSON responses better, though.)

jamesog avatar Feb 05 '25 21:02 jamesog

@jamesog Thanks for investigating this!

We've seen similar problems with other providers. For example, gcloud's retry.go has a special case where it will retry a 502 exactly once. It's a bit hacky, but it works.

I think it would be worthwhile to add support for 429 even for non-concurrent use.

tlimoncelli avatar Feb 06 '25 13:02 tlimoncelli

thanks for all the details in this thread! I think i managed to grok the Concurrency Verified column in the provider matrix now... but not well enough to write a doc page for it.

Any of you able to write a concise definition for that column?

gotjoshua avatar Mar 25 '25 08:03 gotjoshua

Hi @gotjoshua ! Thanks for looking into this.

"Concurrency Verified": A checkmark means "this has been tested and will run concurrently". The red "X" means "it hasn't been tested, we don't know if it will work concurrently".

Here's how to perform a test:

  1. Build dnscontrol with the -race flag: go build -race (this makes a special binary)
  2. Run this special binary: dnscontrol preview with a configuration that lists at least 4 domains using Porkbun. More domains is better.
  3. The special binary runs much slower (1/10th the speed) because it is doing a lot of checking. If it reports problems, the error message will indicate where the problem is. It might not be 100% accurate, but it will be in the right area.

I would be glad to help fix any problems!

Tom

tlimoncelli avatar Mar 25 '25 19:03 tlimoncelli

~~(@tlimoncelli You mean ppreview, right?)~~ Nope, ppreview was shipped last year.

winterqt avatar Mar 28 '25 22:03 winterqt

The red "X" means "it hasn't been tested, we don't know if it will work concurrently".

why not use the ? for that and the red x for tested and not working?

started a PR draft with your suggestion @tlimoncelli : https://github.com/StackExchange/dnscontrol/pull/3510

gotjoshua avatar Mar 29 '25 11:03 gotjoshua

Hi all, I've been doing some more testing with this and I think I have a solution. @tlimoncelli are you open to adding a new external dependency? I've recently been using Failsafe Go in other projects, and using it here works perfectly.

The diff to make this work with Porkbun is pretty straightforward: you configure a retry policy, and use this as the transport on the HTTP client:

	retryPolicy := failsafehttp.RetryPolicyBuilder().
		WithMaxRetries(5).
		WithBackoff(time.Second, 10*time.Second).
		OnRetry(func(f failsafe.ExecutionEvent[*http.Response]) {
			fmt.Printf("[%s] retry attempt %d\n", endpoint, f.Retries())
		}).
		OnRetriesExceeded(func(f failsafe.ExecutionEvent[*http.Response]) {
			fmt.Println(endpoint, ": all retries exceeded")
		}).
		Build()

	client := &http.Client{
		Transport: failsafehttp.NewRoundTripper(nil, retryPolicy),
	}

Here you can see I added some debug prints to prove that it's working, but those OnRetry calls can be removed.

failsafehttp's RetryPolicy knows how to handle the Retry-After response header, which I think is the key here.

With this we can remove a chunk of code while also making it more robust:

diff --git providers/porkbun/api.go providers/porkbun/api.go
index 3c07b72e..655420f3 100644
--- providers/porkbun/api.go
+++ providers/porkbun/api.go
@@ -11,7 +11,8 @@ import (
 	"strings"
 	"time"
 
-	"github.com/StackExchange/dnscontrol/v4/pkg/printer"
+	"github.com/failsafe-go/failsafe-go/failsafehttp"
+	"github.com/failsafe-go/failsafe-go/retrypolicy"
 )
 
 const (
@@ -70,32 +71,26 @@ func (c *porkbunProvider) post(endpoint string, params requestParams) ([]byte, e
 		return []byte{}, err
 	}
 
-	client := &http.Client{}
+	retryPolicy := failsafehttp.RetryPolicyBuilder().
+		WithMaxRetries(5).
+		WithBackoff(time.Second, 10*time.Second).
+		Build()
+
+	client := &http.Client{
+		Transport: failsafehttp.NewRoundTripper(nil, retryPolicy),
+	}
 	req, _ := http.NewRequest(http.MethodPost, baseURL+endpoint, bytes.NewBuffer(personJSON))
 
-	retrycnt := 0
-
-	// If request sending too fast, the server will fail with the following error:
-	// porkbun API error: Create error: We were unable to create the DNS record.
-retry:
-	time.Sleep(time.Second)
 	resp, err := client.Do(req)
 	if err != nil {
+		if errors.Is(err, retrypolicy.ErrExceeded) {
+			return nil, errors.New("rate limiting exceeded")
+		}
 		return []byte{}, err
 	}
 
 	bodyString, _ := io.ReadAll(resp.Body)
 
-	if resp.StatusCode == http.StatusAccepted || resp.StatusCode == http.StatusServiceUnavailable {
-		retrycnt++
-		if retrycnt == 5 {
-			return bodyString, errors.New("rate limiting exceeded")
-		}
-		printer.Warnf("Rate limiting.. waiting for %d second(s)\n", retrycnt*10)
-		time.Sleep(time.Second * time.Duration(retrycnt*10))
-		goto retry
-	}
-
 	// Got error from API ?
 	var errResp errorResponse
 	err = json.Unmarshal(bodyString, &errResp)
diff --git providers/porkbun/porkbunProvider.go providers/porkbun/porkbunProvider.go
index e130d31d..3a4bbf59 100644
--- providers/porkbun/porkbunProvider.go
+++ providers/porkbun/porkbunProvider.go
@@ -59,7 +59,7 @@ var features = providers.DocumentationNotes{
 	// See providers/capabilities.go for the entire list of capabilities.
 	providers.CanAutoDNSSEC:          providers.Cannot(),
 	providers.CanGetZones:            providers.Can(),
-	providers.CanConcur:              providers.Unimplemented(),
+	providers.CanConcur:              providers.Can(),
 	providers.CanUseAlias:            providers.Can(),
 	providers.CanUseCAA:              providers.Can(),
 	providers.CanUseDS:               providers.Cannot(),

Potentially we could re-added somelike the printer.Warnf("Rate limiting.. waiting for %d second(s)\n", retrycnt*10) into the OnRetry if that output is desired.

If you're happy with having this new dependency I'll open a PR.

jamesog avatar Jul 07 '25 10:07 jamesog

failsafehttp's RetryPolicy knows how to handle the Retry-After response header, which I think is the key here.

Actually I think I'm wrong here. In further testing when it fails I'm usually seeing 503s, but the retry policy is handling it beautifully.

With this change

		OnRetryScheduled(func(f failsafe.ExecutionScheduledEvent[*http.Response]) {
			printer.Warnf("Rate limiting... waiting for %s\n", f.Delay)
			printer.Debugf("Last result: status=%d headers=%v\n", f.LastResult().StatusCode, f.LastResult().Header)
		}).

I see

CONCURRENTLY gathering 15 zone(s)
SERIALLY gathering 0 zone(s)
Waiting for concurrent gathering(s) to complete...WARNING: Rate limiting... waiting for 1s
Last result: status=503 headers=map[Content-Length:[194] Content-Type:[text/html] Date:[Mon, 07 Jul 2025 11:37:20 GMT] Server:[openresty] Strict-Transport-Security:[max-age=63072000; includeSubDomains; preload] X-Frame-Options:[deny]]
WARNING: Rate limiting... waiting for 1s
Last result: status=503 headers=map[Content-Length:[194] Content-Type:[text/html] Date:[Mon, 07 Jul 2025 11:37:21 GMT] Server:[openresty] Strict-Transport-Security:[max-age=63072000; includeSubDomains; preload] X-Frame-Options:[deny]]
WARNING: Rate limiting... waiting for 2s
Last result: status=503 headers=map[Content-Length:[194] Content-Type:[text/html] Date:[Mon, 07 Jul 2025 11:37:21 GMT] Server:[openresty] Strict-Transport-Security:[max-age=63072000; includeSubDomains; preload] X-Frame-Options:[deny]]
DONE

but it ultimately succeeds, every time.

jamesog avatar Jul 07 '25 11:07 jamesog

Hi all, I've been doing some more testing with this and I think I have a solution. @tlimoncelli are you open to adding a new external dependency? I've recently been using Failsafe Go in other projects, and using it here works perfectly.

Yes, I'm open to adding this dependency.

I haven't heard of it before, but I just skimmed the docs and it looks like a good fit for our needs. If it works well for you, I'll recommend it for all providers.

tlimoncelli avatar Jul 08 '25 12:07 tlimoncelli

Brilliant, thanks. I've opened #3652.

I haven't heard of it before, but I just skimmed the docs and it looks like a good fit for our needs. If it works well for you, I'll recommend it for all providers.

Working on this got me thinking that it could be nice to have a standardised way of creating an HTTP client and making calls. It looks like each provider is doing its own thing today, including whether a User-Agent header is set (largely not) and how retry handling is implemented.

jamesog avatar Jul 08 '25 13:07 jamesog