acme-dns-01-cloudflare icon indicating copy to clipboard operation
acme-dns-01-cloudflare copied to clipboard

Propagation delay issues

Open kentonv opened this issue 4 years ago • 5 comments

Thanks for writing this plugin! I've used it successfully (along with ACME.js in general) in https://github.com/sandstorm-io/sandstorm/pull/3299.

I had some thoughts on propagation verification.

So, currently, if I pass verifyPropagation: true to the options, the plugin will block until it can actually see the DNS entry show up. That's a nice feature! But it struck me that it's really only telling you if the DNS entry has shown up in the Cloudflare colo closest to you, which is probably not the colo Let's Encrypt itself will be querying. So in practice it's still an estimate, and it's not clear to me if it's all that much better than waiting for a fixed delay period.

A more subtle problem with the current logic is that the plugin makes the first verification query right away after writing the DNS entry to the Cloudflare API. This might paradoxically cause verification to take longer, because this first query will almost always fail, and this negative result will then be cached (perhaps in the local machine's DNS cache, perhaps in caches on Cloudflare's side, etc.). So, the DNS entry will likely arrive at the colo a few seconds later, but the plugin may keep seeing it as missing for a while due to caching, causing verification to take longer than it needs to.

Meanwhile it looks like ACME.js does a verification step of its own. It expects the plugin to have a property called propagationDelay specifying how much time to wait before checking if the DNS entry exists. The Cloudflare plugin doesn't appear to set this property currently, which cases ACME.js to print a warning and then use 5 seconds as a default. After the delay, ACME.js does its own DNS lookup and bails out immediately if it can't find the DNS entry. In my experimentation, it seems that 5 seconds is not often long enough for Cloudflare, so when verifyPropagation is disabled, the plugin commonly fails.

With all that in mind, some specific ideas for improvements:

  • Perhaps acme-dns-01-cloudflare should set propagationDelay to something like 15 seconds when verifyPropagation is off, and zero when it is on. That'll silence the warning and should provide reasonable behavior either way. I am currently doing exactly this as a hack in Sandstorm: https://github.com/sandstorm-io/sandstorm/pull/3299/commits/a93192040b0ef2dfd6ef872a592c52a52e768f65

  • When verifyPropagation is on, wait 5-10 seconds before the first verification attempt. This should make verification take less time overall since it won't pollute caches in the common case that propagation happens quickly.

  • When verifyPropagation is on, the stderr output is pretty verbose and makes it look like things are going horribly wrong when in fact it's working as intended. Specifically right now it repeatedly prints things like:

      Error: Could not verify DNS for _acme-challenge.zero.kentonshouse.com
          at Function.verifyPropagation (node_modules/acme-dns-01-cloudflare/index.js:146:12)
          at Challenge.set (node_modules/acme-dns-01-cloudflare/index.js:46:5)
      Waiting for 10000 ms before attempting propagation verification retry 1 / 30.
    

    This could maybe be reduced to a single line like: "DNS not propagated yet. Will check again in 10 seconds. (attempt 1/30)"

  • I noticed that even when verifyPropagation is off, the plugin seems to verify deletions. But deletions only happen as the very last step of the ACME.js process and there's really no need to verify them, so this just adds a lot of stderr spam for no reason. Maybe verification of deletions should be skipped?

kentonv avatar Apr 26 '20 17:04 kentonv