
Domains on Pantheon Doc Update - Inclusion of warnings related to cert renewal failure.

Open ejcabquina opened this issue 1 year ago • 3 comments

Re: Domains on Pantheon

Priority: High

Issue Description: Automated re-validation failure for domains pointed to 3rd-party WAF.

Suggested Resolution:

  • Inclusion of warning notes for potential failure in renewal of certs.
  • Inclusion of workarounds for WAF settings that could block cert renewal from Let's Encrypt (Exemption for /.well-known/acme-challenge/* path)
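To make the suggested exemption concrete, here is a minimal sketch of the path check a WAF bypass rule would need to express. It is not tied to any particular WAF product; the function name `is_acme_challenge_path` and the normalization approach are assumptions for illustration only. Real WAFs express this in their own rule syntax.

```python
import posixpath
from urllib.parse import urlsplit

# Prefix that Let's Encrypt HTTP-01 validation requests are sent to.
ACME_PREFIX = "/.well-known/acme-challenge/"


def is_acme_challenge_path(request_target: str) -> bool:
    """Return True if the request targets an ACME challenge token.

    Hypothetical helper: a WAF rule would skip blocking for these requests.
    """
    # Drop any query string or fragment, keep only the path.
    path = urlsplit(request_target).path
    # Collapse "." and ".." segments so "../" tricks cannot ride on the exemption.
    normalized = posixpath.normpath(path)
    # Require a non-empty token after the prefix (normpath strips a trailing slash).
    return normalized.startswith(ACME_PREFIX) and len(normalized) > len(ACME_PREFIX)


if __name__ == "__main__":
    for target in ("/.well-known/acme-challenge/abc123", "/wp-admin"):
        print(target, is_acme_challenge_path(target))
```

Normalizing before matching keeps the exemption narrow, so only genuine challenge requests skip the WAF rules.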

Jul 14 '24 13:07 ejcabquina

@stevector @rachelwhitton the TSC who reported indicated the issue priority to consider is High.

The issue is a race condition, since LE will still try to renew (policy docs can get impacted, including routing! ouch!!). This issue has impacted live production sites post-launch, even if Domain Validation gets revoked/exempted.

Can we tag this issue as 'Priority: High Priority'?

Sep 09 '24 15:09 ccharlton

@ccharlton @ejcabquina I'm good with trying to move fast on this issue. But I don't think @rachelwhitton or I have enough context to write the needed PR ourselves, even with the suggestions from @ejcabquina in the report. Are either of you able to draft the needed text and/or make a PR?

Sep 10 '24 16:09 stevector

Hi @stevector

not sure how best to communicate this but my ideas are mainly:

  • Add a "Warning/Consideration" tab under Domains on Pantheon
  • Include a detailed warning about having a WAF in between the Platform and their DNS, with reference to https://letsencrypt.org/docs/faq/#what-ip-addresses-does-let-s-encrypt-use-to-validate-my-web-server and/or https://letsencrypt.org/2020/02/19/multi-perspective-validation
  • This mainly tells customers that "LE certificate renewal could potentially fail if your WAF blocks Let's Encrypt's Multi-Perspective Validation", causing downtime because the platform infra takes out domains associated with a site instance when the LE certificate expires.

There's a platform gap here: we as a platform don't seem to have something in place that detects this and sends out an email to notify customers (or maybe we do detect this, but I know for sure we're not sending out a notification specific to this scenario).
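As a rough idea of what detection could look like from the outside, here is a minimal sketch that reads a certificate's expiry date and flags it when renewal looks overdue. This is not how the platform works internally; the 14-day threshold and all function names are assumptions for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone

# Hypothetical threshold: warn when fewer than this many days remain.
RENEWAL_WARNING_DAYS = 14


def fetch_not_after(hostname: str, port: int = 443) -> str:
    """Return the served certificate's notAfter string, e.g. 'Sep 20 00:00:00 2024 GMT'.

    Note: with the default context, the handshake raises if the cert already expired.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate's expiry."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_warning(not_after: str, now: datetime,
                  threshold_days: int = RENEWAL_WARNING_DAYS) -> bool:
    """True when the cert is close enough to expiry that renewal has likely failed."""
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    sample_now = datetime(2024, 9, 10, tzinfo=timezone.utc)
    print(needs_warning("Sep 20 00:00:00 2024 GMT", sample_now))
```

A check like this could run on a schedule against each custom domain (via `fetch_not_after`) and trigger a customer notification when `needs_warning` returns true.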

Adding this bug card here for +context - https://getpantheon.atlassian.net/issues/BUGS-8403?jql=ORDER%20BY%20created%20DESC

Sep 10 '24 22:09 ejcabquina