Outdated TLS certificate on HTTP/3 requests after renewal

Open lutoma opened this issue 2 years ago • 2 comments

Caddy version: v2.5.1 h1:bAWwslD1jNeCzDa+jDCNwb8M3UJ2tPa8UZFFzPVmGKs=

We're using a dynamically generated JSON config that is changed from time to time using the API. It's all reverse proxies and redirects, and has one single server defined with experimental_http3 set to true.

This has worked fine for us for the last few months: the initial ACME certificate was served correctly over HTTP/1.1, HTTP/2, and HTTP/3. Today, however, we noticed some seemingly random certificate expiration issues. After some debugging, it turns out that for HTTP/1.1 and HTTP/2, Caddy is returning a new certificate that was renewed last month, whereas for HTTP/3 it is still returning the original ACME certificate from 3 months ago, which has now expired.

$ curl3 https://webmail.fnoco.eu -vvI
*   Trying 195.192.132.135:443...
* Connected to webmail.fnoco.eu (195.192.132.135) port 443 (#0)
[...]
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=webmail.fnoco.eu
*  start date: May 18 18:15:32 2022 GMT
*  expire date: Aug 16 18:15:31 2022 GMT
*  subjectAltName: host "webmail.fnoco.eu" matched cert's "webmail.fnoco.eu"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing

HTTP/3 requests still got served the old certificate that expired today:

$ curl3 https://webmail.fnoco.eu -vvI --http3
*   Trying 195.192.132.135:443...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* Connect socket 5 over QUIC to 195.192.132.135:443
* Sent QUIC client Initial, ALPN: h3,h3-29,h3-28,h3-27
* SSL certificate problem: certificate has expired
* connect to 195.192.132.135 port 443 failed: SSL peer certificate or SSH remote key was not OK
* Failed to connect to webmail.fnoco.eu port 443 after 177 ms: SSL peer certificate or SSH remote key was not OK
* Closing connection 0
curl: (60) SSL certificate problem: certificate has expired

(I couldn't find a way to get curl to print the start/expiry dates of the expired certificate, but from some testing with Firefox I know they're 19 March 2022 and 17 June 2022, respectively.)

I've dug through the Caddy logs and couldn't find anything unusual. Restarting Caddy seems to have fixed the issue and the correct certificate is now returned for HTTP/3, but I'm guessing we'll run into the same issue with some other host after the next renewal.

lutoma avatar Jun 21 '22 17:06 lutoma

/cc @marten-seemann is quic-go doing anything with TLS certs, caching them longer than it should or something? Caddy sets a TLSConfig with GetConfigForClient. Does quic-go honor that and call it for every new connection?

francislavoie avatar Jun 21 '22 18:06 francislavoie

It should. There's some logic to set the correct ALPN, maybe there's a bug in there? https://github.com/lucas-clemente/quic-go/blob/706a482340141f8edb26fefae993e96b1581b034/http3/server.go#L56-L90

marten-seemann avatar Jun 22 '22 10:06 marten-seemann

We had issues today using Caddy v2.6.1, where we'd sometimes get an old wildcard certificate on our production websites. Restarting Caddy seems to have temporarily fixed the issue. We only observed the error on iOS devices, but that may be a coincidence.

The interesting part here is that the devices did not have HTTP/3 enabled. I'm not sure where to look for debug information; if you can point me in the right direction, I'll gladly provide some info.

johanobergman avatar Jan 17 '23 19:01 johanobergman

@johanobergman Enable debug-level logs, that will be helpful, as it shows which certificates are being selected for a handshake. The more details about the certificate (when it was issued, precise SAN names without redactions, etc) the more we can help.

mholt avatar Jan 17 '23 21:01 mholt

@mholt I have enabled debug mode, but I guess we'll have to wait until the certificate expires again in order to reproduce the issue. Or do you have any suggestion on how to reproduce it earlier?

To add some more info: we have a lot of subdomains going to the same caddy server using the same wildcard certificate. The issue wasn't consistent between subdomains - some subdomains would work fine in the browser while for others the browser would complain about an expired certificate.

The DNS entry is also a wildcard.

johanobergman avatar Jan 19 '23 10:01 johanobergman

> We only observed the error on iOS devices, but that may be a coincidence.

FWIW, we also first thought this was an iOS issue when we ran into it. I believe this is because most (desktop) browsers retry using HTTP/1 when running into issues with HTTP/3, whereas API calls in iOS apps would just hard-fail on an invalid certificate.

lutoma avatar Jan 20 '23 12:01 lutoma

@johanobergman There's a parameter you can tune in your config, but it has to be set through the JSON config (and we generally don't recommend setting it, except perhaps for troubleshooting): https://caddyserver.com/docs/json/apps/tls/automation/policies/renewal_window_ratio/

The higher this value is (closer to 1), the younger the cert can be before Caddy will try renewing it. The lower the value is (closer to 0), the older the cert must be before Caddy will try renewing it.

For example, setting that value to 1 will cause all certificates to be renewed at every scan. (Don't leave it like that!)

mholt avatar Jan 31 '23 20:01 mholt

quic-go has undergone quite a few optimizations and refactors lately; I'd be curious if anyone can test to see if this is still an issue with the latest commits on both Caddy and quic-go. (You'll need Go 1.19 or later with the latest commit.)

mholt avatar Feb 24 '23 20:02 mholt

I have upgraded Caddy to v2.6.4 using xcaddy and Go 1.20.2, will report back.

Or do I have to build using master with xcaddy?

johanobergman avatar Mar 08 '23 11:03 johanobergman

Thanks! The latest commit would be best; unreleased code includes recent patches: https://github.com/caddyserver/caddy/compare/v2.6.4...master

mholt avatar Mar 10 '23 19:03 mholt

I have experienced this as well on v2.6.4. Restarting Caddy seems to have resolved the issue.

low613 avatar Apr 27 '23 23:04 low613

@low613 Can you test with the latest on master? (see comment above)

mholt avatar Apr 28 '23 15:04 mholt

Interestingly I was not able to replicate this on my local machine. It only occurs for me when running in a container in ECS through a network load balancer.

I built a new image for the master branch (https://hub.docker.com/r/low613/caddy), and the issue looks to be resolved when running that.

low613 avatar May 03 '23 01:05 low613

That is promising. So to clarify, the problem no longer occurs on your container in ECS through the network load balancer, when using the latest master? :+1:

mholt avatar May 03 '23 03:05 mholt

That is correct

low613 avatar May 03 '23 03:05 low613

Awesome -- thank you for verifying. I know that quic-go has undergone some significant refactors lately (with more to come), and our next release will use the latest quic-go, so this is quite likely resolved now. Closing, unless it can be confirmed to still be an issue (but even then, the issue should probably be taken upstream rather than here).

mholt avatar May 03 '23 16:05 mholt

The problem only happened to us after a whole cycle of certificate generation/expiration (3 months) without ever restarting Caddy. We're on v2.6.4 and not master, but we will upgrade when the next stable release comes out, and report back (3 months after that 😅).

johanobergman avatar May 03 '23 16:05 johanobergman

It's not actually solved in the master branch. As long as you don't reload Caddy, the HTTP/3 TLS certificates will be up to date. I have found a fix.

WeidiDeng avatar May 04 '23 07:05 WeidiDeng

@low613 Did you reload Caddy when using master? This bug only appears when Caddy is reloaded.

Use xcaddy build fix-http3-after-reload to test the fix.
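
The thread does not spell out why a reload triggers the problem. Purely as an illustration of how this class of bug can arise, and not as a description of Caddy's code or of the actual patch, the Go sketch below assumes a listener that outlives a reload and keeps the TLS config it was started with. It contrasts that with a wrapper whose callback delegates to whichever config is current. All names are hypothetical, and the "certificates" are plain labels in place of real DER bytes so that no handshake is needed.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync/atomic"
)

// generation stands in for one loaded configuration and its certificate
// store. The label replaces real DER bytes, so this config cannot be used
// for an actual handshake.
func generation(label string) *tls.Config {
	cert := tls.Certificate{Certificate: [][]byte{[]byte(label)}}
	return &tls.Config{
		GetConfigForClient: func(*tls.ClientHelloInfo) (*tls.Config, error) {
			return &tls.Config{Certificates: []tls.Certificate{cert}}, nil
		},
	}
}

// reloadable hides whichever generation is current behind one stable config,
// which is what a listener that survives reloads would be started with.
type reloadable struct {
	current atomic.Pointer[tls.Config]
}

// Config returns a config whose callback consults the current generation on
// every call instead of capturing one generation for good.
func (r *reloadable) Config() *tls.Config {
	return &tls.Config{
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			return r.current.Load().GetConfigForClient(hello)
		},
	}
}

// served returns the label of the certificate cfg would pick for a handshake.
func served(cfg *tls.Config) string {
	inner, err := cfg.GetConfigForClient(&tls.ClientHelloInfo{})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(inner.Certificates[0].Certificate[0])
}

func main() {
	gen1 := generation("cert from config generation 1")
	gen2 := generation("cert from config generation 2")

	// Stale pattern: the long-lived listener was handed gen1 directly, so a
	// reload that builds gen2 never reaches it.
	listenerCfg := gen1
	fmt.Println("captured config after reload: ", served(listenerCfg))

	// Indirection: the listener holds the wrapper; a reload swaps the pointer.
	var r reloadable
	r.current.Store(gen1)
	stable := r.Config()
	r.current.Store(gen2) // simulated reload
	fmt.Println("swappable config after reload:", served(stable))
}
```

In this toy model, the first line keeps reporting the generation 1 certificate while the second follows the reload.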

WeidiDeng avatar May 04 '23 07:05 WeidiDeng

Thanks for that. I was able to replicate the issue on master by reloading Caddy. I tested the same steps on your branch, and I am served an up-to-date certificate.

low613 avatar May 04 '23 22:05 low613

Awesome, thanks everyone for working on that, especially @WeidiDeng for the patch!

mholt avatar May 10 '23 20:05 mholt

Great!

Any ETA for a stable release with this patch?

johanobergman avatar May 14 '23 16:05 johanobergman

@johanobergman We don't know when 2.7 stable will be released but the first beta should go out this week.

Tbh you could just use the CI artifacts from the last commit; our current HEAD is pretty stable as far as we know. Usually we get major bugs found and fixed pretty quickly -- I'd encourage you to use the latest HEAD and let us know if you encounter any issues before we make a stable release :100:

mholt avatar May 15 '23 18:05 mholt

Just experienced this bug today in production on Caddy 2.6.4, with many iOS users reporting TLS errors.

Restarting Caddy indeed fixed the issue!

We'll stay on 2.6.4 for now (and keep an eye on errors). Thanks to those who worked on a fix for 2.7 🙏

Pierre-Gilles avatar Jun 27 '23 15:06 Pierre-Gilles

@Pierre-Gilles Thanks for the info!

Are you sure you don't want to try the beta to verify? 😅

mholt avatar Jun 28 '23 05:06 mholt

@mholt I know betas are quite stable, but this is for an enterprise project with many users, so we cannot really try a beta in production 😅

We may try it on our staging servers though!

Pierre-Gilles avatar Jul 05 '23 09:07 Pierre-Gilles

Yes, we'd absolutely love it if you could verify on staging! That's a great place to start testing betas.

mholt avatar Jul 05 '23 16:07 mholt