tls: certificate auto-renewal becomes stuck if the issuer is changed between config reloads
When the config is reloaded with a changed ACME issuer, certmagic checks for the existence of a certificate from the new issuer the next time a certificate needs to be issued. That certificate doesn't exist, because obtaining it is the whole point of the operation. certmagic will then try in vain for 30 days to renew it.
Detailed explanation:
When Caddy starts, a global TLS cache is created if needed
https://github.com/caddyserver/caddy/blob/a1751adb40fbd2369c044a7fe16ab17e0fdce334/modules/caddytls/tls.go#L158-L164
It is destroyed when TLS is no longer used
https://github.com/caddyserver/caddy/blob/a1751adb40fbd2369c044a7fe16ab17e0fdce334/modules/caddytls/tls.go#L424-L430
The TLS cache starts renewing certificates in the background
https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/cache.go#L127
Managed domains are updated through caddy configuration.
Eventually, renewal will be done here
https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/maintain.go#L235
The TLS cache tries to renew the certificate using the latest issuer URL, but first it checks that the existing certificate is present in storage:
https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/config.go#L807-L812
It doesn't exist there, because the old certificate is from a different issuer and the path being checked is derived from the latest issuer.
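To make the mismatch concrete, here is a minimal sketch. It assumes a storage layout of `<root>/certificates/<issuer>/<domain>/<domain>.key`; `certKeyPath` is a hypothetical helper written for illustration and is not certmagic's actual API, and `old-acme` / `new-acme` are placeholder issuer names.

```go
package main

import (
	"fmt"
	"path"
)

// certKeyPath is a hypothetical helper (not certmagic's API). It builds the
// storage path of a certificate's private key under the assumed layout
// <root>/certificates/<issuer>/<domain>/<domain>.key.
func certKeyPath(root, issuerKey, domain string) string {
	return path.Join(root, "certificates", issuerKey, domain, domain+".key")
}

func main() {
	// The certificate currently in the cache was stored under the old issuer.
	stored := certKeyPath("/tmp/caddy", "old-acme", "example.com")
	// After the reload, renewal looks under the new issuer instead.
	lookedUp := certKeyPath("/tmp/caddy", "new-acme", "example.com")

	fmt.Println("stored on disk:      ", stored)
	fmt.Println("checked by renewal:  ", lookedUp)
	fmt.Println("paths are identical: ", stored == lookedUp)
}
```

Because the issuer is part of the path, the lookup for the new issuer can never find the file written for the old one, no matter how often it is retried.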
This will be retried here
https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/config.go#L982
There are at least two ways to work around this: restart Caddy, or remove the active Caddy configuration and load it again, so that Caddy realizes these certificates don't exist and should be created instead.
Thanks for the report. I know you did in Slack, but could you share your logs here too? For the record, so that when I go to fix this I can ensure that the proper code paths are recreated and I fix the right problem. :)
This is the screenshot that shows the stuck job:
The log is as follows:
Nov 12 16:37:54 linux caddy[455]: {"level":"warn","ts":1731400674.037309,"logger":"tls.cache.maintenance","msg":"error while checking if stored certificate is also expiring soon","identifiers":["example.com"],"error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory"}
Nov 12 16:37:54 linux caddy[455]: {"level":"info","ts":1731400674.0373702,"logger":"tls.cache.maintenance","msg":"certificate expires soon; queuing for renewal","identifiers":["example.com"],"remaining":1956125.962630629}
Nov 12 16:37:54 linux caddy[455]: {"level":"info","ts":1731400674.0377727,"logger":"tls.cache.maintenance","msg":"attempting certificate renewal","identifiers":["example.com"],"remaining":1956125.962229608}
Nov 12 16:37:54 linux caddy[455]: {"level":"info","ts":1731400674.1341906,"logger":"tls.renew","msg":"acquiring lock","identifier":"example.com"}
Nov 12 16:37:54 linux caddy[455]: {"level":"info","ts":1731400674.1503463,"logger":"tls.renew","msg":"lock acquired","identifier":"example.com"}
Nov 12 16:37:54 linux caddy[455]: {"level":"error","ts":1731400674.1505845,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":1,"retrying_in":60,"elapsed":0.000176754,"max_duration":2592000}
Nov 12 16:38:54 linux caddy[455]: {"level":"error","ts":1731400734.152456,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":2,"retrying_in":120,"elapsed":60.002046754,"max_duration":2592000}
Nov 12 16:40:54 linux caddy[455]: {"level":"error","ts":1731400854.1529422,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":3,"retrying_in":120,"elapsed":180.002532897,"max_duration":2592000}
The "certificate expires soon; queuing for renewal" and "no such file or directory" log entries repeat many more times afterwards and are omitted here.
The cached certificate is from /tmp/caddy/certificates/old-acme/example.com/example.com.key. To fix it, I reverted the ACME URL, deleted the config, and reloaded it.
I am experiencing a similar issue with "no such file or directory". However, neither my config nor the issuer changed. At least, I can't see any reason why they would have.
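For anyone reading the "will retry" entries: the values are in seconds, as the timestamps confirm, so `max_duration` of 2592000 is the 30-day retry window. A small sketch that pulls those fields out of one of the lines above; `retryEntry`, `parseRetry`, and `maxDays` are names made up for this example, and the struct only covers the fields visible in the log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// retryEntry mirrors the fields of a "will retry" log entry that matter here.
// The type name is hypothetical; the JSON keys are taken from the log lines.
type retryEntry struct {
	Msg         string  `json:"msg"`
	Error       string  `json:"error"`
	Attempt     int     `json:"attempt"`
	RetryingIn  float64 `json:"retrying_in"`  // seconds until the next attempt
	Elapsed     float64 `json:"elapsed"`      // seconds since the first attempt
	MaxDuration float64 `json:"max_duration"` // seconds before giving up
}

// parseRetry skips the syslog prefix and decodes the JSON part of the line.
func parseRetry(line string) (retryEntry, error) {
	var e retryEntry
	i := strings.Index(line, "{")
	if i < 0 {
		return e, fmt.Errorf("no JSON object found in line")
	}
	err := json.Unmarshal([]byte(line[i:]), &e)
	return e, err
}

// maxDays converts the retry window from seconds to days.
func maxDays(e retryEntry) float64 {
	return e.MaxDuration / 86400
}

func main() {
	line := `Nov 12 16:37:54 linux caddy[455]: {"level":"error","ts":1731400674.1505845,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":1,"retrying_in":60,"elapsed":0.000176754,"max_duration":2592000}`

	e, err := parseRetry(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("attempt %d: next retry in %.0fs, giving up after %.0f days\n",
		e.Attempt, e.RetryingIn, maxDays(e))
}
```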
At least for me this seems to be resolved with Caddy 2.10.
Interesting, that's good, but I don't know why 😅
Nevermind, the issue resurfaced.