traefik
acme: don't store outdated certificates
What does this PR do?
Traefik keeps outdated certificates and tries to renew them forever, even if they are no longer in use and/or can no longer be renewed. This PR removes outdated certificates from the ACME store to avoid polluting it.
Motivation
When one provisions a white-label app and manages domains on a customer's behalf, one inevitably removes some domains from time to time. But Traefik keeps the outdated certificates and, moreover, keeps trying to renew them for no reason.
+1 for this, please can we get this merged into Traefik
Is there an update regarding this issue? I think it would help many sysadmins if the certs were removed automatically :)
This PR is very interesting, but from a design point of view we will try to create something with a more extensive scope (revocation, ...). We will work on that soon, stay tuned.
As a workaround, you can use traefik-certs-cleaner.
Thanks for the quick reply! :)
Okay, so what is your recommended workaround for this problem? Something discussed in issue #3376, or something else?
I have this problem right now at the company I work for, and it would be nice to have a solution that covers the larger scope you mentioned.
It has been 1 1/2 years. It would be really nice if Traefik would remove outdated certificates or not try to renew them. It's not remotely practical to manually remove unused certificates and restart Traefik every time a domain is no longer in use.
For example: A client configures their own domain in my application. Traefik receives this configuration and obtains a certificate from LetsEncrypt. This client moves on and stops using my service, changes the DNS records or lets the domain expire. This will cause unnecessary errors when Traefik tries to renew the certificate and will mask any real domain problems I might have by writing error logs every day.
Any idea when this issue is going to be resolved?
Hey here, on behalf of the maintainer team, I would like to apologize for this radio silence. This is not great. This PR was put on hold because we wanted to provide a more extensive solution, but at the end of the day, we never found the time to finalize this. If we had to do it again, we would instead move quickly with this PR, as a first step. We will review it in the next few weeks and move forward as fast as possible.
Hello there,
First and foremost, please accept our apologies again for the long period with no action on this PR. As Emile explained, we had planned to improve TLS certificate management globally, so we haven’t moved forward on this PR.
That broader change has not been made yet, so it is now time to review this PR.
Let’s talk about the PR 🙂
I speak on my own behalf: even though the use case makes sense, I don’t think this PR is the best way to tackle this topic.
My main concern is the condition checked to delete certificates from the renewal list: this PR proposes to delete certificates that expired more than 7 days ago.
IMO, using the expiry date is not the right approach. If you delete a certificate that is still used by a router, the situation gets worse: instead of failing during the certificate renewal once a day, Traefik will try to create this certificate on each dynamic configuration reload.
From my point of view, it would make more sense to check whether a certificate is used by at least one router before renewing it. Traefik could then delete the unnecessary certificates, and serving an outdated certificate would indicate an error during the challenge resolution.
For this reason, I propose the following actions:
- Closing this PR (it is pretty outdated, and modifying it would be complicated now).
- Opening a proposal describing another approach based on a check of the routers' domains.
Before moving forward on it, I’d like to have your feedback. I may miss something.
Many thanks for considering my request.
Hello @nmengin
This was also my first intuition for solving the problem: just don't renew certificates that are no longer used by a router. But I had a discussion with @ldez about this via email. The concern he raised, explaining why renewing unused certificates is necessary, is as follows.
If you have thousands of certificates that are used on and off, the problem is that when you add routers that use these certificates back in, Traefik may need to renew more certificates than the limits of Let's Encrypt allow.
However, if Traefik renews these certificates gradually over a period of time, you can stay under the limits. This means that removing unused certificates would break the functionality of Traefik for some customers and it would take weeks or even months to get the certificates back.
The solution that we both agreed would work for everyone is to introduce a time limit (or failed attempts, which is basically the same thing).
Let's Encrypt certificates are valid for 90 days. After 60 days, Traefik will try to renew them once a day. If the renewal fails, Traefik will try again the next day. If you could configure a limit on the number of days (or attempts) that Traefik keeps trying to renew before removing the certificate, we wouldn't break Traefik for some customers and outdated certificates would actually be removed.
Quote from ldez:
Example:
- Expiration date: 2024/06/30
- First attempt at renewal: 2024/05/31
- Time limit: 2 days
-> The certificate will be removed on 2024/06/02.
ldez please correct me if I'm wrong, I wrote this partly from memory and didn't read our whole conversation again.
That's an interesting point that people might need certs for domains they aren't using.
The opposite problem can also occur: if you are renewing certs for a former customer who no longer points their DNS to your Traefik server, renewing that certificate will fail, so you really don't want to try to renew a cert that isn't routed.
Yes, this is exactly the scenario I am facing in my company, and it was the reason why I spoke to ldez. The solution I mentioned earlier would still produce failed renewals, but these would stop after n days. I personally would be OK with this.
The only solution I've come up with that does not break Traefik in some cases and still does not spam my logs with "failed" renewals is implementing a storage option for certificates other than a JSON file.
If the certificates could be stored in, for example, HashiCorp Vault, I could just remove the certificate from the store programmatically after my customer has deleted their domain from my service.
I know I could also edit the JSON file programmatically and restart Traefik, but I don't want to do that every time a customer decides to delete a domain on my service.
If anyone has any further input on this, I would love to hear your thoughts.
This PR #10782 should resolve the problem.